Advanced Plotting in R
Objectives
At the and of this module you will have been demonstrated a good range of useful plots, with instructions to help understand the code. The grammar of graphics (ggplot) is very useful for a wide range of plots, and this module introduces some useful aspects. The page on Plotting gives some general comments on options that can be changed within the plots
Contents
- Simple plotting and mean separation tables for 1-way ANOVA
- Plotting Likert scales plot for 6 groups
- Putting in a few summary values and making a plot
- Simple plotting of standard errors on a plot
- Plotting of confidence intervals
- Plotting of regression lines and confidence bands
- The use of the package lattice for plotting
- Plotting of proportions as mosaic plots
- Plotting of main effects and interactions
- Using the package ggplot2
- Residual plots from models
- Heat plots
- Biplots
1. Looking at a simple one way completely randomised ANOVA design
ANOVA design getting output and looking at the mean separation, using library(lattice). Click here to download the pdf of this leson BesT_Mod6_ANOVAMelonPlot
(Melon yield data from Mead and Curnow One Way ANOVA)
melon <- read.csv("melon.csv")
# View(melon)
head(melon) ## shows the top six lines of data
Group Yield
1 A 25.12
2 A 17.25
3 A 26.42
4 A 16.08
5 A 22.15
6 A 15.92
#str(melon) ## not run; gives the data structure
## Show there is a Factor in the data
The data consists of two columns, one is a category (the group)
names(melon)
[1] "Group" "Yield"
plot(melon) ## Boxplot
melon.mod1 <- lm(Yield ~ Group, data=melon) ## set up model
#### print coefficients from the model
melon.mod1
##
## Call:
## lm(formula = Yield ~ Group, data = melon)
##
## Coefficients:
## (Intercept) GroupB GroupC GroupD
## 20.490 16.913 -1.007 9.407
aov(melon.mod1)
## Call:
## aov(formula = melon.mod1)
##
## Terms:
## Group Residuals
## Sum of Squares 1292.2103 367.2364
## Deg. of Freedom 3 20
##
## Residual standard error: 4.285069
## Estimated effects may be unbalanced
summary(melon.mod1)
The output show the model, the summary of residuals and the coefficients of the given model as intercept and treatment effects. We generally require the means of the groups for reporting.
## Call:
## lm(formula = Yield ~ Group, data = melon)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0633 -2.5475 -0.5933 3.1633 6.4167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.490 1.749 11.713 2.08e-10 ***
## GroupB 16.913 2.474 6.836 1.21e-06 ***
## GroupC -1.007 2.474 -0.407 0.68840
## GroupD 9.407 2.474 3.802 0.00112 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.285 on 20 degrees of freedom
## Multiple R-squared: 0.7787, Adjusted R-squared: 0.7455
## F-statistic: 23.46 on 3 and 20 DF, p-value: 9.316e-07
replications(Yield ~ Group, data=melon)
## Group ## 6
A test for balance, we can ask for Replications:
!is.list(replications(Yield ~ Group, data=melon))
## [1] TRUE
melon.aov <- aov(melon.mod1) ## make a melon anova object
melon.aov
## Call:
## aov(formula = melon.mod1)
##
## Terms:
## Group Residuals
## Sum of Squares 1292.2103 367.2364
## Deg. of Freedom 3 20
##
## Residual standard error: 4.285069
## Estimated effects may be unbalanced
summary(melon.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 3 1292.2 430.7 23.46 9.32e-07 ***
## Residuals 20 367.2 18.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We had previously checked that this design was balanced. We can also get the least significant difference using a function from the package agricolae
$statistics
## Mean CV MSerror
## 26.81833 15.97813 18.36182
##
## $parameters
## Df ntr t.value alpha test name.t
## 20 4 2.085963 0.05 Fisher-LSD Group
##
## $means
## Yield std r LCL UCL Min Max
## A 20.49000 4.694422 6 16.84087 24.13913 15.92 26.42
## B 37.40333 3.950497 6 33.75421 41.05246 31.98 43.32
## C 19.48333 5.552551 6 15.83421 23.13246 11.42 25.90
## D 29.89667 2.229894 6 26.24754 33.54579 27.58 33.20
##
## $comparison
## Difference pvalue sig. LCL UCL
## A - B -16.913333 0.0000 *** -22.073978 -11.752689
## A - C 1.006667 0.6884 -4.153978 6.167311
## A - D -9.406667 0.0011 ** -14.567311 -4.246022
## B - C 17.920000 0.0000 *** 12.759356 23.080644
## B - D 7.506667 0.0066 ** 2.346022 12.667311
## C - D -10.413333 0.0004 *** -15.573978 -5.252689
##
## $groups
## NULL
## Difference pvalue sig. LCL UCL
## A - B -16.913333 0.0000 *** -22.073978 -11.752689
## A - C 1.006667 0.6884 -4.153978 6.167311
## A - D -9.406667 0.0011 ** -14.567311 -4.246022
## B - C 17.920000 0.0000 *** 12.759356 23.080644
## B - D 7.506667 0.0066 ** 2.346022 12.667311
## C - D -10.413333 0.0004 *** -15.573978 -5.252689
## Tables of means ## Grand mean ## ## 26.81833 ## ## Group ## Group ## A B C D ## 20.49 37.40 19.48 29.90 ## ## Standard errors for differences of means ## Group ## 2.474 replic. 6
The Grand mean of the experiment can be obtained, and the means for the groups.Tables of effects Group A B C D -6.328 10.585 -7.335 3.078
The following from the multicomp library gives the pairwise comparisons using Tukeys
library(multcomp)
TestPairsT <- glht(melon.aov, linfct = mcp(Group = "Tukey"))
summary(TestPairsT) # pairwise tests
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts##
## Fit: aov(formula = melon.mod1)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## B - A == 0 16.913 2.474 6.836 < 0.001 ***
## C - A == 0 -1.007 2.474 -0.407 0.97662
## D - A == 0 9.407 2.474 3.802 0.00556 **
## C - B == 0 -17.920 2.474 -7.243 < 0.001 ***
## D - B == 0 -7.507 2.474 -3.034 0.03108 *
## D - C == 0 10.413 2.474 4.209 0.00224 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
2. Plotting a small Likert dataset for 6 groups1.
This is a small example using ggplot2. Labels are entered as lists, and then the counts in each of the Likert category as a dataframe, which uses the label lists too. It shows how to add data and plot 6 plots in a grid. The x axis is a Likert scale. (see Technical Asides for information about Likert scores). This would also work for an ordered categorical factor. BesT_Mod6_GGPLOT2bar
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.5
typeBus <- c("Producers", "Importers", "Manufacturers", "Retailers",
"Inspectors", "Standards")
subcat <- c("SD", "D", "N", "A", "SA")
head(dat1 <- data.frame(sector = gl(6, 5, length = 30, labels = typeBus),
Preferences = gl(5, 1, length = 30, labels = subcat),
perc = c(2, 4, 16, 45, 35, 15, 25, 35, 13, 7,
10, 20, 40, 25, 10, 8, 20,40, 25, 8,
4, 10, 32, 38, 14, 3, 8, 28, 42, 28)))
## sector Preferences perc
## 1 Producers SD 2
## 2 Producers D 4
## 3 Producers N 16
## 4 Producers A 45
## 5 Producers SA 35
## 6 Importers SD 15
## bar plot
ggplot(dat1, aes(Preferences, perc)) + facet_wrap(~sector, nrow = 2) +
geom_bar(stat = "identity") + ylab("Respondents (%)")
## bar plot with thinner bars
ggplot(dat1, aes(Preferences, perc)) + facet_wrap(~sector, nrow = 2) +
geom_bar(stat = "identity", width = 0.6) + ylab("Respondents (%)")
Leave a Reply