Module 6

Top

Advanced Plotting in R

Objectives

At the and of this module you will have been demonstrated a good range of useful plots, with instructions to help understand the code. The grammar of graphics (ggplot) is very useful for a wide range of plots, and this module introduces some useful aspects. The page on Plotting gives some general comments on options that can be changed within the plots

Contents

  1. Simple plotting and mean separation tables for 1-way ANOVA
  2. Plotting Likert scales plot for 6 groups
  3. Putting in a few summary values and making a plot
  4. Simple plotting of standard errors on a plot
  5. Plotting of confidence intervals
  6. Plotting of regression lines and confidence bands
  7. The use of the  package lattice for plotting
  8. Plotting of proportions as mosaic plots
  9. Plotting of main effects and interactions
  10. Using the package ggplot2
  11. Residual plots from models
  12. Heat plots
  13. Biplots

Back to top

1. Looking at a simple one way completely randomised ANOVA design

ANOVA design getting output and looking at the mean separation, using library(lattice).  Click here to download  the pdf of this leson BesT_Mod6_ANOVAMelonPlot

(Melon yield data from Mead and Curnow One Way ANOVA)

melon <- read.csv("melon.csv")
# View(melon)


head(melon)  ## shows the top six lines of data
   Group Yield
1     A 25.12
2     A 17.25
3     A 26.42
4     A 16.08
5     A 22.15
6     A 15.92
#str(melon)   ## not run; gives the data structure
## Show there is a Factor in the data 

The data consists of two columns, one is a category (the group)

names(melon)
[1] "Group" "Yield"
plot(melon)  ## Boxplot

melon.mod1  <-  lm(Yield ~ Group, data=melon)  ## set up model


####   print coefficients from the model
melon.mod1
## 
## Call:
## lm(formula = Yield ~ Group, data = melon)
## 
## Coefficients:
## (Intercept)       GroupB       GroupC       GroupD  
##      20.490       16.913       -1.007        9.407
aov(melon.mod1)
## Call:
##    aov(formula = melon.mod1)
## 
## Terms:
##                     Group Residuals
## Sum of Squares  1292.2103  367.2364
## Deg. of Freedom         3        20
## 
## Residual standard error: 4.285069
## Estimated effects may be unbalanced
summary(melon.mod1)
The output show the model, the summary of residuals and the coefficients of the given model as intercept and treatment effects. We generally require the means of the groups for reporting. 

## Call:
## lm(formula = Yield ~ Group, data = melon)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.0633 -2.5475 -0.5933  3.1633  6.4167 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   20.490      1.749  11.713 2.08e-10 ***
## GroupB        16.913      2.474   6.836 1.21e-06 ***
## GroupC        -1.007      2.474  -0.407  0.68840    
## GroupD         9.407      2.474   3.802  0.00112 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.285 on 20 degrees of freedom
## Multiple R-squared:  0.7787, Adjusted R-squared:  0.7455 
## F-statistic: 23.46 on 3 and 20 DF,  p-value: 9.316e-07
replications(Yield ~ Group, data=melon)

## Group ## 6

A test for balance, we can ask for Replications:

!is.list(replications(Yield ~ Group, data=melon))

## [1] TRUE

melon.aov  <-  aov(melon.mod1)  ## make a melon anova object
melon.aov
## Call:
##    aov(formula = melon.mod1)
## 
## Terms:
##                     Group Residuals
## Sum of Squares  1292.2103  367.2364
## Deg. of Freedom         3        20
## 
## Residual standard error: 4.285069
## Estimated effects may be unbalanced
summary(melon.aov)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Group        3 1292.2   430.7   23.46 9.32e-07 ***
## Residuals   20  367.2    18.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We had previously checked that this design was balanced. We can also get the least significant difference using a function from the package agricolae

 $statistics
##       Mean       CV  MSerror
##   26.81833 15.97813 18.36182
## 
## $parameters
##   Df ntr  t.value alpha       test name.t
##   20   4 2.085963  0.05 Fisher-LSD  Group
## 
## $means
##      Yield      std r      LCL      UCL   Min   Max
## A 20.49000 4.694422 6 16.84087 24.13913 15.92 26.42
## B 37.40333 3.950497 6 33.75421 41.05246 31.98 43.32
## C 19.48333 5.552551 6 15.83421 23.13246 11.42 25.90
## D 29.89667 2.229894 6 26.24754 33.54579 27.58 33.20
## 
## $comparison
##       Difference pvalue sig.        LCL        UCL
## A - B -16.913333 0.0000  *** -22.073978 -11.752689
## A - C   1.006667 0.6884       -4.153978   6.167311
## A - D  -9.406667 0.0011   ** -14.567311  -4.246022
## B - C  17.920000 0.0000  ***  12.759356  23.080644
## B - D   7.506667 0.0066   **   2.346022  12.667311
## C - D -10.413333 0.0004  *** -15.573978  -5.252689
## 
## $groups
## NULL
##       Difference pvalue sig.        LCL        UCL
## A - B -16.913333 0.0000  *** -22.073978 -11.752689
## A - C   1.006667 0.6884       -4.153978   6.167311
## A - D  -9.406667 0.0011   ** -14.567311  -4.246022
## B - C  17.920000 0.0000  ***  12.759356  23.080644
## B - D   7.506667 0.0066   **   2.346022  12.667311
## C - D -10.413333 0.0004  *** -15.573978  -5.252689
## Tables of means
## Grand mean
##          
## 26.81833 
## 
##  Group 
## Group
##     A     B     C     D 
## 20.49 37.40 19.48 29.90 
## 
## Standard errors for differences of means
##         Group
##         2.474
replic.     6

The Grand mean of the experiment can be obtained, and the means for the groups.

Tables of effects
Group
    A      B      C      D 
 -6.328 10.585 -7.335  3.078

The following from the multicomp library gives the pairwise comparisons using Tukeys

library(multcomp)
TestPairsT <- glht(melon.aov, linfct = mcp(Group = "Tukey"))
summary(TestPairsT) # pairwise tests
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts## 
## Fit: aov(formula = melon.mod1)
## 
## Linear Hypotheses:
##            Estimate Std. Error t value Pr(>|t|)    
## B - A == 0   16.913      2.474   6.836  < 0.001 ***
## C - A == 0   -1.007      2.474  -0.407  0.97662    
## D - A == 0    9.407      2.474   3.802  0.00556 ** 
## C - B == 0  -17.920      2.474  -7.243  < 0.001 ***
## D - B == 0   -7.507      2.474  -3.034  0.03108 *  
## D - C == 0   10.413      2.474   4.209  0.00224 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

This is a small example using ggplot2.  Labels are entered as lists, and then the counts in each of the Likert category as a dataframe, which uses the label lists too. It shows how to add data and plot 6 plots in a grid. The x axis is a Likert scale. (see Technical Asides for information about Likert scores). This would also work for an ordered categorical factor. BesT_Mod6_GGPLOT2bar

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.5
typeBus <- c("Producers", "Importers", "Manufacturers", "Retailers",
          "Inspectors", "Standards")

subcat <- c("SD", "D", "N", "A", "SA")

head(dat1 <- data.frame(sector =  gl(6, 5, length = 30, labels = typeBus),
                        Preferences = gl(5, 1, length = 30, labels = subcat),
                        perc = c(2, 4, 16, 45, 35, 15, 25, 35, 13, 7,
                                    10, 20, 40, 25, 10, 8, 20,40, 25, 8,
                                    4, 10, 32, 38, 14, 3, 8, 28, 42, 28)))
##      sector Preferences perc
## 1 Producers          SD    2
## 2 Producers           D    4
## 3 Producers           N   16
## 4 Producers           A   45
## 5 Producers          SA   35
## 6 Importers          SD   15
## bar plot

ggplot(dat1, aes(Preferences, perc)) + facet_wrap(~sector, nrow = 2) +
  geom_bar(stat = "identity") + ylab("Respondents (%)")

## bar plot with thinner bars

ggplot(dat1, aes(Preferences, perc)) + facet_wrap(~sector, nrow = 2) +
  geom_bar(stat = "identity", width = 0.6) + ylab("Respondents (%)")

Putting in a few summary values and making a plot

Simple plotting of standard errors on a plot

Plotting of confidence intervals

Plotting of regression lines and confidence bands

The use of the  package lattice for plotting

Plotting of proportions as mosaic plots

Plotting of main effects and interactions

Using the package ggplot2

Residual plots from models

Heat plots

Biplots

 

Leave a Reply