One-Way ANOVA for Percentage Weed Control Data

Image credit: Charles Deluvio

This post is also published on the Open Weed Science website.

Background

Assessments of visual weed control in response to herbicide treatments is likely the most common experiment and data analysis conducted within the Weed Science discipline. The goal of these experiments is to investigate herbicides that promote the best level of control of targeted weed species. These experiments are usually organized in a one-way randomized complete block design with multiple treatments (herbicides). At a pre-established number of days after treatment (DAT; researcher choice), a percent (%) visual control rate is given to each targeted weed. The weed control rate varies from 0% (no weed control) to 100% (complete weed control). Therefore, before setting up the experiments, the researcher is aware that the results will range from 0 to 100%.

This type of dataset typically does not follow a normal (Gaussian) distribution. Herein we use the beta distribution (logit scale), a continuous probability distribution where response variable values range from 0 to 1. Before starting the analysis, the weed control rates (0-100%) should be converted to the interval 0-1. For example, if the weed control of Palmer amaranth is 90%, it should be listed as 0.900 in your excel spreadsheet file. Also, beta distribution will not accept 0 nor 1 values. Thus, weed control of 0% and 100% should be adjusted to 0.001 and 0.999, respectively.

Getting started

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.

RStudio is an integrated development environment for R, a programming language for statistical computing and graphics.

Create a new R project file

  • Click in File -> New project… -> New directory -> New Project -> Save the R work directory anywhere you want in your laptop.

The saved file will have an R project and you should copy and paste all your raw data (excel file) in the same work directory (folder).

  • Click in File -> New File -> R script (.R)

or

  • Click in File -> New File -> R markdown… (.Rmd)

Although I like working within R markdown, R script is easier to work with in R studio.

Install packages

In order to accomplish this exercise, you will need to install the following R packages:

install.packages(tidyverse)
install.packages(glmmTMB)
install.packages(lme4)
install.packages(lmerTest)
install.packages(emmeans)
install.packages(RCurl)
install.packages(car)
install.packages(kableExtra)

Run all codes above by clicking in the “Run” option in the top right corner of you R script or R markdown. These codes will install the necessary packages for analyzing weed control (%) experiments. Once you install these packages you will not need to install them again, unless you update R and/or RStudio.

Data

  • Run the package below
library(tidyverse)
library(RCurl)

The data used for this exercise is from a published manuscript by Oliveira et al. 2017 investigating control of a waterhemp (Amaranthus tuberculatus) population with several with several postemergence herbicides. For this exercise, herbicides are named A, B, C, D, and E. In addition, only results from year 2014 is presented. Run the following codes to load the data into R:

df_path <- getURL("https://raw.githubusercontent.com/openweedsci/data/master/posts/control.csv")
weedcont <- read_csv(df_path)

After running the codes above, the data should appear as weedcont (“control” was chosen as the name of the dataset; you could have called it “data” or something else).

weedcont
## # A tibble: 15 x 4
##     year herbicide   rep control
##    <dbl> <chr>     <dbl>   <dbl>
##  1  2014 A             1    0.99
##  2  2014 A             2    0.99
##  3  2014 A             3    0.99
##  4  2014 B             1    0.95
##  5  2014 B             2    0.99
##  6  2014 B             3    0.8 
##  7  2014 C             1    0.95
##  8  2014 C             2    0.85
##  9  2014 C             3    0.9 
## 10  2014 D             1    0.5 
## 11  2014 D             2    0.6 
## 12  2014 D             3    0.5 
## 13  2014 E             1    0.9 
## 14  2014 E             2    0.85
## 15  2014 E             3    0.8

The dataset contains 2 years (2013 and 2014), 5 herbicides (A, B, C, D, E), 3 reps (1, 2 and 3), and visual control of waterhemp (converted to proportions [0.001-0.999] already).

Levene’s Test for Homogeneity of Variance

  • Run the packages below
library(car)

Levene’s Test of the null that the variances amongst herbicide treatments are the same.

  • Levene’s test is robust to departures from normality.

leveneTest function

leveneTest(weedcont$control,  weedcont$herbicide)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  4  0.6828 0.6197
##       10

Results show p-value = 0.6197 thus the null hypothesis is accepted indicating that variances amongst herbicide treatments are the same.

Model

  • Run the packages below
library(glmmTMB)
library(lme4)
library(lmerTest)
  • Generalized Linear Mixed Models using Template Model Builder by Mollie Brooks

  • Beta family with logit

  • Mixed model

The model has control as a response variable, herbicide as fixed effects, and rep as random effects.

glmmTMB function

model <- glmmTMB(control ~ herbicide + (1|rep), beta_family(link = "logit"), data=weedcont)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you

## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you

## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you

Anova

Anova.glmmTMB function

glmmTMB:::Anova.glmmTMB(model)
## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: control
##            Chisq Df Pr(>Chisq)    
## herbicide 83.163  4  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The P-value < 2.2e-16; there is a lower evidence that a difference of some magnitude between herbicides tested could occur by chance alone assuming null hypothesis is true.

Cheking the herbicide control on waterhemp

  • Run the package below
library(emmeans)

Interaction-style plots for estimated marginal means (emmip)

  • type=“response”, herbicide and intervals back from the logit scale

  • coord_flip(), inverted the axis (better visualization)

emmip function

emmip(model, ~herbicide, type="response") 

This figure helps visualize waterhemp control given herbicide treatments. It is useful in treatments arranged in factorial design.

Estimated marginal means (least-squares means)

  • type=“response”, herbicide and intervals backtransformed from the logit scale

  • cont=“pairwise”, comparisons between each herbicide treatments

  • adjust=“none”, fisher’s least significant difference (try tukey)

The lsmeans provides the weed control (prop), SE (standard error) and confidence intervals (lower.CL and upper.CL). Moreover, lsmeans provides the pairwise contrasts between treatments (herbicides).

emmeans function

lsmeans <- emmeans(model, ~ herbicide, cont="pairwise", adjust="none", type="response", alpha=0.05)
lsmeans
## $emmeans
##  herbicide response     SE df lower.CL upper.CL
##  A            0.979 0.0112  8    0.930    0.994
##  B            0.940 0.0220  8    0.864    0.974
##  C            0.897 0.0282  8    0.812    0.947
##  D            0.533 0.0471  8    0.424    0.638
##  E            0.845 0.0335  8    0.752    0.908
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the logit scale 
## 
## $contrasts
##  contrast odds.ratio      SE df t.ratio p.value
##  A / B         2.972  2.0017  8  1.618  0.1444 
##  A / C         5.293  3.1360  8  2.813  0.0228 
##  A / D        40.556 23.1658  8  6.482  0.0002 
##  A / E         8.470  4.9722  8  3.640  0.0066 
##  B / C         1.781  0.8957  8  1.147  0.2845 
##  B / D        13.644  5.8918  8  6.052  0.0003 
##  B / E         2.850  1.3197  8  2.261  0.0536 
##  C / D         7.662  2.7531  8  5.667  0.0005 
##  C / E         1.600  0.6313  8  1.192  0.2675 
##  D / E         0.209  0.0665  8 -4.921  0.0012 
## 
## Tests are performed on the log odds ratio scale

Plot lsmeans

  • comparisons=TRUE, comparisons between treatments with red arrows

  • type=“response”, herbicide and intervals backtransformed from the logit scale

  • alpha=0.05, significance level to use in constructing comparison arrows

  • adjust=“none”, Fisher’s least significant difference (try tukey)

plot function

plot(lsmeans, ~ herbicide, comparisons=TRUE, type="response", alpha=0.05, adjust="none")

The black dot represents the mean waterhemp control whereas the purple represents the confidence intervals for each herbicide. Red arrows indicate the comparison across herbicide treatments, whereas if red arrows do not overlap, treatments are different. See contrasts from lsmeans for validation.

Extract and dsplay information on all pairwise comparisons of estimated marginal means

  • alpha=0.05, numeric value giving the significance level for the comparisons

  • Letters=letters, display letter grouping treatments according to pairwise comparisons

  • adjust=“none”, fisher’s least significant difference (try tukey)

  • reversed = TRUE, the order of use of the letters is reversed

CLD function

cld <-CLD(lsmeans$emmeans, alpha=0.05, Letters=letters, adjust="none", reversed = TRUE)
cld 

Warning

"The CLD function and methods are deprecated. Compact-letter displays (CLDs) encourage a misleading interpretation of significance testing by visually grouping means whose comparisons have P > alpha as though they are equal. However, failing to prove two means are different does not prove that they are the same. In addition, CLDs make a hard distinction between P values nearly equal to alpha but on opposite sides."
  • Some people have told me about CLD not working anymore. A solution is to use the cld function of multcomp package.
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
## 
##     geyser
cld <- cld(lsmeans$emmeans, alpha=0.05, Letters=letters, adjust="none", reversed = TRUE)
cld
##  herbicide response     SE df lower.CL upper.CL .group
##  A            0.979 0.0112  8    0.930    0.994  a    
##  B            0.940 0.0220  8    0.864    0.974  ab   
##  C            0.897 0.0282  8    0.812    0.947   b   
##  E            0.845 0.0335  8    0.752    0.908   b   
##  D            0.533 0.0471  8    0.424    0.638    c  
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the logit scale 
## Tests are performed on the log odds ratio scale 
## significance level used: alpha = 0.05

Figure

Dot plot with confidence intervals

Since CLD function is deprecated, I recommend using the lsmeans to generate figures.

nd <- as.data.frame(lsmeans$emmeans)

Use nd to plot a figure.

ggplot function

ggplot(nd, aes(x=reorder(herbicide,response), y=response*100, color=herbicide)) + 
geom_point(size=4) + ylim(0,100) +
scale_color_manual(values=c("red", "blue", "green", "orange", "purple")) +
theme_bw() + labs(y="Waterhemp control (%)", x="Herbicides") +
geom_linerange(aes(ymin = lower.CL*100, ymax = upper.CL*100), size=1.5) + 
theme(axis.title = element_text(size=16),
axis.text = element_text(size=15),
legend.position = "none") +
coord_flip()

The mean and confidence intervals of each herbicide are displayed in theggplot figure.

Figure updated to follow a suggestion by Dr. Andrew Kniss (see comments).

Bar plot with letters

If you like presenting barplots with letters. Here is how you should proceed:

ggplot(cld, aes(x=reorder(herbicide,response), 
                y=response*100, fill=herbicide, label=.group)) + 
  geom_bar(stat="identity") + ylim(0,105) +
  scale_fill_manual(values=c("red", "blue", "green", "orange", "purple")) +
  theme_bw() +  labs(y="Waterhemp control (%)", x="Herbicides") +
  geom_text(nudge_y = 7, size=8) + 
  theme(axis.title = element_text(size=16),
        axis.text = element_text(size=15),
        legend.position = "none") +
  coord_flip()

Table

  • Run the package below
library(kableExtra)

This is a very simple table generator.

kable function

cld %>%
  kbl() %>%
  kable_classic_2(full_width = F)
herbicideresponseSEdflower.CLupper.CL.group
4A0.97883780.011189480.93011770.9938173a
5B0.93961670.021992280.86423490.9743843ab
3C0.89731530.028216880.81177130.9465425b
2E0.84521870.033520780.75151740.9079153b
1D0.53281950.047147080.42426250.6383555c

Summary

  • The weed control (%) data must be between 0 and 1

  • Use Levene’s test for checking homogeneity of variances

  • Use glmmTMB function in the model

  • Use beta family with logit in the model

  • Use Anova or Anova.glmmTMB function for checking the ANOVA

  • Use emmeans function to estimated the marginal means and contrasts

  • Similar approach is recommended with % biomass reduction or % cover datasets

Associate Reasercher

I work to reduce the impact of weeds in ecossystems. I am a Professor, weed scientist, and an enthusiast data scientist.