One-Way ANOVA for Percentage Weed Control Data

Last updated on Feb 19, 2021 10 min read

This post is also published on the Open Weed Science website.

Background

Assessments of visual weed control in response to herbicide treatments is likely the most common experiment and data analysis conducted within the Weed Science discipline. The goal of these experiments is to investigate herbicides that promote the best level of control of targeted weed species. These experiments are usually organized in a one-way randomized complete block design with multiple treatments (herbicides). At a pre-established number of days after treatment (DAT; researcher choice), a percent (%) visual control rate is given to each targeted weed. The weed control rate varies from 0% (no weed control) to 100% (complete weed control). Therefore, before setting up the experiments, the researcher is aware that the results will range from 0 to 100%.

This type of dataset typically does not follow a normal (Gaussian) distribution. Herein we use the beta distribution (logit scale), a continuous probability distribution where response variable values range from 0 to 1. Before starting the analysis, the weed control rates (0-100%) should be converted to the interval 0-1. For example, if the weed control of Palmer amaranth is 90%, it should be listed as 0.900 in your excel spreadsheet file. Also, beta distribution will not accept 0 nor 1 values. Thus, weed control of 0% and 100% should be adjusted to 0.001 and 0.999, respectively.

Getting started

Download R for your laptop system/desktop from: https://www.r-project.org

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.

Download RStudio for your laptop system/desktop from: https://rstudio.com

RStudio is an integrated development environment for R, a programming language for statistical computing and graphics.

Create a new R project file

Click in File -> New project… -> New directory -> New Project -> Save the R work directory anywhere you want in your laptop.

The saved file will have an R project and you should copy and paste all your raw data (excel file) in the same work directory (folder).

Click in File -> New File -> R script (.R)

Click in File -> New File -> R markdown… (.Rmd)

Although I like working within R markdown, R script is easier to work with in R studio.

Install packages

In order to accomplish this exercise, you will need to install the following R packages:

install.packages(tidyverse)
install.packages(glmmTMB)
install.packages(lme4)
install.packages(lmerTest)
install.packages(emmeans)
install.packages(RCurl)
install.packages(car)
install.packages(kableExtra)

Run all codes above by clicking in the “Run” option in the top right corner of you R script or R markdown. These codes will install the necessary packages for analyzing weed control (%) experiments. Once you install these packages you will not need to install them again, unless you update R and/or RStudio.

Data

Run the package below

library(tidyverse)
library(RCurl)

The data used for this exercise is from a published manuscript by Oliveira et al. 2017 investigating control of a waterhemp (Amaranthus tuberculatus) population with several with several postemergence herbicides. For this exercise, herbicides are named A, B, C, D, and E. In addition, only results from year 2014 is presented. Run the following codes to load the data into R:

df_path <- getURL("https://raw.githubusercontent.com/openweedsci/data/master/posts/control.csv")
weedcont <- read_csv(df_path)

After running the codes above, the data should appear as weedcont (“control” was chosen as the name of the dataset; you could have called it “data” or something else).

weedcont

## # A tibble: 15 x 4
##     year herbicide   rep control
##    <dbl> <chr>     <dbl>   <dbl>
##  1  2014 A             1    0.99
##  2  2014 A             2    0.99
##  3  2014 A             3    0.99
##  4  2014 B             1    0.95
##  5  2014 B             2    0.99
##  6  2014 B             3    0.8 
##  7  2014 C             1    0.95
##  8  2014 C             2    0.85
##  9  2014 C             3    0.9 
## 10  2014 D             1    0.5 
## 11  2014 D             2    0.6 
## 12  2014 D             3    0.5 
## 13  2014 E             1    0.9 
## 14  2014 E             2    0.85
## 15  2014 E             3    0.8

The dataset contains 2 years (2013 and 2014), 5 herbicides (A, B, C, D, E), 3 reps (1, 2 and 3), and visual control of waterhemp (converted to proportions [0.001-0.999] already).

Levene’s Test for Homogeneity of Variance

Run the packages below

library(car)

Levene’s Test of the null that the variances amongst herbicide treatments are the same.

Levene’s test is robust to departures from normality.

leveneTest function

leveneTest(weedcont$control,  weedcont$herbicide)

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  4  0.6828 0.6197
##       10

Results show p-value = 0.6197 thus the null hypothesis is accepted indicating that variances amongst herbicide treatments are the same.

Model

Run the packages below

library(glmmTMB)
library(lme4)
library(lmerTest)

Generalized Linear Mixed Models using Template Model Builder by Mollie Brooks
Beta family with logit
Mixed model

The model has control as a response variable, herbicide as fixed effects, and rep as random effects.

glmmTMB function

model <- glmmTMB(control ~ herbicide + (1|rep), beta_family(link = "logit"), data=weedcont)

## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you

## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you

## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you

Anova

Anova.glmmTMB function

glmmTMB:::Anova.glmmTMB(model)

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: control
##            Chisq Df Pr(>Chisq)    
## herbicide 83.163  4  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The P-value < 2.2e-16; there is a lower evidence that a difference of some magnitude between herbicides tested could occur by chance alone assuming null hypothesis is true.

Cheking the herbicide control on waterhemp

Run the package below

library(emmeans)

Interaction-style plots for estimated marginal means (emmip)

type=“response”, herbicide and intervals back from the logit scale
coord_flip(), inverted the axis (better visualization)

emmip function

emmip(model, ~herbicide, type="response")

This figure helps visualize waterhemp control given herbicide treatments. It is useful in treatments arranged in factorial design.

Estimated marginal means (least-squares means)

type=“response”, herbicide and intervals backtransformed from the logit scale
cont=“pairwise”, comparisons between each herbicide treatments
adjust=“none”, fisher’s least significant difference (try tukey)

The lsmeans provides the weed control (prop), SE (standard error) and confidence intervals (lower.CL and upper.CL). Moreover, lsmeans provides the pairwise contrasts between treatments (herbicides).

emmeans function

lsmeans <- emmeans(model, ~ herbicide, cont="pairwise", adjust="none", type="response", alpha=0.05)
lsmeans

## $emmeans
##  herbicide response     SE df lower.CL upper.CL
##  A            0.979 0.0112  8    0.930    0.994
##  B            0.940 0.0220  8    0.864    0.974
##  C            0.897 0.0282  8    0.812    0.947
##  D            0.533 0.0471  8    0.424    0.638
##  E            0.845 0.0335  8    0.752    0.908
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the logit scale 
## 
## $contrasts
##  contrast odds.ratio      SE df t.ratio p.value
##  A / B         2.972  2.0017  8  1.618  0.1444 
##  A / C         5.293  3.1360  8  2.813  0.0228 
##  A / D        40.556 23.1658  8  6.482  0.0002 
##  A / E         8.470  4.9722  8  3.640  0.0066 
##  B / C         1.781  0.8957  8  1.147  0.2845 
##  B / D        13.644  5.8918  8  6.052  0.0003 
##  B / E         2.850  1.3197  8  2.261  0.0536 
##  C / D         7.662  2.7531  8  5.667  0.0005 
##  C / E         1.600  0.6313  8  1.192  0.2675 
##  D / E         0.209  0.0665  8 -4.921  0.0012 
## 
## Tests are performed on the log odds ratio scale

Plot lsmeans

comparisons=TRUE, comparisons between treatments with red arrows
type=“response”, herbicide and intervals backtransformed from the logit scale
alpha=0.05, significance level to use in constructing comparison arrows
adjust=“none”, Fisher’s least significant difference (try tukey)

plot function

plot(lsmeans, ~ herbicide, comparisons=TRUE, type="response", alpha=0.05, adjust="none")

The black dot represents the mean waterhemp control whereas the purple represents the confidence intervals for each herbicide. Red arrows indicate the comparison across herbicide treatments, whereas if red arrows do not overlap, treatments are different. See contrasts from lsmeans for validation.

Extract and dsplay information on all pairwise comparisons of estimated marginal means

alpha=0.05, numeric value giving the significance level for the comparisons
Letters=letters, display letter grouping treatments according to pairwise comparisons
adjust=“none”, fisher’s least significant difference (try tukey)
reversed = TRUE, the order of use of the letters is reversed

CLD function

cld <-CLD(lsmeans$emmeans, alpha=0.05, Letters=letters, adjust="none", reversed = TRUE)
cld

Warning

"The CLD function and methods are deprecated. Compact-letter displays (CLDs) encourage a misleading interpretation of significance testing by visually grouping means whose comparisons have P > alpha as though they are equal. However, failing to prove two means are different does not prove that they are the same. In addition, CLDs make a hard distinction between P values nearly equal to alpha but on opposite sides."

Some people have told me about CLD not working anymore. A solution is to use the cld function of multcomp package.

library(multcomp)

## Loading required package: mvtnorm

## Loading required package: survival

## Loading required package: TH.data

## Loading required package: MASS

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

## 
## Attaching package: 'TH.data'

## The following object is masked from 'package:MASS':
## 
##     geyser

cld <- cld(lsmeans$emmeans, alpha=0.05, Letters=letters, adjust="none", reversed = TRUE)
cld

##  herbicide response     SE df lower.CL upper.CL .group
##  A            0.979 0.0112  8    0.930    0.994  a    
##  B            0.940 0.0220  8    0.864    0.974  ab   
##  C            0.897 0.0282  8    0.812    0.947   b   
##  E            0.845 0.0335  8    0.752    0.908   b   
##  D            0.533 0.0471  8    0.424    0.638    c  
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the logit scale 
## Tests are performed on the log odds ratio scale 
## significance level used: alpha = 0.05

Figure

Dot plot with confidence intervals

Since CLD function is deprecated, I recommend using the lsmeans to generate figures.

nd <- as.data.frame(lsmeans$emmeans)

Use nd to plot a figure.

ggplot function

ggplot(nd, aes(x=reorder(herbicide,response), y=response*100, color=herbicide)) + 
geom_point(size=4) + ylim(0,100) +
scale_color_manual(values=c("red", "blue", "green", "orange", "purple")) +
theme_bw() + labs(y="Waterhemp control (%)", x="Herbicides") +
geom_linerange(aes(ymin = lower.CL*100, ymax = upper.CL*100), size=1.5) + 
theme(axis.title = element_text(size=16),
axis.text = element_text(size=15),
legend.position = "none") +
coord_flip()

The mean and confidence intervals of each herbicide are displayed in theggplot figure.

Figure updated to follow a suggestion by Dr. Andrew Kniss (see comments).

Bar plot with letters

If you like presenting barplots with letters. Here is how you should proceed:

ggplot(cld, aes(x=reorder(herbicide,response), 
                y=response*100, fill=herbicide, label=.group)) + 
  geom_bar(stat="identity") + ylim(0,105) +
  scale_fill_manual(values=c("red", "blue", "green", "orange", "purple")) +
  theme_bw() +  labs(y="Waterhemp control (%)", x="Herbicides") +
  geom_text(nudge_y = 7, size=8) + 
  theme(axis.title = element_text(size=16),
        axis.text = element_text(size=15),
        legend.position = "none") +
  coord_flip()

Table

Run the package below

library(kableExtra)

This is a very simple table generator.

kable function

cld %>%
  kbl() %>%
  kable_classic_2(full_width = F)

	herbicide	response	SE	df	lower.CL	upper.CL	.group
4	A	0.9788378	0.0111894	8	0.9301177	0.9938173	a
5	B	0.9396167	0.0219922	8	0.8642349	0.9743843	ab
3	C	0.8973153	0.0282168	8	0.8117713	0.9465425	b
2	E	0.8452187	0.0335207	8	0.7515174	0.9079153	b
1	D	0.5328195	0.0471470	8	0.4242625	0.6383555	c

Summary

The weed control (%) data must be between 0 and 1
Use Levene’s test for checking homogeneity of variances
Use glmmTMB function in the model
Use beta family with logit in the model
Use Anova or Anova.glmmTMB function for checking the ANOVA
Use emmeans function to estimated the marginal means and contrasts
Similar approach is recommended with % biomass reduction or % cover datasets

herbicide ANOVA