How to perform an ANOVA and post-hoc tests using R | Macarena Quiroga

How to perform an ANOVA and post-hoc tests using R

An in-depth tutorial on ANOVA analysis in R: assumptions, tests, base R functions, post-hoc tests, and rstatix package.

Image made by me with the {aRtsy} package

This week I made a tutorial for how to perform an ANOVA analysis using R for the Psych Rstats Club, an open science peer-based learning platform for statistics and R. Here’s the script, enjoy!


ANOVA

ANOVA, which stands for “analysis of variance,” is an extension of the independent two-sample t-test. We use ANOVA when we want to determine if the means of more than two groups are significantly different. In this case, we have several groups defined by a single grouping variable, which is sometimes referred to as the factor variable.

For this demonstration, we will be using the iris dataset. As always, a good first step is to inspect the data:

# inspect data
iris
##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
## 5            5.0         3.6          1.4         0.2     setosa
## 6            5.4         3.9          1.7         0.4     setosa
## 7            4.6         3.4          1.4         0.3     setosa
## 8            5.0         3.4          1.5         0.2     setosa
## 9            4.4         2.9          1.4         0.2     setosa
## 10           4.9         3.1          1.5         0.1     setosa
## 11           5.4         3.7          1.5         0.2     setosa
## 12           4.8         3.4          1.6         0.2     setosa
## 13           4.8         3.0          1.4         0.1     setosa
## 14           4.3         3.0          1.1         0.1     setosa
## 15           5.8         4.0          1.2         0.2     setosa
## 16           5.7         4.4          1.5         0.4     setosa
## 17           5.4         3.9          1.3         0.4     setosa
## 18           5.1         3.5          1.4         0.3     setosa
## 19           5.7         3.8          1.7         0.3     setosa
## 20           5.1         3.8          1.5         0.3     setosa
## 21           5.4         3.4          1.7         0.2     setosa
## 22           5.1         3.7          1.5         0.4     setosa
## 23           4.6         3.6          1.0         0.2     setosa
## 24           5.1         3.3          1.7         0.5     setosa
## 25           4.8         3.4          1.9         0.2     setosa
## 26           5.0         3.0          1.6         0.2     setosa
## 27           5.0         3.4          1.6         0.4     setosa
## 28           5.2         3.5          1.5         0.2     setosa
## 29           5.2         3.4          1.4         0.2     setosa
## 30           4.7         3.2          1.6         0.2     setosa
## 31           4.8         3.1          1.6         0.2     setosa
## 32           5.4         3.4          1.5         0.4     setosa
## 33           5.2         4.1          1.5         0.1     setosa
## 34           5.5         4.2          1.4         0.2     setosa
## 35           4.9         3.1          1.5         0.2     setosa
## 36           5.0         3.2          1.2         0.2     setosa
## 37           5.5         3.5          1.3         0.2     setosa
## 38           4.9         3.6          1.4         0.1     setosa
## 39           4.4         3.0          1.3         0.2     setosa
## 40           5.1         3.4          1.5         0.2     setosa
## 41           5.0         3.5          1.3         0.3     setosa
## 42           4.5         2.3          1.3         0.3     setosa
## 43           4.4         3.2          1.3         0.2     setosa
## 44           5.0         3.5          1.6         0.6     setosa
## 45           5.1         3.8          1.9         0.4     setosa
## 46           4.8         3.0          1.4         0.3     setosa
## 47           5.1         3.8          1.6         0.2     setosa
## 48           4.6         3.2          1.4         0.2     setosa
## 49           5.3         3.7          1.5         0.2     setosa
## 50           5.0         3.3          1.4         0.2     setosa
## 51           7.0         3.2          4.7         1.4 versicolor
## 52           6.4         3.2          4.5         1.5 versicolor
## 53           6.9         3.1          4.9         1.5 versicolor
## 54           5.5         2.3          4.0         1.3 versicolor
## 55           6.5         2.8          4.6         1.5 versicolor
## 56           5.7         2.8          4.5         1.3 versicolor
## 57           6.3         3.3          4.7         1.6 versicolor
## 58           4.9         2.4          3.3         1.0 versicolor
## 59           6.6         2.9          4.6         1.3 versicolor
## 60           5.2         2.7          3.9         1.4 versicolor
## 61           5.0         2.0          3.5         1.0 versicolor
## 62           5.9         3.0          4.2         1.5 versicolor
## 63           6.0         2.2          4.0         1.0 versicolor
## 64           6.1         2.9          4.7         1.4 versicolor
## 65           5.6         2.9          3.6         1.3 versicolor
## 66           6.7         3.1          4.4         1.4 versicolor
## 67           5.6         3.0          4.5         1.5 versicolor
## 68           5.8         2.7          4.1         1.0 versicolor
## 69           6.2         2.2          4.5         1.5 versicolor
## 70           5.6         2.5          3.9         1.1 versicolor
## 71           5.9         3.2          4.8         1.8 versicolor
## 72           6.1         2.8          4.0         1.3 versicolor
## 73           6.3         2.5          4.9         1.5 versicolor
## 74           6.1         2.8          4.7         1.2 versicolor
## 75           6.4         2.9          4.3         1.3 versicolor
## 76           6.6         3.0          4.4         1.4 versicolor
## 77           6.8         2.8          4.8         1.4 versicolor
## 78           6.7         3.0          5.0         1.7 versicolor
## 79           6.0         2.9          4.5         1.5 versicolor
## 80           5.7         2.6          3.5         1.0 versicolor
## 81           5.5         2.4          3.8         1.1 versicolor
## 82           5.5         2.4          3.7         1.0 versicolor
## 83           5.8         2.7          3.9         1.2 versicolor
## 84           6.0         2.7          5.1         1.6 versicolor
## 85           5.4         3.0          4.5         1.5 versicolor
## 86           6.0         3.4          4.5         1.6 versicolor
## 87           6.7         3.1          4.7         1.5 versicolor
## 88           6.3         2.3          4.4         1.3 versicolor
## 89           5.6         3.0          4.1         1.3 versicolor
## 90           5.5         2.5          4.0         1.3 versicolor
## 91           5.5         2.6          4.4         1.2 versicolor
## 92           6.1         3.0          4.6         1.4 versicolor
## 93           5.8         2.6          4.0         1.2 versicolor
## 94           5.0         2.3          3.3         1.0 versicolor
## 95           5.6         2.7          4.2         1.3 versicolor
## 96           5.7         3.0          4.2         1.2 versicolor
## 97           5.7         2.9          4.2         1.3 versicolor
## 98           6.2         2.9          4.3         1.3 versicolor
## 99           5.1         2.5          3.0         1.1 versicolor
## 100          5.7         2.8          4.1         1.3 versicolor
## 101          6.3         3.3          6.0         2.5  virginica
## 102          5.8         2.7          5.1         1.9  virginica
## 103          7.1         3.0          5.9         2.1  virginica
## 104          6.3         2.9          5.6         1.8  virginica
## 105          6.5         3.0          5.8         2.2  virginica
## 106          7.6         3.0          6.6         2.1  virginica
## 107          4.9         2.5          4.5         1.7  virginica
## 108          7.3         2.9          6.3         1.8  virginica
## 109          6.7         2.5          5.8         1.8  virginica
## 110          7.2         3.6          6.1         2.5  virginica
## 111          6.5         3.2          5.1         2.0  virginica
## 112          6.4         2.7          5.3         1.9  virginica
## 113          6.8         3.0          5.5         2.1  virginica
## 114          5.7         2.5          5.0         2.0  virginica
## 115          5.8         2.8          5.1         2.4  virginica
## 116          6.4         3.2          5.3         2.3  virginica
## 117          6.5         3.0          5.5         1.8  virginica
## 118          7.7         3.8          6.7         2.2  virginica
## 119          7.7         2.6          6.9         2.3  virginica
## 120          6.0         2.2          5.0         1.5  virginica
## 121          6.9         3.2          5.7         2.3  virginica
## 122          5.6         2.8          4.9         2.0  virginica
## 123          7.7         2.8          6.7         2.0  virginica
## 124          6.3         2.7          4.9         1.8  virginica
## 125          6.7         3.3          5.7         2.1  virginica
## 126          7.2         3.2          6.0         1.8  virginica
## 127          6.2         2.8          4.8         1.8  virginica
## 128          6.1         3.0          4.9         1.8  virginica
## 129          6.4         2.8          5.6         2.1  virginica
## 130          7.2         3.0          5.8         1.6  virginica
## 131          7.4         2.8          6.1         1.9  virginica
## 132          7.9         3.8          6.4         2.0  virginica
## 133          6.4         2.8          5.6         2.2  virginica
## 134          6.3         2.8          5.1         1.5  virginica
## 135          6.1         2.6          5.6         1.4  virginica
## 136          7.7         3.0          6.1         2.3  virginica
## 137          6.3         3.4          5.6         2.4  virginica
## 138          6.4         3.1          5.5         1.8  virginica
## 139          6.0         3.0          4.8         1.8  virginica
## 140          6.9         3.1          5.4         2.1  virginica
## 141          6.7         3.1          5.6         2.4  virginica
## 142          6.9         3.1          5.1         2.3  virginica
## 143          5.8         2.7          5.1         1.9  virginica
## 144          6.8         3.2          5.9         2.3  virginica
## 145          6.7         3.3          5.7         2.5  virginica
## 146          6.7         3.0          5.2         2.3  virginica
## 147          6.3         2.5          5.0         1.9  virginica
## 148          6.5         3.0          5.2         2.0  virginica
## 149          6.2         3.4          5.4         2.3  virginica
## 150          5.9         3.0          5.1         1.8  virginica

Remember we can run a line of code by click on “Run” when you’re in that line, or you can use the keyboard shortcut control+enter or command+enter in mac.

In this dataset, we have the length and the width of the sepal and the petals of the flowers of each species. In this excercise, we would like to know if the different species differ by the width of their sepals.

Before we dive into the statistical aspects of ANOVA, let’s briefly discuss its assumptions. ANOVA assumes that the data is normally distributed, the variances are unknown but equal between groups, and the samples are independent.

To check the assumption of normality, we can use the Shapiro-Wilk test. We have to run the shapiro.test() function for the dependent variable (the sepal width) of each species, so we use a specific syntax to subset the dataset.

# shapiro test (normality assumption)
shapiro.test(iris$Sepal.Width[iris$Species == "setosa"])
## 
## 	Shapiro-Wilk normality test
## 
## data:  iris$Sepal.Width[iris$Species == "setosa"]
## W = 0.97172, p-value = 0.2715
# first we retrieve the values of the sepal width and then we subset those sepal width values for the setosa species

# then we repeat it for each species, copying and pasting it
shapiro.test(iris$Sepal.Width[iris$Species == "virginica"])
## 
## 	Shapiro-Wilk normality test
## 
## data:  iris$Sepal.Width[iris$Species == "virginica"]
## W = 0.96739, p-value = 0.1809
shapiro.test(iris$Sepal.Width[iris$Species == "versicolor"])
## 
## 	Shapiro-Wilk normality test
## 
## data:  iris$Sepal.Width[iris$Species == "versicolor"]
## W = 0.97413, p-value = 0.338

Based on the results of these tests, none of them yielded statistically significant results, indicating that the data is normally distributed.

To assess the assumption of homogeneity of variances, we can use the Levene’s test, from the car package. The first argument is the formula: remember that we always use first the dependent variable, then the tilde and last the grouping variable. The second argument is the dataset.

#Levene's Test (homoscedasticity assumption)
car::leveneTest(Sepal.Width ~ Species, data = iris)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2  0.5902 0.5555
##       147

The Levene’s test assesses whether the variance of the Sepal Width is equal across different species groups. In this case, since the p-value is not statistically significant, we can conclude that the assumption of homogeneity of variances is met.

Now that we have checked the assumptions, we can proceed with the ANOVA analysis. However, it is important to note that if your data is not normally distributed, you may consider using non-parametric tests such as the Kruskal-Wallis or Friedman tests. Additionally, if the assumption of homogeneity of variances is violated, a Welch ANOVA can be used.

Let’s perform the ANOVA analysis using base R functions:

# anova with base R
iris_anova <- aov(Sepal.Width ~ Species, data = iris)
# the function is called aov (analysisi of variance) and the formula is the same we used for the levene's test

# to see the result
summary(iris_anova)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## Species       2  11.35   5.672   49.16 <2e-16 ***
## Residuals   147  16.96   0.115                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The summary of the ANOVA output shows that the ANOVA analysis is statistically significant. However, it doesn’t provide information about which specific groups differ from each other. To determine the pairwise differences, we can use post-hoc tests such as the Tukey’s Honestly Significant Difference (HSD) test, also called the Tukey correction:

# Tukey's correction
TukeyHSD(iris_anova)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Sepal.Width ~ Species, data = iris)
## 
## $Species
##                        diff         lwr        upr     p adj
## versicolor-setosa    -0.658 -0.81885528 -0.4971447 0.0000000
## virginica-setosa     -0.454 -0.61485528 -0.2931447 0.0000000
## virginica-versicolor  0.204  0.04314472  0.3648553 0.0087802
#tukeyhsd(iris_anova) # error!

Please remember that R is case-sensitive, so it’s important to pay attention to capitalization.

The Tukey’s test reveals that all three groups are statistically different from each other. Alternatively, you can also use other post-hoc tests, such as the Bonferroni correction or Holm’s method, to adjust the p-values for multiple comparisons. T

# Bonferroni correction

# we use a function that calculates a t test, and we specify we want a bonferroni correction

# the first argument is the dependent variable and the second one is the grouping or factor variable.
pairwise.t.test(iris$Sepal.Width, iris$Species, p.adjust.method = "bonferroni")
## 
## 	Pairwise comparisons using t tests with pooled SD 
## 
## data:  iris$Sepal.Width and iris$Species 
## 
##            setosa  versicolor
## versicolor < 2e-16 -         
## virginica  1.4e-09 0.0094    
## 
## P value adjustment method: bonferroni

You can also use Holm’s method, with the same syntax:

# Holm correction
pairwise.t.test(iris$Sepal.Width, iris$Species, p.adjust.method = "holm")
## 
## 	Pairwise comparisons using t tests with pooled SD 
## 
## data:  iris$Sepal.Width and iris$Species 
## 
##            setosa  versicolor
## versicolor < 2e-16 -         
## virginica  9.1e-10 0.0031    
## 
## P value adjustment method: holm

Although the p-values may slightly differ between these corrections, the overall conclusion remains the same.


The rstatix package

Now, let’s explore a package called rstatix, which simplifies the ANOVA analysis and is also compatible with tidyverse functions:

# to install the package
install.packages("rstatix")

# to load the package
library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter

With rstatix, we can perform the ANOVA analysis:

# anova with rstatix
anova_test(formula = Sepal.Width ~ Species, data = iris)
## ANOVA Table (type II tests)
## 
##    Effect DFn DFd     F        p p<.05   ges
## 1 Species   2 147 49.16 4.49e-17     * 0.401

The summary results provided by the anova_test function include not only the p-value but also the effect size. By default, the anova_test function calculates a generalized eta squared, but you can also calculate a partial eta squared if needed:

# partial eta squared
anova_test(data = iris, formula = Sepal.Width ~ Species, effect.size = "pes")
## ANOVA Table (type II tests)
## 
##    Effect DFn DFd     F        p p<.05   pes
## 1 Species   2 147 49.16 4.49e-17     * 0.401
# in this case it's the same

rstatix also provides its own function to perform the Tukey correction:

# Tukey's correction with rstatix
tukey_hsd(x = iris, formula = Sepal.Width ~ Species)
## # A tibble: 3 × 9
##   term    group1     group2     null.value estimate conf.low conf.high    p.adj
## * <chr>   <chr>      <chr>           <dbl>    <dbl>    <dbl>     <dbl>    <dbl>
## 1 Species setosa     versicolor          0   -0.658  -0.819     -0.497 3.10e-14
## 2 Species setosa     virginica           0   -0.454  -0.615     -0.293 1.36e- 9
## 3 Species versicolor virginica           0    0.204   0.0431     0.365 8.78e- 3
## # ℹ 1 more variable: p.adj.signif <chr>

Overall, these are some approaches you can use to conduct ANOVA analysis in R. I hope this explanation helps clarify the process for you. If you have any questions or need further clarification, please let me know!

As always, remember you can suscribe to my blog to stay updated, and if you have any questions, don’t hesitate to contact me. And if you like what I do, you can buy me a cafecito from Argentina or a kofi.

Macarena Quiroga
Macarena Quiroga
Linguist/PhD student

I research language acquisition. I’m looking to deepen my knowledge of statistis and data science with R/Rstudio. If you like what I do, you can buy me a coffee from Argentina, or a kofi from other countries. Suscribe to my blog here.

Related