Introduction to ANOVA
Analysis of variance (ANOVA) is a powerful and versatile statistical technique that can be applied to a wide variety of data sets. This page provides an overview of the basic types of ANOVA and a simple explanation of the underlying theory. For an in-depth explanation you will need to take a formal statistics course or read the appropriate chapters of a statistics textbook. It is well worth developing a comprehensive understanding of the technique because it is so widely used, but that is beyond the scope of this website.
If you haven’t used ANOVA before, it is recommended that you read this page first and then learn how one-factor ANOVA works, even if you actually want to use one of the other types.
Like many other statistical techniques, ANOVA assumes that your data are normally distributed. If they are not, it is sometimes possible to transform them mathematically so that the transformed data do meet normality assumptions. To learn how to determine whether your data have a normal distribution and what transformations to try if they do not, see Foundational Material. If your data are not normally distributed, and cannot be transformed to meet normality assumptions, alternative tests are available for the simpler types of ANOVA: see Nonparametric Statistics.
One-Factor ANOVA.
This is used to compare the means of three or more samples of data to determine whether there are any differences between them. It is called “one-factor” because there is just one independent (categorical) variable. It can be thought of as an extension of the two-sample t-test to multiple samples, although the underlying calculations are rather different. For example, a two-sample t-test could be used to compare the performance of a single pharmaceutical against a placebo, whereas one-factor ANOVA could be used to compare two or more pharmaceuticals against a placebo and against each other. In ANOVA terminology the categories of the independent variable are usually called “groups”, “levels”, or “treatments”.
Two-Factor ANOVA.
This is used to compare the means of four or more samples when there are two independent categorical variables. The simplest type is when each independent variable has two levels so that there are four combinations in total, although it is perfectly possible to have more than two levels for either variable. For example, a clinical study might investigate the effects of two different pharmaceuticals (P1 and P2) and two different diets (D1 and D2) on blood cholesterol levels. Pharmaceutical type would be one independent variable, diet type the other, and there would be four different treatment combinations (P1 and D1, P1 and D2, P2 and D1, P2 and D2). One particularly useful feature of two-factor ANOVA is its ability to test for interactions between variables; i.e., to determine whether the effect of one variable is influenced by the level of the other. In the above example, an interaction between diet and pharmaceutical would result in the effect of diet differing between the two pharmaceutical treatments. Details are provided on the page dedicated specifically to two-factor ANOVA.
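As a rough illustration of what an interaction looks like in the data, the sketch below compares the effect of diet within each pharmaceutical using cell means. The cholesterol values are hypothetical numbers invented for this example, and the names `cells`, `diet_effect_p1`, `diet_effect_p2`, and `interaction_contrast` are illustrative only. The sketch is purely descriptive: a two-factor ANOVA would go on to test whether such a difference is larger than expected by chance.

```python
from statistics import mean

# Hypothetical cholesterol measurements for each treatment combination
# (invented values, not data from any real study).
cells = {
    ("P1", "D1"): [5.0, 5.2, 4.8],
    ("P1", "D2"): [4.0, 4.2, 3.8],
    ("P2", "D1"): [5.0, 5.1, 4.9],
    ("P2", "D2"): [5.0, 4.9, 5.1],
}

# Mean response for each of the four pharmaceutical-by-diet combinations.
cell_means = {combo: mean(values) for combo, values in cells.items()}

# Effect of diet (D2 minus D1) within each pharmaceutical.
diet_effect_p1 = cell_means[("P1", "D2")] - cell_means[("P1", "D1")]
diet_effect_p2 = cell_means[("P2", "D2")] - cell_means[("P2", "D1")]

# With no interaction the two diet effects would be about equal, so their
# difference would be near zero. A clearly non-zero difference suggests
# that the effect of diet depends on the pharmaceutical.
interaction_contrast = diet_effect_p1 - diet_effect_p2

print("Diet effect under P1:", diet_effect_p1)
print("Diet effect under P2:", diet_effect_p2)
print("Difference between diet effects:", interaction_contrast)
```

In these made-up data, diet changes cholesterol under P1 but not under P2, which is the pattern the text describes as an interaction.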
Nested ANOVA.
At its basic level this technique is in some ways similar to two-factor ANOVA, in that there are two independent categorical variables. The crucial difference is that with nested ANOVA each level of one variable is represented in only one level of the second variable, whereas with two-factor ANOVA all levels of one variable are combined with all levels of the other. For example, consider a clinical trial comparing the success of heart surgery at four different hospitals, each with four different surgeons who operate on a unique set of patients. In this study there are two independent variables (hospital and surgeon), but each surgeon carries out operations in only one of the four hospitals. The surgeon variable is said to be “nested” within the hospital variable, and a nested ANOVA is the appropriate statistical technique. (If every surgeon performed operations at all four hospitals the surgeon variable would be said to be “crossed” with the hospital variable and a two-factor ANOVA would be appropriate.)
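The distinction between nested and crossed designs can be checked mechanically from the combinations that occur in a dataset. The sketch below is a minimal illustration; the function names `is_nested` and `is_crossed` and the hospital and surgeon labels are hypothetical, and the example uses a smaller design than the one described above to keep it short.

```python
from collections import defaultdict

def is_nested(pairs):
    """True if every inner level (e.g. surgeon) occurs in exactly one outer level (e.g. hospital)."""
    outers_by_inner = defaultdict(set)
    for outer, inner in pairs:
        outers_by_inner[inner].add(outer)
    return all(len(outers) == 1 for outers in outers_by_inner.values())

def is_crossed(pairs):
    """True if every outer level is combined with every inner level."""
    unique_pairs = set(pairs)
    outers = {outer for outer, _ in unique_pairs}
    inners = {inner for _, inner in unique_pairs}
    return len(unique_pairs) == len(outers) * len(inners)

# Hypothetical (hospital, surgeon) combinations.
# Nested: each surgeon works at only one hospital.
nested_design = [("H1", "S1"), ("H1", "S2"), ("H2", "S3"), ("H2", "S4")]
# Crossed: every surgeon operates at every hospital.
crossed_design = [(h, s) for h in ("H1", "H2") for s in ("S1", "S2")]

print(is_nested(nested_design), is_crossed(nested_design))
print(is_nested(crossed_design), is_crossed(crossed_design))
```

A check like this only describes the layout of the data; deciding which analysis is appropriate still depends on how the study was designed.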
Analysis of Covariance (ANCOVA).
This is used to compare the means of two or more samples when there is at least one categorical independent variable and also at least one continuous independent variable (a covariate) that may also affect your data. At a basic level it can be thought of as a combination of a two-sample t-test and a linear regression in one analysis. For example, a clinical study comparing the effects of two pharmaceuticals on blood cholesterol might also measure the body mass index (BMI) of all the patients in the study, in case this also affected blood cholesterol. Here, pharmaceutical type would be the categorical independent variable and BMI would be the covariate. As with two-factor ANOVA, ANCOVA can test for interactions between the independent variables. Details are provided on the page dedicated specifically to ANCOVA.
Repeated Measures ANOVA.
This is used to compare three or more means when the same entities are measured more than once at different times. At its most basic level, it can be thought of as an extension of a paired t-test, although (as with the similarity between a two-sample t-test and one-factor ANOVA) the underlying calculations are quite different. For example, a group of patients might be placed on a diet thought to lower blood cholesterol and their cholesterol concentrations measured at the start of the study, after six months, and again after twelve months. Except for the fact that the same patients were measured in all three levels of the independent variable (Start, Six months, Twelve months) this experimental design is similar to one-factor ANOVA, and if there had been only two levels a paired t-test could have been performed. In actual fact, this would not be a very good experimental design because even if cholesterol levels fell in the patients, there would be no way to know whether this would have happened anyway in the absence of any diet change. A better design would be to have a control group of patients whose diet was not changed and to follow these through time as well. Except for the fact that the same patients were measured at multiple times, this second design is similar to two-factor ANOVA, with diet type being one independent variable and measurement date the second.
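The sketch below shows one way the simplest repeated measures design (one group of patients, several measurement times) could be computed by hand. It assumes the standard partition for this design, in which consistent differences between patients are separated out before the time effect is compared against the remaining variation. The cholesterol values and the function name `repeated_measures_f` are hypothetical, and this is an illustration rather than the procedure used by any particular software package.

```python
from statistics import mean

def repeated_measures_f(rows):
    """rows[i][j] is the measurement for subject i at time j (complete data assumed)."""
    n_subjects = len(rows)
    n_times = len(rows[0])
    grand_mean = mean(x for row in rows for x in row)
    subject_means = [mean(row) for row in rows]
    time_means = [mean(row[j] for row in rows) for j in range(n_times)]

    # Variation among the time-point means (the effect of interest).
    ss_time = n_subjects * sum((m - grand_mean) ** 2 for m in time_means)
    # Variation among subjects, which is set aside rather than counted as error.
    ss_subjects = n_times * sum((m - grand_mean) ** 2 for m in subject_means)
    # Residual variation left after removing subject and time effects.
    ss_error = sum(
        (rows[i][j] - subject_means[i] - time_means[j] + grand_mean) ** 2
        for i in range(n_subjects)
        for j in range(n_times)
    )

    ms_time = ss_time / (n_times - 1)
    ms_error = ss_error / ((n_times - 1) * (n_subjects - 1))
    return {
        "ss_time": ss_time,
        "ss_subjects": ss_subjects,
        "ss_error": ss_error,
        "f_ratio": ms_time / ms_error,
    }

# Hypothetical cholesterol values for four patients at
# Start, Six months, and Twelve months (invented for illustration).
cholesterol = [
    [6.0, 5.5, 5.0],
    [7.0, 6.4, 6.1],
    [5.0, 4.6, 4.2],
    [6.5, 6.1, 5.5],
]
result = repeated_measures_f(cholesterol)
print(result)
```

Because each patient serves as their own comparison, differences between patients do not inflate the error term, which is the sense in which this design extends the paired t-test.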
A sometimes overlooked but important thing to consider when performing any type of ANOVA is whether the groups of data in different categories of your independent variables are fixed or random. Fixed groups are ones that are predetermined and deliberately built into the study design, often experimentally, and are a principal focus of the study. Patients receiving different pharmaceuticals in a study of drug effectiveness would be considered to be fixed groups and such a study should be analyzed with a fixed-effects ANOVA (also sometimes called Model 1 ANOVA). Any conclusions reached from such an analysis are only applicable to the particular categories represented in the study (the specific pharmaceuticals in the drug trial described above) and cannot be extended to other categories (drugs not studied).
Random groups are ones that are part of the study design but are not predetermined. Rather, they are a random sample from a much larger set of categories that could have been represented. For example, a study investigating variation in blood cholesterol levels might study six members from each of 50 different families, chosen at random from the large number of possible families that the researchers could have investigated. Here the families would be considered to be random categories, and such a study should be analyzed using a random-effects ANOVA (also sometimes called Model 2 ANOVA). Unlike with fixed-effects, any conclusions reached from a random-effects ANOVA can be extended to other categories of the independent variable. For example, in the study described above, if the researchers found that members of individual families tended to have very similar blood cholesterol levels, this conclusion could legitimately be extended to other families that were not part of the study.
In principle any of the above types of ANOVA (one-factor, two-factor, nested, analysis of covariance, and repeated measures) could include just fixed effects, just random effects, or (except for one-factor ANOVA) both. An analysis that includes both random and fixed effects is called a mixed-model ANOVA. In JMP you do not choose between fixed-effects, random-effects, and mixed-model ANOVA by selecting options from a pull-down menu. Rather, you do so by changing how the independent variables are coded prior to analysis.
Analysis of variance is so called because it attempts to understand the causes of variation in the dependent variable measured in a study. Consider the following simple example, which will be analyzed using one-factor ANOVA elsewhere on this website. In 2015 researchers planted several hundred oak seedlings in four horizontal transects at different elevations on a sandy ridge: one at the bottom, one at the top, and two more at equally spaced intervals in between. They anticipated that transect location might affect the photosynthetic rates of the seedlings because water availability in the soil declined with elevation. Reduced water availability might cause the seedlings to partly close their leaf stomata to reduce water loss by transpiration. This would reduce the rate of carbon dioxide uptake for photosynthesis. The data are shown below. On the x-axis, transect elevation increases from left to right. On the y-axis, photosynthetic rate has been square-root transformed to improve the fit of the data to a normal distribution (see Foundational Material for more detail on data transformations). The black dots represent photosynthetic rates of individual seedlings within each transect. The horizontal grey line represents the grand mean photosynthetic rate of the entire dataset; the shorter horizontal green lines represent the mean photosynthetic rate of all the seedlings within each transect. The histograms on the right show the distributions of the data in each of the four transects.
Variation in photosynthetic rate within this dataset consists of two types of deviation away from the grand mean:
- Deviations of the group (transect) means away from the grand mean (i.e., the differences between the horizontal green lines and the horizontal grey line). ANOVA calculates a term called the group mean square as an indicator of the average degree of variation among individuals because of their membership of different groups (in this case, being on different transects). In other words, in the absence of any other sources of variation, each individual would have a photosynthetic rate equal to its group mean, and differences among groups would be perfectly represented by differences in these group means.
- Deviations of each individual data point away from its group mean (i.e., the differences between the black dots and the horizontal green lines for each transect). ANOVA calculates a term called the error mean square as an indicator of the average degree of variation among individuals belonging to the same group because of those individual deviations.
The ratio of the group mean square to the error mean square is given the symbol F and is called the variance ratio or the F-ratio.
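The two mean squares and their ratio can be sketched in a few lines of Python. The values below are hypothetical numbers invented for illustration (they are not the oak seedling data), and the function name `one_factor_f_ratio` is likewise just a label for this example. The calculation uses the standard definitions: each sum of squared deviations is divided by its degrees of freedom (number of groups minus one for groups; number of observations minus number of groups for error).

```python
from statistics import mean

def one_factor_f_ratio(groups):
    """Return (group mean square, error mean square, F-ratio) for a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    n_groups = len(groups)
    n_total = len(all_values)

    # Deviations of the group means from the grand mean, weighted by group size.
    ss_group = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Deviations of each observation from its own group mean.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_group = ss_group / (n_groups - 1)
    ms_error = ss_error / (n_total - n_groups)
    return ms_group, ms_error, ms_group / ms_error

# Hypothetical transformed photosynthetic rates for four transects
# (invented values, three seedlings per transect).
transects = [
    [4.0, 5.0, 6.0],
    [3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0],
]
ms_group, ms_error, f_ratio = one_factor_f_ratio(transects)
print(ms_group, ms_error, f_ratio)  # 5.0 1.0 5.0
```

Here the group means (5, 4, 3, 2) are spread widely relative to the scatter within each transect, so the group mean square is five times the error mean square.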
The null hypothesis for a one-factor ANOVA is that the group means are all the same – in this case that the average photosynthetic rate is the same for all transects.
- If the group means are very similar, and most of the variation in the data set is due to variation within groups, the group mean square will be relatively small and the error mean square will be relatively large, and the F-ratio will be close to one or less than one.
- Conversely, if the group means are very different, and little of the variation in the data set is due to variation within groups, the group mean square will be relatively large and the error mean square will be relatively small, so the F-ratio will be larger than one.
- Thus, large F-ratios indicate that most of the variation in the dataset is due to large differences between group means, and therefore give us confidence that these differences are likely to be real and not just due to chance. If the F-ratio is associated with a p-value less than the conventional threshold of 0.05, we are justified in rejecting the null hypothesis and accepting the alternative hypothesis that the group means are not all the same.
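To make the idea of "not just due to chance" concrete, the sketch below estimates a p-value by repeatedly shuffling the observations among the groups and counting how often the shuffled F-ratio is at least as large as the observed one. This is a permutation test, swapped in here because it needs nothing beyond the Python standard library; statistical software normally obtains the p-value from the F distribution instead. The data, the function names, and the number of shuffles are all assumptions made for illustration.

```python
import random
from statistics import mean

def f_ratio(groups):
    """F-ratio (group mean square / error mean square) for a list of samples."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    ss_group = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_group = ss_group / (len(groups) - 1)
    ms_error = ss_error / (len(values) - len(groups))
    return ms_group / ms_error

def permutation_p_value(groups, n_shuffles=2000, seed=0):
    """Estimate how often chance alone gives an F-ratio >= the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = f_ratio(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        # Deal the shuffled values back out into groups of the original sizes.
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_ratio(shuffled) >= observed:
            at_least_as_large += 1
    # Count the observed arrangement itself so the estimate is never zero.
    return observed, (at_least_as_large + 1) / (n_shuffles + 1)

# Hypothetical transformed photosynthetic rates for four transects.
transects = [
    [4.1, 5.0, 6.2],
    [3.2, 4.0, 5.1],
    [2.0, 3.1, 4.2],
    [1.1, 2.0, 3.0],
]
observed_f, p_value = permutation_p_value(transects)
print("F-ratio:", observed_f)
print("Estimated p-value:", p_value)
```

If the null hypothesis were true, group labels would be arbitrary, so shuffled datasets would often produce F-ratios as large as the observed one; a small estimated p-value indicates that this rarely happens.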