This page explains how to perform a nested analysis of variance and how to interpret the results of this test. Some of the text that follows assumes that you are already familiar with two-factor ANOVA, so if you have not yet done so it is recommended that you read the web page dedicated to that technique before proceeding further.
At its most basic, nested ANOVA resembles two-factor ANOVA in that there are two independent categorical variables. The crucial difference is that in two-factor ANOVA every level of one variable is combined with every level of the other, whereas in nested ANOVA each level of one variable occurs within only one level of the other. Because of this, it is not appropriate to test for an interaction between the variables in a nested design: only the main effect and the nested effect are tested.
To illustrate a nested ANOVA design, consider a hypothetical clinical trial comparing the success of heart surgery at four different hospitals, each with four different surgeons who each operate on a unique set of patients. In this study there are two independent variables (hospital and surgeon), but each surgeon carries out operations at only one of the four hospitals. The surgeon variable is said to be “nested” within the hospital variable. If every surgeon had performed operations at all four hospitals, the surgeon variable would be said to be “crossed” with the hospital variable and a two-factor ANOVA would be appropriate.
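The distinction between nested and crossed designs can be checked mechanically from a list of observations. The following is a minimal sketch in Python; the function name `classify_design` and the hospital and surgeon labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def classify_design(pairs):
    """Classify how factor B relates to factor A.

    pairs: iterable of (a_level, b_level) tuples, one per observation.
    Returns "nested" if every B level occurs within exactly one A level,
    "crossed" if every B level occurs within every A level,
    and "partially crossed" otherwise.
    """
    a_levels_for_b = defaultdict(set)
    all_a = set()
    for a, b in pairs:
        a_levels_for_b[b].add(a)
        all_a.add(a)
    if all(len(levels) == 1 for levels in a_levels_for_b.values()):
        return "nested"
    if all(levels == all_a for levels in a_levels_for_b.values()):
        return "crossed"
    return "partially crossed"

# Hypothetical trial: four hospitals (A-D), four surgeons per hospital.
# Nested: each surgeon has a label unique to one hospital.
nested_trial = [(h, f"{h}-surgeon{s}") for h in "ABCD" for s in range(1, 5)]
# Crossed: the same four surgeons operate at every hospital.
crossed_trial = [(h, f"surgeon{s}") for h in "ABCD" for s in range(1, 5)]

print(classify_design(nested_trial))
print(classify_design(crossed_trial))
```

Note that the check relies on labels being unique. If nested levels reuse the same labels within each group (for example, "Replicate 1" and "Replicate 2" in every population), the data look crossed to this function even though the design is nested. Nesting is a property of the study design, so it has to be declared by the analyst.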
Nested ANOVA will be illustrated using data from an experimental study of the evolution of time to flowering in Clarkia plants. In this study, seeds were collected from 150 plants in each of three populations of Clarkia unguiculata in southern California, and two replicate breeding lines from each population were created by randomly choosing a subset of the original 150 plants for each replicate. The lines were subjected to artificial selection for early flowering for three generations (by which stage only six of the original families were still being allowed to breed), and at the end of the experiment the time to flowering of the selected lines was compared with that of control lines that had not been subject to selection.
A complete analysis of the experiment is beyond the scope of this website, but data from the selected lines can be used to illustrate how to perform a nested ANOVA, which is the appropriate statistical test for these data because there are two categorical independent variables (Population and Replicate) with one variable (Replicate) nested within the other (Population). The data are nested because the set of plants in each pair of replicates from each population was unique to that population. Comparing this study with the hypothetical clinical trial described above, population is equivalent to hospital and replicate is equivalent to surgeon.
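One common way to write the model behind this design is shown below. The notation is an assumption introduced here for illustration and does not come from the study itself: $y_{ijk}$ is the time to flowering of family $k$ in replicate $j$ of population $i$, $\mu$ is the overall mean, $\alpha_i$ is the effect of population $i$, $\beta_{j(i)}$ is the effect of replicate $j$ within population $i$, and $\varepsilon_{ijk}$ is the residual error.

```latex
y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk},
\qquad i = 1, 2, 3; \quad j = 1, 2
```

The subscript $j(i)$ signals that replicate $j$ has meaning only inside population $i$. There is no $(\alpha\beta)_{ij}$ interaction term, because no replicate appears in more than one population.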
Two null hypotheses can be tested with the data:
- There are no differences in mean time to flowering among the three populations (main effect)
- There are no differences in mean time to flowering between the two replicates of each population (nested effect)
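Each of these hypotheses corresponds to an F ratio built from a partition of the total variation. The sketch below shows that partition in plain Python for a balanced design. The data are invented for illustration (they are not the Clarkia measurements), the function name `nested_anova` is hypothetical, and the use of the error mean square as the denominator for both F ratios is an assumption that matches a model in which both effects are treated as fixed; a different denominator would apply if replicates were treated as a random effect.

```python
from statistics import mean

def nested_anova(data):
    """Partition variation for a nested design.

    data: {population: {replicate: [values, ...]}}
    Returns sums of squares, degrees of freedom, mean squares and
    F ratios (effect mean square divided by error mean square).
    """
    all_values = [y for reps in data.values() for ys in reps.values() for y in ys]
    grand_mean = mean(all_values)

    ss_pop = ss_rep = ss_err = 0.0
    n_groups = 0
    for reps in data.values():
        pop_values = [y for ys in reps.values() for y in ys]
        pop_mean = mean(pop_values)
        # Between-population variation: population mean vs grand mean.
        ss_pop += len(pop_values) * (pop_mean - grand_mean) ** 2
        for ys in reps.values():
            rep_mean = mean(ys)
            # Nested variation: replicate mean vs its own population mean.
            ss_rep += len(ys) * (rep_mean - pop_mean) ** 2
            # Error variation: observations vs their replicate mean.
            ss_err += sum((y - rep_mean) ** 2 for y in ys)
            n_groups += 1

    df_pop = len(data) - 1
    df_rep = n_groups - len(data)
    df_err = len(all_values) - n_groups
    ms_err = ss_err / df_err

    def row(ss, df):
        ms = ss / df
        return {"SS": ss, "DF": df, "MS": ms, "F": ms / ms_err}

    return {
        "Population": row(ss_pop, df_pop),
        "Replicate[Population]": row(ss_rep, df_rep),
        "Error": {"SS": ss_err, "DF": df_err, "MS": ms_err},
        # Share of total variation explained by both effects together.
        "RSquare": (ss_pop + ss_rep) / (ss_pop + ss_rep + ss_err),
    }

# Invented example: 2 populations x 2 replicates x 3 observations.
example = {
    "P1": {"R1": [10, 12, 11], "R2": [13, 15, 14]},
    "P2": {"R1": [20, 22, 21], "R2": [20, 21, 22]},
}
table = nested_anova(example)
for name in ("Population", "Replicate[Population]", "Error"):
    print(name, table[name])
print("RSquare", round(table["RSquare"], 4))
```

Converting an F ratio into a p-value requires the F distribution, which is not part of the Python standard library, so that step is left out of the sketch.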
Download the following data file and import it into JMP: Clarkia Flowering NESTED Excel file
The analysis can be performed in JMP as follows:
- Before starting the analysis, change Family from a continuous to a categorical variable. The numbers in the column identify individual plant families within each replicate, and JMP interpreted these numbers to mean that Family is a continuous numeric variable. In reality it is categorical: the numbers are simply an identification code and could just as easily have been letters. Double click on the variable name box at the top of the column and in the pop-up window change Data Type to Character (the Modeling Type will automatically change to Nominal), then hit OK. The symbol next to Family in the Columns list on the left will change from a blue triangle to a red histogram. Although we will not be using Family as a variable in our analysis, it is always a good idea to code all variables properly in a JMP file.
- From the Analyze menu select Fit Model and click or drag Mean First Flower Date into the Y box and both Population and Replicate into the Construct Model Effects box. To specify that Replicate is nested within Population, highlight Replicate in the Construct Model Effects box and Population in the Columns box and click on Nest. Replicate will change to Replicate[Population], a standard way of indicating that the first variable is nested within the second.
- Hit Run
- As with two-factor ANOVA, the output window shows a series of figures as well as several summary tables. Most of these have uses beyond the scope of this website. Close the Actual by Predicted Plot, the Residual by Predicted Plot, the two Leverage Plots and the Effect Summary table by clicking on their grey arrows.
- Click on the red arrows next to Population and Replicate[Population] and select LS Means Plot for both.
- The window will now look like this:
To download a high-resolution version of this image, click here: Clarkia Flowering NESTED Output image download
As with two-factor ANOVA, the most important pieces of information in this window are as follows:
- In the Summary of Fit table, the R² value (RSquare = 0.8387) tells us that approximately 84% of the total variation in the dependent variable (time to flowering) can be explained by the two effects under consideration: Population and Replicate (nested within Population). The total sample size is also given (36).
- The Analysis of Variance table gives the group mean square for both effects together (the Model row) and the error mean square (the Error row), the ratio of the two (F Ratio) and the associated p-value (Prob > F < 0.0001). Since the p-value is well below the normal threshold for statistical significance (0.05), we can conclude that the two effects together do have a significant effect on time to flowering. However, this information alone does not tell us which of the two individual effects is responsible for the overall significance of the test.
- More usefully, the Effect Tests table shows the degrees of freedom (DF), F Ratio, and p-value (Prob > F) for each of the two effects individually. Here we can see that both are statistically significant (both p-values are < 0.05). Thus, we can reject both null hypotheses listed above and accept the alternative hypotheses:
- Mean time to flowering differs among the three populations
- Mean time to flowering differs between the two replicates within each population.
- The Least Squares Means Table for the Population effect shows two versions of the mean values for each group: the arithmetic mean (Mean column) and the least squares mean (Least Sq Mean column), along with the standard error of the least squares mean. Unlike in two-factor ANOVA, the two means are identical in this analysis, because the experimental design is nested, not factorial; i.e., there is no variable “crossed” with Population whose effects and sample sizes could influence the calculation of the mean values for the three populations. Similarly, only least squares means are reported for the Replicate[Population] effect, because arithmetic and least squares means are identical when calculated individually for each of the six sub-groups of data (Granite Road Replicate 1, Granite Road Replicate 2, etc.).
- The LS Means Plots show the least squares means and their standard errors for each population as a whole (combining data from both replicates) and for each of the six population-replicate combinations. Visually, these plots suggest:
- that the significant Population effect in the Effect Tests table is due to the fact that the Jack and Stage population has a longer time to flowering than the other two populations.
- that the significant Replicate[Population] effect in the Effect Tests table is due mainly to the fact that the two Mill Creek replicates are different from one another. The two Granite Road replicates look very similar to one another, as do the two Jack and Stage replicates.
- We can investigate these apparent patterns statistically by performing Tukey-Kramer HSD tests, just as we did for one-factor and two-factor ANOVA. (If you have not already done so, please read about Tukey-Kramer HSD tests on the one-factor ANOVA web page.) Click on the red arrows next to Population and Replicate[Population] and select LS Means Tukey HSD. Then click on the red arrows next to LS Means Differences Tukey HSD and deselect Crosstabs Report. You can also select Ordered Differences Report if you wish, but for the Replicate[Population] effect it provides more information than we need. The two new tables will now look like this:
- The Connecting Letters Report table for Population shows that the visual impression for this effect was correct: the Jack and Stage population has a significantly longer time to flowering than Granite Road and Mill Creek, but the latter two populations do not differ significantly from each other.
- The Connecting Letters Report table for the Replicate[Population] effect also supports our visual impression: the two Mill Creek replicates are significantly different from each other, but the two Granite Road replicates and the two Jack and Stage replicates are not. The other significant differences in this table simply reflect the overall Population effect.
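The bookkeeping behind these tables can be sketched from the design alone. The Python below computes the degrees of freedom and the number of pairwise Tukey comparisons for a balanced nested design. The function name `nested_design_summary` is hypothetical, and the figure of six families per replicate is an assumption inferred from the total sample size of 36 spread evenly over three populations with two replicates each.

```python
from math import comb

def nested_design_summary(n_populations, n_replicates, n_per_replicate):
    """Degrees of freedom and pairwise-comparison counts for a balanced
    design with replicates nested within populations."""
    groups = n_populations * n_replicates
    n_total = groups * n_per_replicate
    return {
        "N": n_total,
        "df_population": n_populations - 1,
        "df_replicate_within_population": n_populations * (n_replicates - 1),
        "df_model": groups - 1,
        "df_error": n_total - groups,
        # Pairwise comparisons among population means.
        "tukey_pairs_population": comb(n_populations, 2),
        # Pairwise comparisons among all population-replicate means.
        "tukey_pairs_replicate": comb(groups, 2),
        # Only these pairs compare replicates of the same population.
        "within_population_pairs": n_populations * comb(n_replicates, 2),
    }

# Assumed layout: 3 populations, 2 replicates each, 6 families per replicate.
clarkia = nested_design_summary(3, 2, 6)
for key, value in clarkia.items():
    print(key, value)
```

Under these assumptions there are 15 pairwise comparisons among the six population-replicate groups, of which only 3 compare the two replicates of the same population. The remaining 12 compare groups from different populations, which is why most of the Replicate[Population] comparisons carry no information beyond the Population effect.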