This page explains how to perform a one-factor ANOVA and to interpret the results of this test. Some of the text that follows assumes that you already have a basic familiarity with analysis of variance, so if you have not already done so it is recommended that you read the Introduction to ANOVA page before proceeding further.
One-factor ANOVA is used to compare the means of three or more samples of data to determine whether there are any differences among them in their mean values. It is called “one-factor” because there is just one independent (categorical) variable. It can be thought of as an extension of the two-sample t-test to multiple samples, although the underlying calculations are rather different. For example, a two-sample t-test could be used to compare the performance of a single pharmaceutical against a placebo, whereas one-factor ANOVA could be used to compare two or more pharmaceuticals against a placebo and against each other. In ANOVA terminology the categories of the independent variable are usually called “groups”, “levels”, or “treatments”.
In JMP, one-factor ANOVA can be performed using either the Fit Y by X or the Fit Model option of the Analyze menu. Both approaches are illustrated here using the data on the photosynthetic rate of oak seedlings described on the Introduction to ANOVA page. It is worth trying both options for two reasons: first, to see the similarities and differences between them; second, because two-factor ANOVA and analysis of covariance (ANCOVA) can only be performed using Fit Model. The Fit Model option can also perform analyses with random effects; Fit Y by X cannot. (See Introduction to ANOVA for information on fixed vs. random effects.)
First, import the data in the Excel spreadsheet below into JMP.
After the import, Ecotype should be coded as a categorical variable (as indicated by the red histogram in the column list on the left); all the other variables should be coded as continuous (as indicated by the blue triangles). However, before starting the analysis it is necessary to recode both the Transect and Plant # variables. On the Excel spreadsheet both were entered as numbers, so JMP interpreted them as continuous variables. However, in reality both are categorical: the numbers are just an identification code and could just as easily have been letters. For each variable in turn, double click on the variable name box at the top of the column and in the pop-up window change Data Type to Character (the Modeling Type will automatically change to Nominal), then hit OK. The symbol next to the variable in the column list should then change to a red histogram.
The researchers who did this experiment were interested in several things, but one of them was whether the photosynthetic rate of the oak seedlings varied among the four planting transects, which were run horizontally along a sandy ridge at 10 m intervals, with transect 1 being at the bottom of the ridge and transect 4 at the top. Water availability in the soil declined with elevation, and reduced water availability might have caused the seedlings to partly close their leaf stomata to reduce water loss by transpiration. This might then have reduced the rate of carbon dioxide uptake for photosynthesis. Thus, it was anticipated that seedlings at higher elevations would have lower photosynthetic rates than those at lower elevations. One-factor ANOVA is an appropriate statistical test for this experiment because the independent variable is categorical with four levels (transects 1, 2, 3, and 4) and the dependent variable (photosynthetic rate) is numeric and continuous.
The null hypothesis for the test is that there are no differences in mean photosynthetic rate among the groups of seedlings planted along the four transects.
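The hypotheses and the test statistic can be written out as follows. This is a standard textbook formulation rather than anything specific to JMP, and the symbols are notation assumed here for illustration: k is the number of transects (4), n_i is the number of seedlings on transect i, N is the total number of seedlings, y_ij is the square-root photosynthetic rate of seedling j on transect i, and the barred terms are the transect mean and the grand mean.

```latex
H_0:\ \mu_1 = \mu_2 = \mu_3 = \mu_4
\qquad
H_A:\ \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j

F = \frac{MS_{\text{group}}}{MS_{\text{error}}}
  = \frac{\sum_{i=1}^{k} n_i \,(\bar{y}_i - \bar{y})^2 \,/\, (k-1)}
         {\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \,/\, (N-k)}
```

A large F ratio means that the variation among transect means is large relative to the variation among seedlings within transects, which is evidence against the null hypothesis.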
- From the Analyze menu select Fit Y by X and click or drag Transect into the X, Factor box and Sqrt Photosynthetic Rate into the Y, Response box. (Sqrt Photosynthetic Rate is a transformed variable – the square root of the actual photosynthetic rate. The transformation was carried out because the Photosynthetic Rate variable was not normally distributed, whereas its square root was. See Foundational Material for more detail on data transformations.)
- Hit OK. A new window will appear with a figure showing all the data points in four vertical columns, one for each transect. If you wish, you can spread out the points horizontally by clicking on the red arrow next to Oneway Analysis…, going to Display Options, and selecting Points Jittered.
- Click on the red arrow again and select Means/ANOVA. The window will expand to show the statistical output and the figure will be automatically modified to include Means Diamonds in green. The width of each diamond is proportional to the sample size in each group and the horizontal line connecting the left and right points indicates the group mean. The top and bottom points indicate the 95% confidence interval around the mean. The short horizontal lines near the top and bottom points are called overlap marks and are a visual indicator of whether two means are significantly different from one another. If two diamonds overlap at these positions, the two means are not significantly different from one another. However, rather than relying on a visual indicator, it is better to carry out an additional statistical test (see next two instructions).
- If you look at the Analysis of Variance table you will see that the F Ratio = 6.6012 and that the associated p-value (Prob > F) = 0.0004. This is well below the conventional threshold for statistical significance (0.05). Thus, we can reject the null hypothesis and accept the alternative hypothesis that there are some differences among the group means. But which means are different and which are not? You could compare all pairs of means using two-sample t-tests, but there would be six comparisons in all, and carrying out multiple statistical tests increases your chance of getting one or more false positives. Under these circumstances, it is necessary to carry out a test that controls for this possibility. There are several such tests, but one of the most commonly used is the Tukey-Kramer HSD test (HSD stands for honestly significant difference).
- Click again on the red arrow, go to Compare Means, and select All Pairs, Tukey HSD. The window will expand again to show more statistical output and the figure will be further modified. Some of the new output is not crucial for our purposes and can be hidden by clicking on the grey arrows. Click on those next to Confidence Quantile and HSD Threshold Matrix to hide these sections. The window will now look like this.
Click here to download a high-resolution image of this window: SW Oaks FitYbyX ANOVA Output
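The concern about multiple comparisons raised in the steps above can be illustrated with a short Python sketch. It counts the pairwise comparisons among four transects and estimates the chance of at least one false positive if each comparison were tested separately at 0.05. The calculation assumes the six tests are independent, which is only an approximation for pairwise comparisons that share groups, so the result should be read as a rough guide rather than an exact error rate.

```python
from itertools import combinations

transects = ["1", "2", "3", "4"]  # the four groups, treated as categorical labels
alpha = 0.05                      # per-comparison significance threshold

# Every unordered pair of transects is one two-sample comparison
pairs = list(combinations(transects, 2))
n_tests = len(pairs)

# Assumption: the tests are independent (only approximately true here)
familywise_error = 1 - (1 - alpha) ** n_tests

print(pairs)
print("comparisons:", n_tests)
print("chance of at least one false positive:", round(familywise_error, 3))
```

Under this assumption the chance of at least one false positive rises from 5% for a single test to roughly 26% for six tests, which is why a procedure such as Tukey-Kramer HSD is preferred over a series of unadjusted t-tests.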
Not all of the statistical output shown in this window is essential for our purposes. The following is a summary of the most important information:
- In the Summary of Fit table, the R² value (Rsquare = 0.1599) tells you that approximately 16% of the total variation in the data can be explained by differences in mean photosynthetic rate (square root transformed) among the four transects. This table also shows the total sample size (108).
- The Analysis of Variance table shows the group mean square (the Transect row), the error mean square (the Error row), the degrees of freedom associated with each (DF), the ratio of the two (F Ratio), and the associated p-value (Prob > F). If you were reporting the results of this test in a paper, you would normally do so as follows: “F = 6.601, df = 3, 104; p = 0.0004”. Sums of squares and mean squares would not normally be reported.
- The Means for Oneway Anova table shows the sample size, mean, standard error, and 95% confidence interval for the dependent variable (square root photosynthetic rate) for each of the four transects.
- The two tables under Means Comparison show the results of the Tukey-Kramer HSD test. The Connecting Letters Report shows the mean value for each transect (the same as the Means for Oneway Anova table above) with a set of letters to the left. Pairs of means sharing the same letter are not significantly different from each other (p > 0.05). The Ordered Differences Report provides more information for each pairwise comparison. The most useful additional information in this table is the p-Value column. This shows, for example, that the mean values for transects 1 and 4 (top row) are significantly different from each other (p = 0.0005), whereas the mean values for transects 1 and 3 are not (p = 0.1566).
- The set of circles to the right of the figure (above the legend All Pairs Tukey-Kramer 0.05) gives a visual indication of which means are significantly different. Vertically, the position of each circle indicates the mean for each group, and circles that don't overlap (or that overlap just slightly, so that the angle where they intersect is less than 90°) indicate means that are significantly different from each other. However, it is better to get this information from the tables.
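The reported values can be cross-checked against each other with a short Python sketch. In a one-factor ANOVA, R² is the model sum of squares divided by the total sum of squares, so the F ratio can be recovered from R² and the degrees of freedom. The p-value is then the upper tail of the F distribution, computed here by numerical integration of the incomplete beta function so that only the standard library is needed. The function name `f_upper_tail` and the step count are choices made for this illustration; in practice a statistics library would normally be used.

```python
import math

# Values reported in the JMP output described above
r_square = 0.1599
f_reported = 6.6012
df_model, df_error = 3, 104  # 4 transects - 1, and 108 observations - 4

# R^2 = SS_model / SS_total, so F follows from R^2 and the degrees of freedom
f_from_r2 = (r_square / df_model) / ((1 - r_square) / df_error)


def f_upper_tail(f_value, d1, d2, steps=2000):
    """Approximate P(F > f_value) for an F(d1, d2) distribution.

    Uses P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1 * f), where I_x is
    the regularized incomplete beta function, evaluated by composite Simpson
    integration. Assumes d2 > 2 and d1 >= 2 (no endpoint singularity) and an
    even number of steps.
    """
    a, b = d2 / 2, d1 / 2
    x = d2 / (d2 + d1 * f_value)
    h = x / steps

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (total * h / 3) / math.exp(log_beta)


p_value = f_upper_tail(f_reported, df_model, df_error)

print("F recovered from R^2:", round(f_from_r2, 2))
print("upper-tail p-value:", p_value)
```

Because R² is reported to only four decimal places, the recovered F agrees with the reported F Ratio only to about two decimal places. The computed p-value should likewise be close to, but not identical with, the rounded value shown in the Prob > F column.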
The same data will be used to perform a one-factor ANOVA using the Fit Model option. The information below assumes that you have already performed and learned how to interpret a one-factor ANOVA using Fit Y by X.
- From the Analyze menu select Fit Model. Click or drag Transect into the Construct Model Effects box and Sqrt Photosynthetic Rate into the Y box. The Personality menu should change to Standard Least Squares and the Emphasis menu to Effect Leverage. Hit Run.
- A new window will appear with three figures – an Actual by Predicted Plot, a Leverage Plot, and a Residual by Predicted Plot. These plots have various uses that are beyond the scope of this website. For the present analysis, click on the red arrow next to Transect and select both LS Means Plot and LS Means Tukey HSD.
- Now click on the red arrow next to LSMeans Differences Tukey HSD. Unselect Crosstab Report and select Ordered Differences Report.
- The window will now look like this:
Click here to download a high-resolution image of this window: SW Oaks FitModel ANOVA Output
- The Summary of Fit table shows the same information as the Fit Y by X output, with the most important information being R² = 0.1599 and the sample size (108).
- The Analysis of Variance table is very similar to the corresponding table in Fit Y by X, except that here the group mean square value is in the row called Model, rather than Transect. The Effect Tests table shows the same information in condensed form (without the error mean square row).
- The Least Squares Means Table shows two versions of the mean values for each transect: the arithmetic mean (Mean column) and the least squares mean (Least Sq Mean column), along with the standard error of the least squares mean. The least squares mean is a mean value for each group that takes into account the effects of other variables in the analysis and controls for differences in sample size among the categories of these other variables. In the current analysis there are no other variables, so the least squares means are equal to the arithmetic means, but in most other types of analysis they will not be. The figure below this table shows a plot of the four least squares means along with their standard errors.
- The LS Means Differences Tukey HSD tables show the same information as in the Fit Y by X output.
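The quantities that both platforms report in the Analysis of Variance and Summary of Fit tables can be reproduced by hand. The Python sketch below does this for a small, entirely hypothetical data set with three unequal groups (it is not the oak seedling data, and the group names and values are invented for illustration). With a single factor, the least squares fitted value for each group is simply its arithmetic mean, which is why the two mean columns agree in the Least Squares Means Table.

```python
# Hypothetical measurements for three groups of unequal size (not the oak data)
groups = {
    "A": [4.1, 3.8, 4.4, 4.0],
    "B": [3.5, 3.9, 3.6],
    "C": [3.0, 3.2, 2.9, 3.3, 3.1],
}

values = [v for g in groups.values() for v in g]
n_total = len(values)
grand_mean = sum(values) / n_total

# With one factor, the least squares mean of a group is its arithmetic mean
group_means = {name: sum(g) / len(g) for name, g in groups.items()}

# Sums of squares: among groups (model), within groups (error), and total
ss_model = sum(len(g) * (group_means[name] - grand_mean) ** 2
               for name, g in groups.items())
ss_error = sum((v - group_means[name]) ** 2
               for name, g in groups.items() for v in g)
ss_total = sum((v - grand_mean) ** 2 for v in values)

df_model = len(groups) - 1
df_error = n_total - len(groups)

# F ratio = model mean square / error mean square; R^2 = SS_model / SS_total
f_ratio = (ss_model / df_model) / (ss_error / df_error)
r_square = ss_model / ss_total

print("group means:", group_means)
print("df:", df_model, df_error)
print("F ratio:", f_ratio)
print("R^2:", r_square)
```

The model and error sums of squares add up to the total sum of squares, which is the partition that both the Fit Y by X and Fit Model tables summarize.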