The Assumptions of One-Way ANOVA

  1. The response variable is normally distributed in each group
  2. The response variable has equal variance across the groups
  3. The data are drawn by random sampling, so the observations are independent

The Null and the Alternative Hypotheses

\(H_0\): There is no difference between the group means (\(\mu_1 = \mu_2 = \mu_3\))
\(H_A\): There is a difference between the group means (at least one \(\mu_i \neq \mu_j\))

The Testing Sequence

As usual, we will test the assumptions of (a) normality in each group (with shapiro.test) and then (b) equality of variance across the groups (with Levene’s test for homogeneity of variances).

However, since ANOVA is fairly robust to violations of these assumptions, we won’t abandon it unless we absolutely have to. In a practical research situation you will have to go by the standards accepted in your field or sub-field: if few folks there opt for a non-parametric test under similar violations, you should be fine sticking with ANOVA as well. Even so, it never hurts to have run a non-parametric test in case someone criticizes your work or asks why you ran the ANOVA when the assumptions were clearly violated. My suggestion would be to point to the results of the non-parametric test, which, hopefully, lead to the same conclusion as the ANOVA. Remember …

  1. If normality is not rejected but we have unequal variances then Welch’s one-way ANOVA is a good choice.

\[W^{*} = \dfrac{\sum w_{j}\left(\bar{y}_j - \hat{\mu}\right)^{2} / \left( k - 1\right)}{1 + \left[2\left(k - 2\right)/\left(k^{2} - 1\right) \right]\sum h_{j}}\]

where \(w_j = \dfrac{n_{j}}{s^{2}_{j}}\); \(\hat{\mu} = \dfrac{\sum w_{j}\bar{y}_{j}}{W}\); \(W = \sum w_{j}\); \(h_j = \dfrac{\left(1 - w_{j}/W \right)^{2}}{\left(n_j - 1 \right)}\); and \(f = \dfrac{\left(k^{2} -1 \right)}{3 \sum h_{j}}\)

\[W^{*} \sim F\left(df_1 = k-1, df_2 = f \right)\]
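
To make the notation concrete, here is a minimal by-hand sketch of \(W^{*}\) on made-up data (the numbers are purely illustrative), cross-checked against R’s built-in oneway.test():

y = c(4.1, 5.2, 3.8, 6.0, 7.1, 6.5, 2.2, 2.9, 3.4)   # fake scores
g = factor(rep(c("A", "B", "C"), each = 3))          # three fake groups

k  = nlevels(g)
nj = tapply(y, g, length)
mj = tapply(y, g, mean)
s2 = tapply(y, g, var)

wj = nj / s2                          # the weights w_j
W  = sum(wj)
mu = sum(wj * mj) / W                 # the weighted grand mean
hj = (1 - wj / W)^2 / (nj - 1)

Wstar = (sum(wj * (mj - mu)^2) / (k - 1)) /
        (1 + (2 * (k - 2) / (k^2 - 1)) * sum(hj))
f = (k^2 - 1) / (3 * sum(hj))         # the denominator df

pf(Wstar, df1 = k - 1, df2 = f, lower.tail = FALSE)   # p-value
oneway.test(y ~ g, var.equal = FALSE)                 # should agree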

  2. If both normality and equal variance are rejected and no transformation works, then we may have to settle for the weaker test – the Kruskal-Wallis test.

\[H = \left[\dfrac{12}{N(N+1)} \times \sum \dfrac{T_{c}^{2}}{n_{c}} \right] - 3(N+1)\]

where \(N\) is the total number of scores in the study; \(T_{c}\) is the rank total for each group; and \(n_{c}\) is the number of units in each group

\[H \sim \chi^2 \text{ with } df = k-1\]
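
Likewise, a minimal by-hand sketch of \(H\) on the same made-up data, cross-checked against kruskal.test() (this simple form is valid when there are no ties; kruskal.test applies a tie correction):

y = c(4.1, 5.2, 3.8, 6.0, 7.1, 6.5, 2.2, 2.9, 3.4)
g = factor(rep(c("A", "B", "C"), each = 3))

N  = length(y)
r  = rank(y)                # ranks pooled across all groups
Tc = tapply(r, g, sum)      # rank total for each group
nc = tapply(r, g, length)   # group sizes

H = (12 / (N * (N + 1))) * sum(Tc^2 / nc) - 3 * (N + 1)
pchisq(H, df = nlevels(g) - 1, lower.tail = FALSE)   # p-value
kruskal.test(y ~ g)                                  # should agree (no ties here)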

Worked Examples

Biomass and Controlled Burns

One particularly contentious issue among restoration ecologists is the timing of prairie burns. Although natural fires may primarily have been sparked by late-summer lightning strikes, most controlled burns are done during the spring or fall. The timing of burning may strongly influence the outcome of prairie restorations because burns done at different times of year can favor dramatically different plant species. You could collect data to answer the following question: How does the timing of controlled burns influence the biomass of desirable prairie plant species?

Total biomass (g/m\(^2\)) of Rudbeckia hirta (Black-eyed Susan) growing in each of 15 \(0.5m^2\) plots.

One third of the plots were burned in the spring, late summer, and fall of 1998, respectively. All plots were sampled during summer 1999.

Let’s read in the data.

season = c(rep(0, 5), rep(1, 5), rep(2, 5))  # 5 plots each: 0 = spring, 1 = summer, 2 = fall
season = factor(season, levels = c(0, 1, 2), labels = c("spring", "summer", "fall"))
biomass = c(0.10, 0.61, 1.91, 2.99, 1.06, 5.56, 6.97, 3.01, 5.33, 3.53, 3.85, 3.01, 2.13, 2.50, 6.10)
burn = cbind.data.frame(season, biomass)  # one row per plot
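
Before any tests, a quick descriptive pass (optional, but it helps frame the results below):

tapply(burn$biomass, burn$season, mean)   # per-season mean biomass
tapply(burn$biomass, burn$season, sd)     # per-season standard deviation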

Testing for Normality

Let’s run the usual box-plots and quantile-quantile plots, and calculate skewness.

H0: Biomass is normally distributed for all three seasons
HA: Biomass is not normally distributed for at least one season

boxplot(burn$biomass ~ burn$season, horizontal = TRUE)

library(lattice)

qqmath(~ biomass | season, data = burn,
       prepanel = prepanel.qqmathline,
       panel = function(x, ...) {
          panel.qqmathline(x, ...)
          panel.qqmath(x, ...)
       })

library(moments)
tapply(burn$biomass, burn$season, FUN = skewness)
##     spring     summer       fall 
## 0.46313666 0.03593579 0.94711420
tapply(burn$biomass, burn$season, FUN = shapiro.test)
## $spring
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.96097, p-value = 0.8147
## 
## 
## $summer
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.94147, p-value = 0.6763
## 
## 
## $fall
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.87899, p-value = 0.3047

So normality is not rejected for any season, even though all three groups are positively skewed (mildly so for spring and summer, moderately for fall) and the fall group has an outlier on the right. But what about equal variances?

Testing for Homogeneity of Variance

H0: The variance of biomass is the same for all three seasons
HA: The variance of biomass is not the same for all three seasons

library(car)
## Loading required package: carData
leveneTest(burn$biomass ~ burn$season, center = mean)
## Levene's Test for Homogeneity of Variance (center = mean)
##       Df F value Pr(>F)
## group  2  0.3794 0.6922
##       12
leveneTest(burn$biomass ~ burn$season, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.1672 0.8479
##       12

Levene’s test indicates equal variances, both with center = mean and with center = median.

With normality and equal variances established, we can move ahead with the ANOVA.

Carrying out the ANOVA

H0: Mean biomass is the same in all three seasons
HA: Mean biomass differs for at least one season

Note: The command is lm(y ~ x, data), followed by the anova() command to see the ANOVA table.

lm.1 = lm(biomass ~ season, data = burn)
anova(lm.1)
## Analysis of Variance Table
## 
## Response: biomass
##           Df Sum Sq Mean Sq F value   Pr(>F)   
## season     2 31.998 15.9992  7.5154 0.007655 **
## Residuals 12 25.546  2.1289                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is 0.007655 so we can easily reject H0; the data suggest that mean biomass is different for at least one season.
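
As a quick sanity check (entirely optional), the F statistic is just the ratio of the two mean squares in the table, and the p-value follows from the \(F(2, 12)\) distribution:

15.9992 / 2.1289                                    # F = MS(season) / MS(Residuals)
pf(7.5154, df1 = 2, df2 = 12, lower.tail = FALSE)   # reproduces Pr(>F)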

Planned Comparisons

Recall that planned comparisons have to be chosen before the data are gathered, which limits how many we can credibly run. My suggested rule of thumb is to have one planned comparison: the key treatment group versus the control group. Otherwise you might as well run unplanned comparisons.

What is the benefit of planned comparisons? You need not adjust for multiple comparisons since you aren’t making multiple comparisons: the \(\alpha\) you use is the standard \(\alpha = 0.05\).

In an unplanned comparison you are essentially checking all pairwise comparisons, and here the threshold is no longer \(\alpha = 0.05\) but the family-wise \(\alpha_{fwc} = \dfrac{\alpha}{c}\), where \(c\) is the number of pairwise comparisons being made and \(\alpha = 0.05\). Clearly, the more pairwise comparisons you make, the smaller the \(\alpha_{fwc}\) your p-value must dip below before you can reject the NULL of no difference between groups \(i\) and \(j\).
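
For instance, with our three seasons there are \(c = 3\) possible pairwise comparisons, so:

alpha = 0.05
c = choose(3, 2)   # 3 pairwise comparisons among 3 groups
alpha / c          # the family-wise threshold, roughly 0.0167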

Let’s assume that when we were designing the study we decided we would be interested in comparing spring to fall. This was our planned comparison.

library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
## 
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
## 
##     geyser
pc1 = glht(lm.1, linfct = mcp(season = c("spring - fall = 0")))
summary(pc1)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: User-defined Contrasts
## 
## 
## Fit: lm(formula = biomass ~ season, data = burn)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)  
## spring - fall == 0  -2.1840     0.9228  -2.367   0.0356 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
confint(pc1)
## 
##   Simultaneous Confidence Intervals
## 
## Multiple Comparisons of Means: User-defined Contrasts
## 
## 
## Fit: lm(formula = biomass ~ season, data = burn)
## 
## Quantile = 2.1788
## 95% family-wise confidence level
##  
## 
## Linear Hypotheses:
##                    Estimate lwr     upr    
## spring - fall == 0 -2.1840  -4.1946 -0.1734

The adjusted p-value is 0.0356 so we can reject the NULL of no difference in mean biomass between spring and fall.

Unplanned Comparisons

What if we had no planned comparisons but wanted to run unplanned comparisons once the test was done and we rejected the H0 of ANOVA? That would be simple to do:

pc2 = glht(lm.1, linfct = mcp(season = "Tukey"))
summary(pc2)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = biomass ~ season, data = burn)
## 
## Linear Hypotheses:
##                      Estimate Std. Error t value Pr(>|t|)   
## summer - spring == 0   3.5460     0.9228   3.843  0.00604 **
## fall - spring == 0     2.1840     0.9228   2.367  0.08448 . 
## fall - summer == 0    -1.3620     0.9228  -1.476  0.33628   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
confint(pc2)
## 
##   Simultaneous Confidence Intervals
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = biomass ~ season, data = burn)
## 
## Quantile = 2.6661
## 95% family-wise confidence level
##  
## 
## Linear Hypotheses:
##                      Estimate lwr     upr    
## summer - spring == 0  3.5460   1.0858  6.0062
## fall - spring == 0    2.1840  -0.2762  4.6442
## fall - summer == 0   -1.3620  -3.8222  1.0982

Now, note that we see a statistically significant difference between summer and spring but no statistically significant differences between fall and spring, or fall and summer.

Jet-lag

Traveling to a different time zone can cause jet lag but people adjust as the schedule of light to their eyes in the new time zone resets their internal circadian clock. This change in the internal clock is called a phase shift. Researchers suggested that the human circadian clock can also be reset by exposing the back of the knee to a light. This claim met with skepticism and a new experiment was conducted.

This new experiment measured circadian rhythm by the daily cycle of melatonin production in 22 people randomly assigned to one of three light treatments. Participants were awakened from sleep and subjected to a single three-hour episode of bright lights applied (a) to the eyes only, (b) to the knees only, or (c) to neither.

The effects of the light treatment were measured two days later by the magnitude of the phase shift in each participant’s daily cycle of melatonin production. A negative value indicates a delay in melatonin production (which is what the light treatment should do as hypothesized). Does light treatment affect phase shift?

jetlag = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter15/chap15e1KneesWhoSayNight.csv"))
head(jetlag); names(jetlag)
##   treatment shift
## 1   control  0.53
## 2   control  0.36
## 3   control  0.20
## 4   control -0.37
## 5   control -0.60
## 6   control -0.64
## [1] "treatment" "shift"

Let us check for normality and, if that test passes, for homogeneity of variances.

Checking for Normality

H0: Phase shift is normally distributed for all three groups
HA: Phase shift is not normally distributed for at least one group

boxplot(jetlag$shift ~ jetlag$treatment, horizontal = TRUE)

library(lattice)
qqmath(~ shift, groups = treatment, data = jetlag,
       auto.key = list(space = "top"),
       prepanel = prepanel.qqmathline, 
       panel = function(x, ...) {
         panel.qqmathline(x, ...)
         panel.qqmath(x, ...)
       })

library(moments)
tapply(jetlag$shift, jetlag$treatment, skewness)
##     control        eyes        knee 
##  0.01308224 -0.71961847 -0.28943024
tapply(jetlag$shift, jetlag$treatment, shapiro.test)
## $control
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.9329, p-value = 0.5428
## 
## 
## $eyes
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.9194, p-value = 0.4648
## 
## 
## $knee
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.98874, p-value = 0.9907

So the NULL of normal distributions in each group cannot be rejected. On to testing for homogeneity of variances.

Checking for Equal Variances

H0: The variance of phase shift is the same for all three groups
HA: The variance of phase shift is not the same for all three groups

library(car)
leveneTest(jetlag$shift ~ jetlag$treatment, center="mean")
## Levene's Test for Homogeneity of Variance (center = "mean")
##       Df F value Pr(>F)
## group  2  0.1563 0.8564
##       19
leveneTest(jetlag$shift ~ jetlag$treatment, center="median")
## Levene's Test for Homogeneity of Variance (center = "median")
##       Df F value Pr(>F)
## group  2  0.1586 0.8545
##       19

No violations of the homogeneity of variances assumption either, so we can run the ANOVA on these data. First we’ll specify the hypotheses:

Carrying out the ANOVA

\(H_0\): There is no difference in mean shift between the groups \((\mu_1 = \mu_2 = \mu_3)\)
\(H_A\): There is a difference in mean shift between the groups (i.e., at least one group mean is different).

lm.2 = lm(shift ~ treatment, data = jetlag)
anova(lm.2)
## Analysis of Variance Table
## 
## Response: shift
##           Df Sum Sq Mean Sq F value   Pr(>F)   
## treatment  2 7.2245  3.6122  7.2894 0.004472 **
## Residuals 19 9.4153  0.4955                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Given that the p-value is \(< 0.05\), we can safely reject \(H_0\); at least one of the group means is different.

Planned Comparisons

Assume we were interested in the knee versus the control groups.

library(multcomp)
jlP = glht(lm.2, linfct = mcp(treatment = c("knee - control = 0")))
summary(jlP)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: User-defined Contrasts
## 
## 
## Fit: lm(formula = shift ~ treatment, data = jetlag)
## 
## Linear Hypotheses:
##                     Estimate Std. Error t value Pr(>|t|)
## knee - control == 0 -0.02696    0.36433  -0.074    0.942
## (Adjusted p values reported -- single-step method)
confint(jlP)
## 
##   Simultaneous Confidence Intervals
## 
## Multiple Comparisons of Means: User-defined Contrasts
## 
## 
## Fit: lm(formula = shift ~ treatment, data = jetlag)
## 
## Quantile = 2.093
## 95% family-wise confidence level
##  
## 
## Linear Hypotheses:
##                     Estimate lwr      upr     
## knee - control == 0 -0.02696 -0.78951  0.73558

Given the p-value, it is clear that these two groups do not differ in mean phase shift.

Unplanned Comparisons

Here we go on a hunt, foolishly I might add.

library(multcomp)
jlup = glht(lm.2, linfct = mcp(treatment = "Tukey"))
summary(jlup)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = shift ~ treatment, data = jetlag)
## 
## Linear Hypotheses:
##                     Estimate Std. Error t value Pr(>|t|)   
## eyes - control == 0 -1.24268    0.36433  -3.411   0.0079 **
## knee - control == 0 -0.02696    0.36433  -0.074   0.9970   
## knee - eyes == 0     1.21571    0.37628   3.231   0.0117 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
confint(jlup)
## 
##   Simultaneous Confidence Intervals
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = shift ~ treatment, data = jetlag)
## 
## Quantile = 2.5408
## 95% family-wise confidence level
##  
## 
## Linear Hypotheses:
##                     Estimate lwr      upr     
## eyes - control == 0 -1.24268 -2.16838 -0.31698
## knee - control == 0 -0.02696 -0.95267  0.89874
## knee - eyes == 0     1.21571  0.25965  2.17178

Note that here we discover significant differences in mean phase shift between the eyes and the control groups, and between the eyes and the knee groups.

Welch’s One-Way ANOVA

Welch’s one-way ANOVA is used when normality holds but homogeneity of variance does not. One may be tempted to attempt a transformation in the hope that it will equalize the variances, but this seldom works.

I’ll generate some fake data investigating the effects of grip-types on cross-country skiing performance.

score = c(164.2, 162.4, 163.2, 169.7, 171.2, 172.9, 140.1, 164.7, 172.9, 151.3)
grips = c(rep(1, 3), rep(2, 3), rep(3, 4))
grips = factor(grips, levels = c(1, 2, 3), labels = c("Classic", "Integrated", "Modern"))
ski = cbind.data.frame(score, grips)

Let us do a quick check for normality and equal variances.

library(car)
with(ski, boxplot(score ~ grips, horizontal = TRUE))

with(ski, tapply(score, grips, FUN = shapiro.test))
## $Classic
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.9959, p-value = 0.8777
## 
## 
## $Integrated
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.9987, p-value = 0.9311
## 
## 
## $Modern
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.97406, p-value = 0.8665
with(ski, leveneTest(score, grips, center = median))
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value   Pr(>F)   
## group  2  9.5746 0.009925 **
##        7                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Quite clearly, the Shapiro-Wilk tests indicate that normality holds, but Levene’s test soundly rejects the NULL of equal variances. This is the ideal situation for Welch’s one-way ANOVA.

oneway.test(score ~ grips, data = ski, var.equal = FALSE)
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  score and grips
## F = 24.975, num df = 2.0000, denom df = 4.1069, p-value = 0.005029

The p-value is 0.005 and so we can reject H0; the data indicate that mean scores do differ for at least one grip-type.
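
For contrast (purely illustrative), setting var.equal = TRUE in the same call runs the classic equal-variance F-test, which we know is not warranted here:

oneway.test(score ~ grips, data = ski, var.equal = TRUE)   # classic ANOVA, shown only for comparison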

This raises the question: How do we determine which groups are really different? The Games-Howell test is commonly used in these situations because it is designed for settings where (i) the sample sizes differ across the groups and/or (ii) the variances are unequal. Each pairwise comparison uses the sample sizes and variances of just the two groups involved, as sketched below.
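
For reference, here is my sketch of the standard construction behind each Games-Howell comparison (not output from the package): a Welch-style \(t\) statistic for groups \(i\) and \(j\), with Welch-Satterthwaite degrees of freedom, referred to the studentized range distribution.

\[t_{ij} = \dfrac{\bar{y}_i - \bar{y}_j}{\sqrt{\dfrac{s_i^2}{n_i} + \dfrac{s_j^2}{n_j}}} \qquad df_{ij} = \dfrac{\left(\dfrac{s_i^2}{n_i} + \dfrac{s_j^2}{n_j}\right)^{2}}{\dfrac{\left(s_i^2/n_i\right)^{2}}{n_i - 1} + \dfrac{\left(s_j^2/n_j\right)^{2}}{n_j - 1}}\]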

library(userfriendlyscience)
posthocTGH(y = ski$score, x = ski$grips)
##            n means variances
## Classic    3   163      0.81
## Integrated 3   171      2.56
## Modern     4   157    209.98
## 
##                    diff ci.lo ci.hi    t  df   p
## Integrated-Classic    8   3.7    12 7.54 3.2 .01
## Modern-Classic       -6 -36.2    24 0.83 3.0 .71
## Modern-Integrated   -14 -43.9    16 1.92 3.1 .27

The comparisons indicate a significant difference only for Classic vs. Integrated, with a p-value of 0.0084.

Non-Parametric Tests: The Kruskal-Wallis

The amount of oxygen obtained in each breath at high altitude can be as low as one-third that obtained at sea level. Studies have begun to examine whether indigenous people living at high elevations have physiological attributes that compensate for the reduced availability of oxygen. A reasonable expectation is that they should have more hemoglobin, the molecule that binds and transports oxygen in the blood. To test this, researchers sampled blood from males in three high-altitude human populations: the high Andes, high-elevation Ethiopia, and Tibet, along with a sea-level population from the USA (Beall et al. 2002). These data are given below:

http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3cHumanHemoglobinElevation.csv

Do mean hemoglobin levels differ between these groups?

We start with the usual checks for normality.

hemo = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3cHumanHemoglobinElevation.csv")
summary(hemo)
##              id         hemoglobin       population  
##  Andean.male1 :   1   Min.   :10.40   Andes   :  71  
##  Andean.male10:   1   1st Qu.:14.80   Ethiopia: 128  
##  Andean.male11:   1   Median :15.45   Tibet   :  59  
##  Andean.male12:   1   Mean   :15.56   USA     :1704  
##  Andean.male13:   1   3rd Qu.:16.15                  
##  Andean.male14:   1   Max.   :25.06                  
##  (Other)      :1956
boxplot(hemo$hemoglobin ~ hemo$population, horizontal = TRUE)

library(lattice)
qqmath(~ hemoglobin | population, data = hemo,
       prepanel = prepanel.qqmathline,
       panel = function(x, ...) {
          panel.qqmathline(x, ...)
          panel.qqmath(x, ...)
       })

library(moments)
tapply(hemo$hemoglobin, hemo$population, FUN = skewness)
##      Andes   Ethiopia      Tibet        USA 
##  0.4230040  0.0437093  0.6690375 -0.2542965
tapply(hemo$hemoglobin, hemo$population, FUN = shapiro.test)
## $Andes
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.97937, p-value = 0.293
## 
## 
## $Ethiopia
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.995, p-value = 0.9359
## 
## 
## $Tibet
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.9551, p-value = 0.02922
## 
## 
## $USA
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.99235, p-value = 9.567e-08
library(car)
leveneTest(hemo$hemoglobin, hemo$population, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    3  30.114 < 2.2e-16 ***
##       1958                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Neither normality nor homogeneity of variances holds across the four groups. Might a transformation work?
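
Before walking through each candidate in detail, here is a compact way to screen several transformations at once by their Levene p-values (a quick sketch; the sections below do the full diagnostics):

library(car)
trans = list(log = log, sqrt = sqrt, square = function(x) x^2)
sapply(trans, function(f)
  leveneTest(f(hemo$hemoglobin) ~ hemo$population, center = median)[1, "Pr(>F)"])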

The Log-transform

hemo$log.hemo = log(hemo$hemoglobin)

boxplot(hemo$log.hemo ~ hemo$population, horizontal = TRUE)

library(lattice)
qqmath(~ log.hemo | population, data = hemo,
       prepanel = prepanel.qqmathline,
       panel = function(x, ...) {
          panel.qqmathline(x, ...)
          panel.qqmath(x, ...)
       })

library(moments)
tapply(hemo$log.hemo, hemo$population, FUN = skewness)
##       Andes    Ethiopia       Tibet         USA 
##  0.06522668 -0.15719251  0.35690592 -0.55585979
tapply(hemo$log.hemo, hemo$population, FUN = shapiro.test)
## $Andes
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.9873, p-value = 0.6952
## 
## 
## $Ethiopia
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.99394, p-value = 0.8618
## 
## 
## $Tibet
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.97191, p-value = 0.1888
## 
## 
## $USA
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.98154, p-value = 5.211e-14
library(car)
leveneTest(hemo$log.hemo, hemo$population, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    3  15.136 9.706e-10 ***
##       1958                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

No luck here.

The Square-root

hemo$sqrt.hemo = sqrt(hemo$hemoglobin)

boxplot(hemo$sqrt.hemo ~ hemo$population, horizontal = TRUE)

library(lattice)
qqmath(~ sqrt.hemo | population, data = hemo,
       prepanel = prepanel.qqmathline,
       panel = function(x, ...) {
          panel.qqmathline(x, ...)
          panel.qqmath(x, ...)
       })

library(moments)
tapply(hemo$sqrt.hemo, hemo$population, FUN = skewness)
##       Andes    Ethiopia       Tibet         USA 
##  0.24327995 -0.05585916  0.51125721 -0.39930875
tapply(hemo$sqrt.hemo, hemo$population, FUN = shapiro.test)
## $Andes
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.98481, p-value = 0.5481
## 
## 
## $Ethiopia
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.99511, p-value = 0.9418
## 
## 
## $Tibet
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.96472, p-value = 0.08487
## 
## 
## $USA
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.98789, p-value = 9.55e-11
library(car)
leveneTest(hemo$sqrt.hemo, hemo$population, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    3  21.402 1.222e-13 ***
##       1958                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

No luck here either.

The Square

hemo$sq.hemo = hemo$hemoglobin^2

boxplot(hemo$sq.hemo ~ hemo$population, horizontal = TRUE)

library(lattice)
qqmath(~ sq.hemo | population, data = hemo,
       prepanel = prepanel.qqmathline,
       panel = function(x, ...) {
          panel.qqmathline(x, ...)
          panel.qqmath(x, ...)
       })

library(moments)
tapply(hemo$sq.hemo, hemo$population, FUN = skewness)
##      Andes   Ethiopia      Tibet        USA 
## 0.78917353 0.23857087 0.99256569 0.01031461
tapply(hemo$sq.hemo, hemo$population, FUN = shapiro.test)
## $Andes
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.95991, p-value = 0.02324
## 
## 
## $Ethiopia
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.99109, p-value = 0.5886
## 
## 
## $Tibet
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.92881, p-value = 0.001958
## 
## 
## $USA
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.99612, p-value = 0.0002425
library(car)
leveneTest(hemo$sq.hemo, hemo$population, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    3  56.011 < 2.2e-16 ***
##       1958                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

That settles it; none of the transformations worked. On with the Kruskal-Wallis test then. Recall the hypotheses here:

\(H_0\): The distributions of hemoglobin levels are similar across the groups
\(H_A\): The distributions of hemoglobin levels are NOT similar across the groups

kruskal.test(hemo$hemoglobin ~ hemo$population)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  hemo$hemoglobin by hemo$population
## Kruskal-Wallis chi-squared = 199.56, df = 3, p-value < 2.2e-16

The p-value is practically zero, so we can easily reject \(H_0\); these data indicate that hemoglobin levels are distributed differently across the groups.
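
If we then want to know which groups differ, one common follow-up is a set of pairwise Wilcoxon rank-sum tests with a multiplicity correction (a sketch; other post-hoc procedures, such as Dunn’s test, are also used):

pairwise.wilcox.test(hemo$hemoglobin, hemo$population, p.adjust.method = "bonferroni")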