class: title-slide, center, middle, inverse # .fat[.fancy[Analysis of Variance (ANOVA)]] ## .fat[.fancy[MPA 6010]] ## .fat[.fancy[Ani Ruhil]] --- <script type="text/x-mathjax-config"> MathJax.Hub.Register.StartupHook("TeX Jax Ready",function () { MathJax.Hub.Insert(MathJax.InputJax.TeX.Definitions.macros,{ cancel: ["Extension","cancel"], bcancel: ["Extension","cancel"], xcancel: ["Extension","cancel"], cancelto: ["Extension","cancel"] }); }); </script> # .fat[.fancy[Agenda]] 1. The Trouble with Multiple Comparisons 2. One-Way Analysis of Variance (ANOVA) 3. Two-Way (Factorial) Analysis of Variance (ANOVA) --- class: inverse, center, middle # .heat[.fancy[Comparing More than Two Groups]] --- ### How can you compare a continuous outcome variable when you have more than two groups? .pull-left[ <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <caption>Customer Satisfaction Scores</caption> <thead> <tr> <th style="text-align:center;"> Observation.No. 
</th> <th style="text-align:center;"> Atlanta </th> <th style="text-align:center;"> Dallas </th> <th style="text-align:center;"> Seattle </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> 71 </td> <td style="text-align:center;"> 59 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 75 </td> <td style="text-align:center;"> 75 </td> <td style="text-align:center;"> 64 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 82 </td> <td style="text-align:center;"> 73 </td> <td style="text-align:center;"> 62 </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 76 </td> <td style="text-align:center;"> 74 </td> <td style="text-align:center;"> 69 </td> </tr> <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 71 </td> <td style="text-align:center;"> 69 </td> <td style="text-align:center;"> 75 </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> 82 </td> <td style="text-align:center;"> 67 </td> </tr> </tbody> </table> ] .pull-right[ <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <caption>Descriptive Statistics</caption> <thead> <tr> <th style="text-align:left;"> Location </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> Variance </th> <th style="text-align:right;"> Std. Dev. 
</th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Atlanta </td> <td style="text-align:right;"> 79 </td> <td style="text-align:right;"> 34 </td> <td style="text-align:right;"> 5.83 </td> </tr> <tr> <td style="text-align:left;"> Dallas </td> <td style="text-align:right;"> 74 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 4.47 </td> </tr> <tr> <td style="text-align:left;"> Seattle </td> <td style="text-align:right;"> 66 </td> <td style="text-align:right;"> 32 </td> <td style="text-align:right;"> 5.66 </td> </tr> </tbody> </table>

<img src="anova_files/figure-html/unnamed-chunk-5-1.svg" width="75%" style="display: block; margin: auto;" />

]

---

### Pairwise t-tests?

Why not compare Atlanta to Dallas, see if they differ, then compare Atlanta to Seattle, see if there is a difference, and then repeat for Dallas and Seattle?

Bad idea; you run into the problem of **`multiple comparisons`**

* In any single comparison we have a certain probability of a significant result by chance alone `\((\alpha)\)`, and hence `\(1-\alpha\)` is the probability of no significant result

* What is the probability that `at least one` of these pairs yields a significant result by chance alone? In any single comparison, P(Type I error) `\(= \alpha = 0.05\)`, so

`\begin{align*} P(\text{no Type I error in 1 comparison}) & = & 0.95 \\ P(\text{no Type I error in 2 comparisons}) & = & 0.95 \times 0.95 = 0.9025 \\ \text{Note: P(Type I error in 2 comparisons) is} & = & 1 - 0.9025 = 0.0975 \\ P(\text{no Type I error in 3 comparisons}) & = & 0.95 \times 0.95 \times 0.95 = 0.8574 \\ \text{Note: P(Type I error in 3 comparisons) is} & = & 1 - 0.8574 = 0.1426 \end{align*}`

.fancy[.heat[the probability of making a Type I error is exploding!]]

---

### Correcting for Multiple Comparisons: `\(\alpha^* = \frac{\alpha}{\text{No.
of Trials}}\)` .pull-left[ <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Number of Trials </th> <th style="text-align:center;"> Adjusted Alpha </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 0.0500 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 0.0250 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0.0167 </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 0.0125 </td> </tr> <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 0.0100 </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 0.0083 </td> </tr> <tr> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 0.0071 </td> </tr> <tr> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> 0.0063 </td> </tr> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 0.0056 </td> </tr> <tr> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 0.0050 </td> </tr> </tbody> </table> ] .pull-right[ <img src="anova_files/figure-html/unnamed-chunk-7-1.svg" width="100%" style="display: block; margin: auto;" /> Reject `\(H_0\)` only if p-values `\(\leq \alpha^*\)` ] --- class: inverse, middle, center # .fancy[.heat[ One-way ANOVA ]] --- ### The Logic of ANOVA ANOVA is a hypothesis testing procedure that allows us to simultaneously compare three or more groups and determine if they are drawn from a common population or from different populations ANOVA also lets us test the influence of two or more `independent variables` on the `dependent variable` The test statistic is a ratio: `\(\dfrac{\text{Difference between groups}}{\text{Difference within groups}}\)` > If difference between groups `\(>\)` difference within 
each group, something beyond chance must be differentiating the groups

In ANOVA we use the word `treatments` to refer to the independent variable (e.g., drugs tested, trainings, etc.)

But how can we measure the `difference between groups` and the `difference within groups`?

---

### How shall we measure and analyze `difference`?

We could ask: `How much does each individual differ from the overall Mean?`

We could also ask: `In each group, how much does each group member differ from his/her group Mean?`

What would be a good measure of difference? Well, the variance of course! So, we look at a few variances:

(1) Total variance of `all scores`

(2) Variance `between-groups`

(3) Variance `within-groups`

> Note: We will be calculating variability in terms of the Sum of Squares, i.e., `\(\sum(x_{i} - \bar{x})^{2}\)`

---

### Between-groups and Within-groups Variance

If groups are from a common population, their means should be similar

When groups exhibit differences, one asks why? ... Perhaps,

(1) chance is to blame

(2) individuals have intrinsic differences

(3) individuals were exposed to treatments that differed across groups

> For example, students in Catholic schools, public schools, private schools

> For example, patients exposed to Trial Drug A, Trial Drug B, Placebo

> For example, counties with no mandatory mask enforcement for COVID-19 versus counties mandating masks inside public buildings versus counties with masks in all public spaces (including sidewalks)

---

### (1) Chance

Chance refers to two possible sources of differences

(1) Person-to-Person differences (people are unique)

(2) Experimental Errors (sample-to-sample variability because of something going wrong with the experiment or sampling)

Chance could/will influence all scores, within-groups and between-groups

The key question becomes: how large or small is the influence of `chance`?
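The familywise error arithmetic and the Bonferroni adjustment from the earlier slides can be sketched in a few lines. Python is used here purely for illustration, and the function names are my own:

```python
# Familywise Type I error rate across m comparisons, and the
# Bonferroni-adjusted per-comparison alpha, as shown on the slides.

def familywise_error(m, alpha=0.05):
    """P(at least one Type I error in m independent comparisons)."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(m, alpha=0.05):
    """Per-comparison alpha after the Bonferroni correction."""
    return alpha / m

for m in (1, 2, 3):
    print(m, round(familywise_error(m), 4), round(bonferroni_alpha(m), 4))
```

With three pairwise comparisons the chance of at least one spurious "significant" result is already about 0.14, which is why each test is held to the stricter threshold `\(\alpha^*\)`.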
--- ### Between-groups and Within-groups Variance Within-groups, differences must be due to chance `because the treatment is a constant in each group` Between-groups, differences may be due to chance and/or treatment effects Test statistic is a ratio of `between-groups` variance to `within-groups` variance `$$F = \dfrac{\text{Variance Between-groups}}{\text{Variance Within-groups}}$$` `$$\therefore F = \dfrac{\text{Variance due to Chance} + \text{Variance due to Treatments}}{\text{Variance due to Chance}}$$` > If variance due to treatments is `\(=0\)`, what will `\(F\)` be? > If variance due to treatments is `\(>0\)` and large, what will `\(F\)` be? --- ### The Elements of ANOVA In ANOVA we refer to the independent variable(s) as the `factor(s)` The values of a factor we refer to as the `treatments` The outcome is referred to as the `response` variable With just one factor we speak of a `single-factor design` With two or more factors we speak of a `factorial design` Assumptions underlying ANOVA (1) The response variable is `\(\sim N(.)\)` (i.e., Normally distributed) (2) The variance `\(\sigma^{2}\)` of the response variable is the same for all groups (3) The observations are independent within each group (i.e., random sampling was not violated) The Hypotheses: `\(H_{0}: \text{ All population means are equal}\)` `\(H_{1}: \text{ Not all population means are equal}\)` --- .pull-left[ <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <caption>Customer Satisfaction Scores</caption> <thead> <tr> <th style="text-align:center;"> Observation.No. 
</th> <th style="text-align:center;"> Atlanta </th> <th style="text-align:center;"> Dallas </th> <th style="text-align:center;"> Seattle </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> 71 </td> <td style="text-align:center;"> 59 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 75 </td> <td style="text-align:center;"> 75 </td> <td style="text-align:center;"> 64 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 82 </td> <td style="text-align:center;"> 73 </td> <td style="text-align:center;"> 62 </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 76 </td> <td style="text-align:center;"> 74 </td> <td style="text-align:center;"> 69 </td> </tr> <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 71 </td> <td style="text-align:center;"> 69 </td> <td style="text-align:center;"> 75 </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 85 </td> <td style="text-align:center;"> 82 </td> <td style="text-align:center;"> 67 </td> </tr> </tbody> </table> ] .pull-right[ <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <caption>Descriptive Statistics</caption> <thead> <tr> <th style="text-align:left;"> Location </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> Variance </th> <th style="text-align:right;"> Std. Dev. 
</th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Atlanta </td> <td style="text-align:right;"> 79 </td> <td style="text-align:right;"> 34 </td> <td style="text-align:right;"> 5.83 </td> </tr> <tr> <td style="text-align:left;"> Dallas </td> <td style="text-align:right;"> 74 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 4.47 </td> </tr> <tr> <td style="text-align:left;"> Seattle </td> <td style="text-align:right;"> 66 </td> <td style="text-align:right;"> 32 </td> <td style="text-align:right;"> 5.66 </td> </tr> </tbody> </table> `\(H_{0}: \mu_{1}=\mu_{2}=\mu_{3}\)` `\(H_{1}: \text{ Not all population means are equal}\)` ] .pull-left[ `\(i\)` indexes observations; `\(j\)` indexes groups; `\(\mu_{j}\)` is mean of the `\(j^{th}\)` group; `\(n_{j}\)` is sample size of group `\(j\)` `\(x_{ij}\)` is score for observation `\(i\)` for group `\(j\)` `\(\bar{x_{j}}\)` is mean for group `\(j\)` `\(\bar{x}_{j} = \frac{1}{n_{j}} \sum_{i=1}^{n_{j}}{x_{ij}}\)` ] .pull-right[ `\(s^{2}_{j}\)` and `\(s_{j}\)` are variance and standard deviation for group `\(j\)` `\(s^{2}_{j} = \frac{1}{n_{j} - 1} \sum_{i=1}^{n_{j}}{(x_{ij} - \bar{x_{j}})^{2}}\)` ] --- `\(\bar{\bar{x}}\)` is the overall sample mean `\(\bar{\bar{x}} = \frac{1}{n_{T}} \sum_{j=1}^{k} \sum_{i=1}^{n_{j}} {x_{ij}}\)` Note that `\(n_{T} = n_{1} + n_{2} + \cdots + n_{k}\)` If `\(n_{1} = n_{2} = \cdots = n_{k}\)`, then `\(n_{T} = k(n)\)` and consequently, `\(\bar{\bar{x}} = \frac{1}{kn} \sum_{j=1}^{k} \sum_{i=1}^{n_{j}} {x_{ij}} = \frac{1}{k} \sum_{j=1}^{k} \sum_{i=1}^{n_{j}} \frac{x_{ij}}{n} = \frac{1}{k} \sum_{j=1}^{k} \bar{x_{j}}\)` .pull-left[ Mean Square due to Treatments (MSTR) is given by `\(\dfrac{\mbox{SSTR}}{k - 1}\)` `\(\mbox{MSTR} = \frac{1}{k - 1} \sum_{j=1}^{k} n_{j} {(\bar{x_{j}} - \bar{\bar{x}})^{2}}\)` `\(\mbox{SSTR} = \sum_{j=1}^{k} n_{j} {(\bar{x_{j}} - \bar{\bar{x}})^{2}}\)` ] .pull-right[ Mean Square due to Error (MSE) is given by `\(\dfrac{\mbox{SSE}}{n_{T} - 
k}\)` `\(\mbox{MSE} = \frac{1}{n_{T} - k} \sum_{j=1}^{k} (n_{j} - 1) s^{2}_{j}\)` `\(\mbox{SSE} = \sum_{j=1}^{k} (n_{j}-1) s^{2}_{j}\)` Note also that `\(\mbox{SST} = \mbox{SSTR} + \mbox{SSE}\)` `\(\mbox{SST} = \sum_{j=1}^{k}\sum_{i=1}^{n_{j}} (x_{ij} - \bar{\bar{x}})^{2}\)` ] --- `\(F = \dfrac{\mbox{MSTR}}{\mbox{MSE}}\)` `\(F \sim F_{df_{Numerator}; df_{Denominator}}\)` * `\(df_{Numerator} = k - 1\)` * `\(df_{Denominator} = n_{T} - k\)` Reject `\(H_{0}\)` if `\(p-value \leq \alpha\)`; Do not reject `\(H_0\)` otherwise `Alternatively:` Reject `\(H_{0}\)` if Calculated `\(F \geq\)` Critical `\(F\)`; Do not reject `\(H_0\)` otherwise > If there is as much variation between the groups as there is within the groups then the ratio will be `\(= 1\)`. > If there is more variance between the groups than within them F will be `\(> 1\)`. --- ### A Worked Example | Dating | Engaged | Married | | --: | --: | --: | | 89 | 99 | 109 | | 90 | 100 | 110 | | 91 | 101 | 111 | `Overall mean:` `\(\bar{\bar{x}} = \dfrac{89+90+91+\ldots+111}{9} = \dfrac{900}{9} = 100\)` `Dating mean:` `\(\bar{x}_{D} = \dfrac{89+90+91}{3} = 90\)` `Engaged mean:` `\(\bar{x}_{E} = 100\)` `Married mean:` `\(\bar{x}_{M} = 110\)` --- ### ... 
.pull-left[

| `\(x\)` | `\((x - \bar{\bar{x}})\)` | `\((x - \bar{\bar{x}})^{2}\)` |
| --: | --: | --: |
| 89 | `\(89 - 100 = -11\)` | `\((-11)^{2}=121\)` |
| 90 | `\(90 - 100 = -10\)` | `\((-10)^{2}=100\)` |
| 91 | `\(91-100 = -9\)` | `\((-9)^{2}=81\)` |
| 99 | `\(99-100=-1\)` | `\((-1)^{2}=1\)` |
| 100 | `\(100-100=0\)` | `\((0)^{2}=0\)` |
| 101 | `\(101-100=1\)` | `\((1)^{2}=1\)` |
| 109 | `\(109-100=9\)` | `\((9)^{2}=81\)` |
| 110 | `\(110-100=10\)` | `\((10)^{2}=100\)` |
| 111 | `\(111-100=11\)` | `\((11)^{2}=121\)` |
| `\(\sum{x}=900\)` | `\(\sum(x-\bar{\bar{x}})=0\)` | `\(\sum(x-\bar{\bar{x}})^{2} =606\)` |

]

.pull-right[

`\(SST = 606\)`

`\(SSTR = (\bar{x}_D - \bar{\bar{x}})^{2} \times n_D \\ + (\bar{x}_E - \bar{\bar{x}})^{2} \times n_E \\ + (\bar{x}_M - \bar{\bar{x}})^{2} \times n_M\)`

`\(SSTR = (90 - 100)^2 \times 3 \\ + (100 - 100)^2 \times 3 \\ + (110 - 100)^2 \times 3\)`

`\(SSTR = (100) \times 3 + (0) \times 3 + (100) \times 3 = 600\)`

`\(SSE = SST - SSTR = 606 - 600 = 6\)`

]

---

`\(MSTR = \dfrac{SSTR}{k-1} = \dfrac{600}{3-1} = \dfrac{600}{2}=300\)`

`\(MSE = \dfrac{SSE}{n_T - k} = \dfrac{6}{9-3} = \dfrac{6}{6}=1\)`

Calculated `\(F=\dfrac{MSTR}{MSE} = \dfrac{300}{1} = 300\)`

Critical `\(F_{2,6} = 5.14\)`

Since Calculated `\(F >\)` Critical `\(F\)` we conclude that relationship satisfaction varies by type of relationship

> p-value of Calculated F is `\(< 0.001\)`

---

When ANOVA gives you significant results, it often pays to calculate the `effect size` -- a measure of how much of the total variation in the outcome can be attributed to group differences. Effect size can be calculated in a number of ways, with `eta-squared` `\(\eta^{2} = \dfrac{SSTR}{SST}=\dfrac{600}{606} = 0.99\)` being one of the more popular ones.
* `\(\eta^2 \approx 0.01\)` is a small effect
* `\(\eta^2 \approx 0.06\)` is a medium effect, and
* `\(\eta^2 \geq 0.14\)` is a large effect

`\(\eta^{2}\)` overestimates effects, so often `omega-squared` `\((\omega^2)\)` is used instead

`\(\omega^{2} = \dfrac{SSTR - (k-1)MSE}{SST+MSE} = \dfrac{600 - (3-1)\times1}{606+1} = \dfrac{598}{607} = 0.98\)`

* `\(\omega^2 = 0.01\)` (small effect)
* `\(\omega^2 = 0.06\)` (medium effect)
* `\(\omega^2 = 0.15\)` (large effect)

---

### Another Example

A program to lower blood pressure assigns participants to one of four conditions (see table). Each participant's systolic blood pressure is measured after two weeks of treatment. The hypothesis is that a combination (`Diet and Drug`) of treatments will be more effective than either individual treatment in isolation.

.pull-left[

<table> <thead> <tr> <th style="text-align:right;"> Control </th> <th style="text-align:right;"> Diet Only </th> <th style="text-align:right;"> Drug Only </th> <th style="text-align:right;"> Diet and Drug </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 163 </td> <td style="text-align:right;"> 166 </td> <td style="text-align:right;"> 161 </td> <td style="text-align:right;"> 153 </td> </tr> <tr> <td style="text-align:right;"> 178 </td> <td style="text-align:right;"> 173 </td> <td style="text-align:right;"> 171 </td> <td style="text-align:right;"> 168 </td> </tr> <tr> <td style="text-align:right;"> 180 </td> <td style="text-align:right;"> 188 </td> <td style="text-align:right;"> 178 </td> <td style="text-align:right;"> 176 </td> </tr> <tr> <td style="text-align:right;"> 181 </td> <td style="text-align:right;"> 190 </td> <td style="text-align:right;"> 183 </td> <td style="text-align:right;"> 198 </td> </tr> <tr> <td style="text-align:right;"> 185 </td> <td style="text-align:right;"> 193 </td> <td style="text-align:right;"> 195 </td> <td style="text-align:right;"> 200 </td> </tr> </tbody> </table>

]

.pull-right[

<img
src="anova_files/figure-html/unnamed-chunk-12-1.svg" width="100%" style="display: block; margin: auto;" />

]

---

`\(H_0:\)` There is no difference in mean systolic blood pressure among the treatment groups

`\(H_1:\)` Not all treatment group means are equal

Overall mean `\(\bar{\bar{x}} = 179\)`
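The `\(F\)`-ratio for these data can be computed from first principles. A minimal sketch (Python used purely for illustration; variable names are my own):

```python
# One-way ANOVA "by hand" on the blood-pressure data from the table above.
groups = {
    "Control":       [163, 178, 180, 181, 185],
    "Diet Only":     [166, 173, 188, 190, 193],
    "Drug Only":     [161, 171, 178, 183, 195],
    "Diet and Drug": [153, 168, 176, 198, 200],
}

k = len(groups)
n_T = sum(len(g) for g in groups.values())
grand_mean = sum(x for g in groups.values() for x in g) / n_T

# SSTR: between-group (treatment) sum of squares
SSTR = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
# SSE: within-group (error) sum of squares
SSE = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)

MSTR = SSTR / (k - 1)
MSE = SSE / (n_T - k)
F = MSTR / MSE
print(round(grand_mean, 1), round(SSTR, 1), round(SSE, 1), round(F, 3))
```

The sketch reproduces the grand mean of 179 and a small `\(F\)`-ratio, foreshadowing a failure to reject `\(H_0\)`.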
---

Mean for Control group `\(\bar{x}_{Control} = 177.4\)`

Mean for Diet only group `\(\bar{x}_{Diet} = 182\)`

Mean for Drug only group `\(\bar{x}_{Drug} = 177.6\)`

Mean for Diet and Drug group `\(\bar{x}_{Diet+Drug} = 179\)`

`\(SST = 3170\)`

`\(SSTR = 67.6\)`

`\(SSE = 3170 - 67.6 = 3102.4\)`

`\(MSTR = \dfrac{SSTR}{k - 1} = \dfrac{67.6}{4-1} = \dfrac{67.6}{3} = 22.5\)`

`\(MSE = \dfrac{SSE}{n_T - k} = \dfrac{3102.4}{20 - 4} = \dfrac{3102.4}{16} = 193.9\)`

`\(F = \dfrac{MSTR}{MSE} = \dfrac{22.5}{193.9} = 0.116\)`

The `\(p\)`-value is `\(0.949\)` so we are unable to reject the null hypothesis; these data suggest that there is no difference in mean systolic blood pressure among the treatment groups.

---

### Does Pre-surgical fitness influence recovery times?

A sample of 24 males aged 18-30 underwent corrective knee surgery. We are investigating the relationship between prior fitness level and the number of days needed for successful physical therapy.

| Below Average | Average | Above Average |
| :-: | :-: | :-: |
| 29 | 30 | 26 |
| 42 | 35 | 32 |
| 38 | 39 | 21 |
| 40 | 28 | 20 |
| 43 | 31 | 23 |
| 40 | 31 | 22 |
| 30 | 29 | -- |
| 42 | 35 | -- |
| -- | 29 | -- |
| -- | 33 | -- |

> What do you conclude?

---

class: inverse, center, middle

# .fancy[.heat[ Two-way ANOVA (aka Factorial ANOVA) ]]

---

### Examining the Influence of Two Independent Variables

We may have more than one factor we want to consider

Let us assume we have TWO factors (i.e., independent variables)

For example, say we are interested in looking at how first-grade students learn under varying conditions of temperature and humidity. So we have an outcome (Quiz Score) and two independent variables: (1) Humidity, and (2) Temperature

How might we test (1) whether Humidity and/or Temperature influence learning as measured by a quiz score, and (2) whether the effects of one independent variable are constant across the values of the other independent variable?
For example, does high humidity have the same impact on learning when temperature is at its highest as when temperature is at its lowest?

---

### An Example: Temperature, Humidity, and Learning

| | `\(70^{\circ}\)` | `\(80^{\circ}\)` | `\(90^{\circ}\)` | |
| :-- | :-- | :-- | :-- | :-- |
| Low Humidity | `\(\bar{x} = 85\)` | `\(\bar{x} = 80\)` | `\(\bar{x} = 75\)` | `\(\bar{x}_{low} = 80\)` |
| High Humidity | `\(\bar{x} = 75\)` | `\(\bar{x} = 70\)` | `\(\bar{x} = 65\)` | `\(\bar{x}_{high} = 70\)` |
| | `\(\bar{x}_{70}=80\)` | `\(\bar{x}_{80} = 75\)` | `\(\bar{x}_{90}=70\)` | |

* _Main Effect of Factor A (Humidity)_: Difference between means for high and low humidity
* _Main Effect of Factor B (Temperature)_: Difference between means for `\(70^{\circ}\)`, `\(80^{\circ}\)`, and `\(90^{\circ}\)` temperature
* Null Hypothesis for testing effects of Factor A: `\(H_0: \mu_{A1} = \mu_{A2}\)`
* Null Hypothesis for testing effects of Factor B: `\(H_0: \mu_{B1} = \mu_{B2} = \mu_{B3}\)`
* Note how the mean score drops as Temperature increases ... by exactly `\(5\)`
* Note how the mean score drops as Humidity rises
* At `\(70^{\circ}\)` there is a difference of `\(10\)` between Low/High humidity
* At `\(80^{\circ}\)` there is a difference of `\(10\)` between Low/High humidity
* At `\(90^{\circ}\)` there is a difference of `\(10\)` between Low/High humidity

---

### A Tweak ...
| | `\(70^{\circ}\)` | `\(80^{\circ}\)` | `\(90^{\circ}\)` | |
| :-- | :-- | :-- | :-- | :-- |
| Low Humidity | `\(\bar{x} = 80\)` | `\(\bar{x} = 80\)` | `\(\bar{x} = 80\)` | `\(\bar{x}_{low} = 80\)` |
| High Humidity | `\(\bar{x} = 80\)` | `\(\bar{x} = 70\)` | `\(\bar{x} = 60\)` | `\(\bar{x}_{high} = 70\)` |
| | `\(\bar{x}_{70}=80\)` | `\(\bar{x}_{80} = 75\)` | `\(\bar{x}_{90}=70\)` | |

* Note how the mean score drops as Temperature increases
* Note how the mean score drops as Humidity rises
* At `\(70^{\circ}\)` there is a difference of `\(0\)` between Low/High humidity
* At `\(80^{\circ}\)` there is a difference of `\(10\)` between Low/High humidity
* At `\(90^{\circ}\)` there is a difference of `\(20\)` between Low/High humidity

> At Low humidity, raising the temperature has no impact

> At High humidity, raising the temperature lowers the score, so the Low/High humidity gap widens as temperature rises

---

.pull-left[

#### No Interaction

<img src="anova_files/figure-html/unnamed-chunk-15-1.svg" width="100%" style="display: block; margin: auto;" />

]

.pull-right[

#### An Interaction

<img src="anova_files/figure-html/unnamed-chunk-16-1.svg" width="100%" style="display: block; margin: auto;" />

]

> **An interaction:** When the effect of a particular value of one factor (the first independent variable) depends upon the value of the other factor (the second independent variable)

---

### Hypotheses for Two-Way (aka Factorial) ANOVA

There is no main effect of Diet. That is,

* `\(H_0\)`: `\(\mu_{noDiet} = \mu_{yesDiet}\)`
* `\(H_1\)`: `\(\mu_{noDiet} \neq \mu_{yesDiet}\)`

There is no main effect of Drug. That is,

* `\(H_0\)`: `\(\mu_{noDrug} = \mu_{yesDrug}\)`
* `\(H_1\)`: `\(\mu_{noDrug} \neq \mu_{yesDrug}\)`

There is no interaction effect between Diet and Drug.
That is,

* `\(H_0\)`: `\(\mu_{yesDrug.yesDiet} - \mu_{noDrug.yesDiet} = \mu_{yesDrug.noDiet} - \mu_{noDrug.noDiet}\)` (the effect of Drug is the same with and without Diet)
* `\(H_1\)`: `\(\mu_{yesDrug.yesDiet} - \mu_{noDrug.yesDiet} \neq \mu_{yesDrug.noDiet} - \mu_{noDrug.noDiet}\)` (the effect of Drug depends on Diet)

---

### The Drug and Diet Example Modified

<table> <thead> <tr> <th style="text-align:right;"> No Diet, No Drug </th> <th style="text-align:right;"> Diet, No Drug </th> <th style="text-align:right;"> No Diet, Drug </th> <th style="text-align:right;"> Both Diet & Drug </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 185 </td> <td style="text-align:right;"> 188 </td> <td style="text-align:right;"> 171 </td> <td style="text-align:right;"> 153 </td> </tr> <tr> <td style="text-align:right;"> 190 </td> <td style="text-align:right;"> 183 </td> <td style="text-align:right;"> 176 </td> <td style="text-align:right;"> 163 </td> </tr> <tr> <td style="text-align:right;"> 195 </td> <td style="text-align:right;"> 198 </td> <td style="text-align:right;"> 181 </td> <td style="text-align:right;"> 173 </td> </tr> <tr> <td style="text-align:right;"> 200 </td> <td style="text-align:right;"> 178 </td> <td style="text-align:right;"> 166 </td> <td style="text-align:right;"> 178 </td> </tr> <tr> <td style="text-align:right;"> 180 </td> <td style="text-align:right;"> 193 </td> <td style="text-align:right;"> 161 </td> <td style="text-align:right;"> 168 </td> </tr> </tbody> </table>

> Does Diet have a direct effect? What about Drug? Is there an interaction effect of Diet and Drug?
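One way to answer these questions is to partition the sums of squares for this balanced 2 `\(\times\)` 2 design by hand. A minimal sketch, treating the four columns above as the four Diet `\(\times\)` Drug cells (Python for illustration; function and variable names are my own):

```python
# Two-way (2x2) ANOVA sums of squares for the balanced Diet x Drug data above.
cells = {
    ("noDiet",  "noDrug"):  [185, 190, 195, 200, 180],
    ("yesDiet", "noDrug"):  [188, 183, 198, 178, 193],
    ("noDiet",  "yesDrug"): [171, 176, 181, 166, 161],
    ("yesDiet", "yesDrug"): [153, 163, 173, 178, 168],
}

def mean(xs):
    return sum(xs) / len(xs)

all_scores = [x for c in cells.values() for x in c]
grand = mean(all_scores)
n_cell = 5  # balanced design: 5 observations per cell

# Marginal means for each level of each factor
diet_means = {d: mean([x for (dd, _), c in cells.items() if dd == d for x in c])
              for d in ("noDiet", "yesDiet")}
drug_means = {g: mean([x for (_, gg), c in cells.items() if gg == g for x in c])
              for g in ("noDrug", "yesDrug")}

# Main-effect SS, cell SS, interaction SS, and within-cell (error) SS
SS_diet = sum(2 * n_cell * (m - grand) ** 2 for m in diet_means.values())
SS_drug = sum(2 * n_cell * (m - grand) ** 2 for m in drug_means.values())
SS_cells = sum(n_cell * (mean(c) - grand) ** 2 for c in cells.values())
SS_inter = SS_cells - SS_diet - SS_drug
SS_error = sum((x - mean(c)) ** 2 for c in cells.values() for x in c)

MSE = SS_error / (len(all_scores) - 4)  # 16 error df
F_diet, F_drug, F_inter = SS_diet / MSE, SS_drug / MSE, SS_inter / MSE
print(round(SS_diet, 1), round(SS_drug, 1), round(SS_inter, 1), round(SS_error, 1))
# prints: 45.0 2000.0 5.0 1120.0
```

Under this partition the Drug sum of squares dwarfs those for Diet and the interaction, suggesting the Drug main effect is doing the work in these data.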
--- ### Pedagogy, Subject, and Learning <table> <thead> <tr> <th style="text-align:right;"> Statistics/Standard </th> <th style="text-align:right;"> English/Standard </th> <th style="text-align:right;"> History/Standard </th> <th style="text-align:right;"> Statistics/Computer </th> <th style="text-align:right;"> English/Computer </th> <th style="text-align:right;"> History/Computer </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 44 </td> <td style="text-align:right;"> 47 </td> <td style="text-align:right;"> 46 </td> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 45 </td> </tr> <tr> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 37 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 42 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 36 </td> </tr> <tr> <td style="text-align:right;"> 48 </td> <td style="text-align:right;"> 42 </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 49 </td> <td style="text-align:right;"> 16 </td> <td style="text-align:right;"> 41 </td> </tr> <tr> <td style="text-align:right;"> 32 </td> <td style="text-align:right;"> 42 </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 51 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 35 </td> </tr> <tr> <td style="text-align:right;"> 35 </td> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 47 </td> <td style="text-align:right;"> 16 </td> <td style="text-align:right;"> 38 </td> </tr> <tr> <td style="text-align:right;"> 27 </td> <td style="text-align:right;"> 33 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 34 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 33 </td> </tr> </tbody> </table>
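A first step in analyzing a factorial design like this is to compute the cell means and look for a possible interaction. A minimal sketch (Python for illustration; variable names are my own):

```python
# Cell means for the Subject (3 levels) x Pedagogy (2 levels) data above;
# if the pattern across subjects differs by pedagogy, that suggests an interaction.
scores = {
    ("Statistics", "Standard"): [44, 18, 48, 32, 35, 27],
    ("English",    "Standard"): [47, 37, 42, 42, 39, 33],
    ("History",    "Standard"): [46, 21, 40, 30, 29, 20],
    ("Statistics", "Computer"): [53, 42, 49, 51, 47, 34],
    ("English",    "Computer"): [13, 10, 16, 11, 16, 6],
    ("History",    "Computer"): [45, 36, 41, 35, 38, 33],
}

cell_means = {key: sum(v) / len(v) for key, v in scores.items()}
for (subject, pedagogy), m in cell_means.items():
    print(f"{subject}/{pedagogy}: {m:.0f}")
# English means collapse under computer-based pedagogy while Statistics
# means improve -- the effect of Pedagogy appears to depend on Subject.
```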