Statistical Tests and Basic Modeling

Professor Ruhil

Agenda

Review of hypothesis testing
Overview of some key statistical tests
- t-tests
  - one-sample
  - two-sample (aka independent samples)
  - paired
- $χ^{2}$ (chi-square)
Overview of some basic regression models
- linear regression ($y$ is continuous)
- logistic regression ($y$ is binary)

Hypothesis Testing   

Hypothesis testing is an inferential procedure that uses sample data to evaluate the credibility of a hypothesis about a population parameter. The process involves ...

Stating a hypothesis: an assumption that can neither be fully proven nor fully disproven. For example,
- Not more than 5% of GM trucks break down in under 10,000 miles
- Heights of North American adult males are distributed with $μ = 72$ inches
- Mean year-round temperature in Athens (OH) is $> 62$
- 10% of Ohio teachers are Accomplished
- Mean county unemployment rate is 12.1%
Drawing a sample to test the hypothesis
Conducting the test itself to see if the hypothesis should be rejected

The `Null` and the `Alternative` Hypotheses

Null Hypothesis: $(H_{0})$ is the assumption believed to be true
Alternative Hypothesis: $(H_{a})$ is the statement believed to be true if $(H_{0})$ is rejected

$H_{0}$ : $μ > 72$ inches; $H_{1}$ : $μ \leq 72$

$H_{0}$ : $μ < 72$ inches; $H_{1}$ : $μ \geq 72$

$H_{0}$ : $μ \leq 72$ inches; $H_{1}$ : $μ > 72$

$H_{0}$ : $μ \geq 72$ inches; $H_{1}$ : $μ < 72$

$H_{0}$ : $μ = 72$ inches; $H_{1}$ : $μ \neq 72$

$H_{0}$ and $H_{1}$ are mutually exclusive and collectively exhaustive

Mutually Exclusive: Either $H_{0}$ or $H_{1}$ is true $\dots$ both cannot be true at the same time
Collectively Exhaustive: $H_{0}$ and $H_{1}$ together cover every possible value of the parameter $\dots$ ; there are no other possibilities unknown to us

`Type I and Type II Errors`

Decision based on Sample	Null is true	Null is false

Reject the Null	Type I error	No error
Do not reject the Null	No error	Type II error

Type I Error: Rejecting the Null hypothesis $H_{0}$ when $H_{0}$ is true

i.e., we should not have rejected the Null

Type II Error: Failing to reject the Null hypothesis $H_{0}$ when $H_{0}$ is false

i.e., we should have rejected the Null

The probability of committing a Type I error = Level of Significance = $α$
The probability of committing a Type II error = $β$
The power of the test is $(1 - β)$

We have to decide how often we want to make a Type I error (i.e., falsely Reject $H_{0}$ ). Conventionally we set this rate to one of the following $α$ values:

$α = 0.05$ or $α = 0.01$

Note the very cautious language ... Reject $H_{0}$ versus Do Not Reject $H_{0}$
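To make the meaning of $α$ concrete, here is a small simulation sketch (not part of the original slides). It draws many samples from a population in which $H_{0}$ is true and counts how often a two-tailed test rejects at $α = 0.05$. To stay within the Python standard library it uses a z-test with a known population standard deviation rather than a t-test; the values $μ_{0} = 6$, $σ = 2$ and $n = 30$ are assumptions chosen only for illustration. The share of rejections should land close to $α$.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(mu0=6.0, sigma=2.0, n=30, alpha=0.05, reps=20_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-tailed z-test by simulation."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    se = sigma / n ** 0.5                          # standard error of the mean
    rejections = 0
    for _ in range(reps):
        # Every sample comes from a population in which H0 (mu = mu0) is true
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if abs(z) >= z_crit:
            rejections += 1                        # a Type I error
    return rejections / reps

rate = type_i_error_rate()
print(f"Share of true nulls rejected: {rate:.3f}")  # lands near alpha = 0.05
```

Calling `type_i_error_rate(alpha=0.01)` should push the rejection rate toward 0.01: $α$ is a rate we choose in advance, not something the data decide.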

The Process of Hypothesis Testing: One Example

Assume we want to know whether the roundabout on SR682 has had an impact on traffic accidents in Athens. We have historical data on the number of accidents in years past. Say the average per day used to be 6 (i.e., $μ_{0} = 6$ ). To see if the roundabout has had an impact we could gather accident data for a random sample of 100 days $(n = 100)$ from the period after the roundabout was built.

Before we do that though, we will need to specify our hypotheses. What do we think might be the impact? Let us say the City Engineer argues that the roundabout should have decreased accidents.

If he is correct then the sample mean $\bar{x}$ should be less than the population mean $μ_{0}$ i.e., $\bar{x} < μ_{0}$
If he is wrong then the sample mean $\bar{x}$ should be at least as large as the population mean $μ_{0}$ i.e., $\bar{x} \geq μ_{0}$

The Sampling Distribution of $\bar{x}$

We know from the theory of sampling distributions that the sample means from all possible samples of size $n$ will be (approximately) normally distributed around $μ$ (as shown below)

Most samples would be in the middle of the distribution but by sheer chance we could end up with a sample mean in the tails. This will happen with a very small probability but it could happen!!

Scenario 1: Engineer says $\bar{x} < μ$

If we believe the City Engineer, we would set up the hypotheses as follows:

$H_{0}$ : The roundabout does not reduce accidents, i.e., $μ \geq μ_{0}$
$H_{1}$ : The roundabout does reduce accidents, i.e., $μ < μ_{0}$
Next set $α$ (the probability of making a Type I error)
We then calculate the sample mean $(\bar{x})$ and the sample standard deviation $(s)$
Next we calculate the standard error of the sample mean: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
Now we calculate $t = \frac{\bar{x} - μ_{0}}{s_{\bar{x}}}$ and this is what we call $t_{calculated}$
Using $df = n - 1$ , find the area to the left of $t_{calculated}$
If this area is very small we can conclude that the roundabout has reduced accidents
How should we define very small? By setting $α$ either to 0.05 or to 0.01

We Reject $H_{0}$ if the p-value $P(t \leq t_{calculated}) \leq α$ ; the data provide sufficient evidence to conclude that the roundabout has reduced accidents

If $P(t \leq t_{calculated}) > α$ then we Fail to reject $H_{0}$ ; the data provide insufficient evidence to conclude that the roundabout has reduced accidents

Rejection rule

Reject $H_{0}$ if calculated $t$ falls in the green region (i.e., calculated $t \leq - 1.6603$ )

Failure to reject rule

Do Not Reject $H_{0}$ if calculated $t$ falls in the grey region (i.e., $t_{calculated} > -1.6603$ )
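The Scenario 1 steps can be sketched in a few lines of Python. The roundabout sample is not reported in the slides, so the summary numbers below ($\bar{x} = 5.6$, $s = 2.0$, $n = 100$) are hypothetical; the critical value $-1.6603$ is the one quoted above for $df = 99$ and $α = 0.05$.

```python
from math import sqrt

def lower_tailed_t(xbar, s, n, mu0, t_crit=-1.6603):
    """Scenario 1: H0: mu >= mu0 versus H1: mu < mu0.

    t_crit is the lower-tail critical value for df = n - 1 (-1.6603 for df = 99, alpha = 0.05).
    """
    se = s / sqrt(n)            # standard error of the sample mean
    t_calc = (xbar - mu0) / se  # calculated t
    reject = t_calc <= t_crit   # rejection (green) region: t <= t_crit
    return t_calc, reject

# Hypothetical post-roundabout sample: 100 days, mean 5.6 accidents/day, s = 2.0
t_calc, reject = lower_tailed_t(xbar=5.6, s=2.0, n=100, mu0=6)
print(round(t_calc, 2), "Reject H0" if reject else "Do not reject H0")
```

With these made-up numbers $t_{calculated} = -2.0$, which falls in the rejection region.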

Scenario 2: Engineer says $\bar{x} \neq μ$

If we believe the City Engineer, we would set up the hypotheses as follows:

$H_{0}$ : The roundabout has no impact on accidents, i.e., $μ = μ_{0}$
$H_{1}$ : The roundabout has an impact on accidents, i.e., $μ \neq μ_{0}$
We then calculate the sample mean $(\bar{x})$ and the sample standard deviation $(s)$
Next we calculate the standard error of the sample mean: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
Now calculate $t_{calculated} = \frac{\bar{x} - μ_{0}}{s_{\bar{x}}}$
Using $df = n - 1$ , find the total area in both tails beyond $\pm t_{calculated}$

If this area is very small then we can conclude that the roundabout has had an impact on accidents

How should we define very small? By setting $α$ either to 0.05 or to 0.01

We can then Reject $H_{0}$ if $P(|t| \geq |t_{calculated}|) \leq α$ ; the data provide sufficient evidence to conclude that the roundabout has had an impact on accidents

If $P(|t| \geq |t_{calculated}|) > α$ then we will Fail to reject $H_{0}$ ; the data provide insufficient evidence to conclude that the roundabout has had an impact on accidents

Rejection rule

Reject $H_{0}$ if calculated $| t |$ falls in the green region (i.e., calculated $t \leq - 1.98$ or calculated $t \geq 1.98$ )

Failure to reject rule

Do Not Reject $H_{0}$ if calculated $| t |$ falls in the grey region (i.e., $-1.98 < t_{calculated} < 1.98$ )

The process revisited ...

State the hypotheses
- If we want to test whether something has changed, or is different, or had an impact, etc. then $H_{0}$ must specify that nothing has changed $\dots H_{0} : μ = μ_{0}; H_{1} : μ \neq μ_{0} \dots$ two-tailed
- If we want to test whether something has increased, or has risen, or is more then $H_{0}$ must specify that it has not increased $\dots H_{0} : μ \leq μ_{0}; H_{1} : μ > μ_{0} \dots$ one-tailed
- If we want to test whether something has decreased, or has reduced, or is less then $H_{0}$ must specify that it has not decreased $\dots H_{0} : μ \geq μ_{0}; H_{1} : μ < μ_{0} \dots$ one-tailed
Collect the sample and set $α = 0.05$ or $α = 0.01$
Calculate $\bar{x}$ , $s$ , $s_{\bar{x}} = \frac{s}{\sqrt{n}}$ , and $df = n - 1$
Calculate the $t$
Reject $H_{0}$ if calculated $t$ falls in the critical region; do not reject $H_{0}$ otherwise
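The steps above can be sketched as a small Python function. Python's standard library has no t distribution, so this sketch gets the p-value from the regularized incomplete beta function via the standard identity $P(|T| \geq |t|) = I_{df/(df + t^{2})}(df/2, 1/2)$, evaluated with the continued-fraction method popularized by Numerical Recipes. In practice you would call a statistics package (for example R's `t.test`); this is only a self-contained sketch. Applied to the numbers of Example 1 (tumor volume) it should give $t = -2.12$ with a p-value of about 0.037.

```python
from math import exp, lgamma, log, sqrt

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor ~ 1 means converged
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_pvalue(t, df, alternative="two-sided"):
    """p-value of a t statistic with df degrees of freedom."""
    two_tail = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))  # P(|T| >= |t|)
    lower = two_tail / 2.0 if t < 0 else 1.0 - two_tail / 2.0  # P(T <= t)
    if alternative == "two-sided":  # H1: mu != mu0
        return two_tail
    if alternative == "less":       # H1: mu < mu0
        return lower
    if alternative == "greater":    # H1: mu > mu0
        return 1.0 - lower
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")

def one_sample_t(xbar, s, n, mu0, alternative="two-sided", alpha=0.05):
    """s_xbar, t, df = n - 1, then compare the p-value with alpha."""
    se = s / sqrt(n)
    t = (xbar - mu0) / se
    df = n - 1
    p = t_pvalue(t, df, alternative)
    return t, df, p, p <= alpha

# Example 1 (tumor volume): xbar = 181.52, s = 40, n = 100, mu0 = 190
t, df, p, reject = one_sample_t(181.52, 40, 100, 190)
print(f"t = {t:.2f}, df = {df}, p = {p:.4f}, reject H0: {reject}")
```

A paired test is a one-sample test on the differences, so `t_pvalue(-2.5, 40)` should give roughly the 0.0166 reported in Example 2.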

But of course ... beware Type I and Type II errors

Type I Error: You rejected $H_{0}$ but it should not have been rejected (probability = level of significance = $α$)
Type II Error: You failed to reject $H_{0}$ but it should have been rejected (probability = $β$)

Continuous y variable   

Example 1 (One-sample t-test)

[From Harrell & Slaughter] We want to test if the mean tumor volume is 190 $mm^{3}$ in a population with melanoma

$H_{0} : μ = 190$ versus $H_{1} : μ \neq 190$

$\bar{x} = 181.52, s = 40, n = 100, μ_{0} = 190$

$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{40}{\sqrt{100}} = 4$

$t = \frac{\bar{x} - μ_{0}}{s_{\bar{x}}} = \frac{181.52 - 190}{4} = - 2.12$

p-value $= 0.037$ , leading us to reject $H_{0}$ if $α = 0.05$

The data do not conform to the pattern predicted by the null hypothesis.

Example 2 (Paired t-test)

[From Harrell & Slaughter] To investigate the relationship between smoking and bone mineral density, Rosner presented a paired analysis in which each person had a nearly perfect control: his or her twin. Differences were computed as density in the heavier-smoking twin minus density in the lighter-smoking twin, and were normalized by dividing by the mean density in the twin pair.

Mean difference was -5% and standard error was 2%, with $n = 41$

$H_{0}$ : mean difference $μ_{d} = 0$ versus $H_{1}$ : mean difference $μ_{d} \neq 0$

$t = \frac{-5 - 0}{2} = -2.5$

p-value $= 0.0166$ (with $df = n - 1 = 40$)

The data do not conform to the pattern predicted by the null hypothesis.

Comparisons of Means from Common Parent Populations

Comparisons of Means from Different Parent Populations

Separate Parent Populations

We often need to compare sample means across two groups. For example, are average earnings the same for men and women in a specific occupation? Perhaps we suspect (a) that women are underpaid or, more generally, (b) that their salaries differ from those of men.

Let the population and sample means be $μ_{m}, μ_{w}$ and ${\bar{x}}_{m}, {\bar{x}}_{w}$ , respectively

(a) $H_{0} : μ_{m} \leq μ_{w}$ and $H_{1} : μ_{m} > μ_{w}$ , $∴ H_{0} : μ_{m} - μ_{w} \leq 0$ and $H_{1} : μ_{m} - μ_{w} > 0$

(b) $H_{0} : μ_{m} = μ_{w}$ and $H_{1} : μ_{m} \neq μ_{w}$ , $∴ H_{0} : μ_{m} - μ_{w} = 0$ and $H_{1} : μ_{m} - μ_{w} \neq 0$

Standard Error of the difference in means: $s_{{\bar{x}}_{m} - {\bar{x}}_{w}} = \sqrt{\frac{s_{m}^{2}}{n_{m}} + \frac{s_{w}^{2}}{n_{w}}}$

Confidence Interval estimate: $({\bar{x}}_{m} - {\bar{x}}_{w}) \pm t_{α / 2} (s_{{\bar{x}}_{m} - {\bar{x}}_{w}})$

The Test Statistic: $t = \frac{({\bar{x}}_{m} - {\bar{x}}_{w}) - (μ_{m} - μ_{w})}{\sqrt{\frac{s_{m}^{2}}{n_{m}} + \frac{s_{w}^{2}}{n_{w}}}} = \frac{({\bar{x}}_{m} - {\bar{x}}_{w}) - D_{0}}{\sqrt{\frac{s_{m}^{2}}{n_{m}} + \frac{s_{w}^{2}}{n_{w}}}}$

The degrees of freedom for this test: $d f = \frac{{(\frac{s_{m}^{2}}{n_{m}} + \frac{s_{w}^{2}}{n_{w}})}^{2}}{\frac{1}{(n_{m} - 1)} {(\frac{s_{m}^{2}}{n_{m}})}^{2} + \frac{1}{(n_{w} - 1)} {(\frac{s_{w}^{2}}{n_{w}})}^{2}}$

Note: We usually round down the $d f$ to the nearest integer

We have two ways of calculating the estimated standard error $(s_{{\bar{x}}_{1} - {\bar{x}}_{2}})$ and the degrees of freedom $d f$

(1) When the population variances are assumed unequal
(2) When the population variances are assumed equal

(1) Unequal Population Variances

Standard Error will be: $s_{{\bar{x}}_{1} - {\bar{x}}_{2}} = \sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}$

Degrees of Freedom will be: $df = \frac{{(\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}})}^{2}}{\frac{1}{(n_{1} - 1)} {(\frac{s_{1}^{2}}{n_{1}})}^{2} + \frac{1}{(n_{2} - 1)} {(\frac{s_{2}^{2}}{n_{2}})}^{2}}$

Use this when $n_{1}$ or $n_{2}$ is $< 30$ , and
either sample has a standard deviation at least twice that of the other sample

(2) Equal Population Variances

Standard Error will be: $s_{{\bar{x}}_{1} - {\bar{x}}_{2}} = \sqrt{\frac{n_{1} + n_{2}}{n_{1} \times n_{2}}} \sqrt{\frac{(n_{1} - 1) s_{1}^{2} + (n_{2} - 1) s_{2}^{2}}{(n_{1} + n_{2}) - 2}}$

Degrees of Freedom will be: $d f = (n_{1} + n_{2}) - 2$

Use when the standard deviations are roughly equal, and
$n_{1}$ and $n_{2}$ $\geq 30$

Example 3 (Two-Sample t-test)

[From Harrell & Slaughter] Two soporific drugs are to be tested, Drug 1 versus Drug 2. Which of these is more effective?

$H_{0} : μ_{1} = μ_{2}$ versus $H_{1} : μ_{1} \neq μ_{2}$

Assuming unequal variances: $t = -1.8608$ , $df = 17.776$ , p-value $= 0.07939$ , so fail to reject $H_{0}$
Assuming equal variances: $t = -1.8608$ , $df = 18$ , p-value $= 0.07919$ , so fail to reject $H_{0}$
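A sketch of both versions of the two-sample t statistic, following the formulas for (1) unequal and (2) equal population variances. The slides do not list the raw data; we assume they come from R's built-in `sleep` data set (extra hours of sleep under each drug), whose values reproduce $t = -1.8608$, $df = 17.776$ and $df = 18$. With equal group sizes the two standard errors coincide, which is why both versions give the same $t$ and differ only in $df$.

```python
from math import sqrt
from statistics import fmean, variance

def two_sample_t(x1, x2, equal_var=False):
    """t statistic and df for H0: mu1 - mu2 = D0 with D0 = 0."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1), variance(x2)  # sample variances, n - 1 denominator
    diff = fmean(x1) - fmean(x2)
    if equal_var:
        # (2) Equal population variances: pooled SD, df = n1 + n2 - 2
        pooled_sd = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
        se = sqrt((n1 + n2) / (n1 * n2)) * pooled_sd
        df = n1 + n2 - 2
    else:
        # (1) Unequal population variances: Welch SE and Welch-Satterthwaite df
        a, b = v1 / n1, v2 / n2
        se = sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff / se, df

# Assumed data: the `extra` column of R's built-in `sleep` data set
drug1 = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
drug2 = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

t_welch, df_welch = two_sample_t(drug1, drug2)
t_pooled, df_pooled = two_sample_t(drug1, drug2, equal_var=True)
print(f"unequal variances: t = {t_welch:.4f}, df = {df_welch:.3f}")
print(f"equal variances:   t = {t_pooled:.4f}, df = {df_pooled}")
```

If you look the Welch result up in a t table, round $df$ down (here to 17), as noted above.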

When do these tests break down?

(1) Biased samples
(2) Non-normally distributed data (can test)
(3) Insufficient power (calculate a priori)
(4) Unequal variances (can test)

Binomial and Chi-Square Distributions   

The Binomial

Given a certain number of independent trials $(n)$ with an identical probability $(p)$ of success in each trial, we can easily calculate the probability of seeing exactly $X$ successes.

For example, if I flip a coin 10 times and count a head as a success $(p = 0.5)$, what is the probability of seeing exactly 2 heads, exactly 4 heads, exactly 7 heads, etc.? The answer is easily calculated as:

$P[X \text{ successes}] = \binom{n}{X} p^{X} (1 - p)^{n - X}$ where $\binom{n}{X} = \frac{n!}{X! (n - X)!}$ and $n! = n \times (n - 1) \times (n - 2) \times \dots \times 2 \times 1$
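The binomial formula translates directly into Python; `math.comb` computes $\binom{n}{X}$. The values below are for the coin example (10 flips, $p = 0.5$).

```python
from math import comb

def binom_pmf(x, n, p):
    """P[X successes] in n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

for heads in (2, 4, 7):
    print(heads, round(binom_pmf(heads, 10, 0.5), 4))
# 2 0.0439
# 4 0.2051
# 7 0.1172
```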

Radiologists and Missing Sons

Assume that radiologists are just as likely to have sons as daughters. This then generates the following hypotheses:

$H_{0}$ : Radiologists are just as likely to have sons as daughters $(p = 0.5)$
$H_{1}$ : Radiologists are not as likely to have sons as daughters $(p \neq 0.5)$

Let $α = 0.05$

The p-value $= 0.005014$ so we can reject $H_{0}$ ; the data provide sufficient evidence to conclude that radiologists are not equally likely to have sons and daughters.

What if we suspected, a priori, that radiologists are less likely to have sons? In that case we would have set up the hypotheses as follows:

$H_{0}$ : Radiologists are at least as likely to have sons as daughters $(p \geq 0.5)$
$H_{A}$ : Radiologists are less likely to have sons than daughters $(p < 0.5)$

This time the p-value $= 0.002507$ (half the two-tailed p-value) and we can easily reject $H_{0}$ ; the data provide sufficient evidence to conclude that radiologists are less likely to have sons than daughters.
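An exact binomial test just adds up binomial probabilities. The sketch below illustrates the idea rather than reimplementing any particular package: the one-sided p-value sums one tail, and the two-sided p-value sums the probabilities of all outcomes no more likely than the one observed (the convention R's `binom.test` follows). Because the slides do not report the radiologist counts, the demo reuses the coin example (2 heads in 10 flips). When $p = 0.5$ the distribution is symmetric, so the two-sided p-value is exactly twice the one-sided one, the same relationship as $0.005014 = 2 \times 0.002507$ above.

```python
from math import comb

def binom_test(x, n, p=0.5, alternative="two-sided"):
    """Exact binomial test p-value for x successes in n trials under H0: P(success) = p."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    if alternative == "less":     # H1: true probability < p
        return sum(pmf[: x + 1])
    if alternative == "greater":  # H1: true probability > p
        return sum(pmf[x:])
    # Two-sided: add up every outcome no more likely than the observed one
    # (small relative tolerance so exact ties are not lost to rounding error)
    cutoff = pmf[x] * (1 + 1e-7)
    return min(1.0, sum(q for q in pmf if q <= cutoff))

# Coin example: 2 heads in 10 flips of a fair coin
print(binom_test(2, 10, alternative="less"))  # 0.0546875
print(binom_test(2, 10))                      # 0.109375
```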

The $χ^{2}$

The $χ^{2}$ distribution is used with multinomial data (i.e., when the categorical variable has more than two categories) to test whether observed frequency counts differ from expected frequency counts.

The Goodness-of-Fit Test for a single variable

$H_{0}$ : Proportions are all the same
$H_{A}$ : Proportions are not all the same

$χ^{2} = \sum_{i} \frac{(\text{Observed}_{i} - \text{Expected}_{i})^{2}}{\text{Expected}_{i}}$

The statistic is $χ^{2}$ distributed with (number of categories $- 1$) degrees of freedom ($df$)

Reject $H_{0}$ if p-value $\leq α$ ; Do not reject $H_{0}$ otherwise

As $df$ increases you need a larger $χ^{2}$ to Reject $H_{0}$ at the same $α$

The plot below shows how the theoretical $χ^{2}$ distribution varies with the degrees of freedom. As the degrees of freedom increase, the distribution shifts to the right. The first curve is for 3 degrees of freedom, which means we have a total of 4 categories, then we have 4 degrees of freedom (so 5 categories), then 5 degrees of freedom (so 6 categories), and finally 6 degrees of freedom (i.e., 7 categories). Note what happens: the more the degrees of freedom, the larger the $χ^{2}$ value needed to reject $H_{0}$ with $α = 0.05$ or $α = 0.01$ .

The test is built on two assumptions:

No category has expected frequencies $< 1$
No more than 20% of the categories should have expected frequencies $< 5$

An Example

Assume that there are four gourmet meats placed before 100 subjects. In a blind taste test each subject is asked to pick the item they liked the most. Do subjects exhibit indifference between the four items? If they do, then we would expect about 25% to pick item A, 25% to pick item B, 25% to pick item C, and 25% to pick item D. If $H_{0}$ were true then expected frequencies would be as follows for A, B, C, and D:

## [1] 25 25 25 25

Now assume that the 100 subjects actually indicated the following preferences for A, B, C, and D:

## [1] 30 10 40 20

Some of the difference between the observed frequencies and the frequencies expected if $H_{0}$ were true (i.e., if there were no clear preferences) could be due to chance. We therefore test whether the overall difference is larger than chance alone would often produce. In other words, do the data suggest that subjects prefer some items over others? We will set $α = 0.05$ and then conduct the test.

Carrying out the test

observed	expected
30	25
10	25
40	25
20	25

## 
##     Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 20, df = 3, p-value = 0.0001697

Given that the p-value $(0.0001697)$ is less than $α = 0.05$ we can reject the $H_{0}$ that the items are equally preferred. That is, the data suggest that some items are preferred over others.
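The goodness-of-fit statistic is easy to compute by hand; the p-value needs the upper tail of the $χ^{2}$ distribution. For whole-number degrees of freedom that tail has a closed form built up two degrees of freedom at a time, which the sketch below uses so that only the standard library is needed. It also checks the two expected-frequency rules of thumb listed earlier. For the taste test it should reproduce $χ^{2} = 20$, $df = 3$ and a p-value of about 0.0001697.

```python
from math import erfc, exp, lgamma, log, sqrt

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom >= x), integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = exp(-x / 2), 2        # closed form for df = 2
    else:
        q, k = erfc(sqrt(x / 2)), 1  # closed form for df = 1
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += exp((k / 2) * log(x / 2) - x / 2 - lgamma(k / 2 + 1))
        k += 2
    return q

def chi2_goodness_of_fit(observed, expected):
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories - 1
    # Rules of thumb: no expected count below 1, at most 20% of them below 5
    rules_ok = min(expected) >= 1 and sum(e < 5 for e in expected) <= 0.2 * len(expected)
    return stat, df, chi2_sf(stat, df), rules_ok

stat, df, p_value, rules_ok = chi2_goodness_of_fit([30, 10, 40, 20], [25, 25, 25, 25])
print(f"X-squared = {stat}, df = {df}, p-value = {p_value:.7f}, rules of thumb met: {rules_ok}")
```

`chi2_sf` also shows the degrees-of-freedom effect described earlier: the value that leaves 5% in the upper tail is about 7.81 for $df = 3$ but about 12.59 for $df = 6$.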

$χ^{2}$ test of Association/Independence

Often you are interested in testing for a relationship between two categorical variables

is smoking associated with lung cancer?
is a woman's educational attainment associated with fewer pregnancies?
does diabetes differ by race/ethnicity?

Calculate, for each cell in the contingency table, $\frac{(f_{i j} - e_{i j})^{2}}{e_{i j}}$

Add the resulting value over all cells. This yields $χ^{2} = \sum_{i} \sum_{j} \frac{(f_{i j} - e_{i j})^{2}}{e_{i j}}$

The statistic follows a $χ^{2}$ distribution with $df = (r - 1)(c - 1)$ , where $r$ = number of rows and $c$ = number of columns

If you have a $2 \times 2$ table or small samples then Fisher's Exact test may be preferable

Pigeons and Predation

It has been hypothesized that the white rump of pigeons serves to distract predators such as peregrine falcons, and therefore it may be an adaptation to reduce predation. To test this, researchers followed the fate of 203 pigeons, 101 with white rumps and 102 with blue rumps. Nine of the white-rumped birds and 92 of the blue-rumped birds were killed by falcons.

##           
##            blue white Sum
##   killed     92     9 101
##   survived   10    92 102
##   Sum       102   101 203

##           
##                  blue      white
##   killed   0.91089109 0.08910891
##   survived 0.09803922 0.90196078

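The table shows that 91% of the pigeons killed were blue-rumped and only 9% were white-rumped. Put the other way, 92 of the 102 blue-rumped pigeons (90%) were killed versus 9 of the 101 white-rumped pigeons (9%).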

Do the two kinds of pigeons differ in their rate of capture by falcons? Carry out an appropriate test.

$H_{0} :$ Predation by falcons is independent of rump-color
$H_{A} :$ Predation by falcons is not independent of rump-color
$α = 0.05$

## 
##     Pearson's Chi-squared test
## 
## data:  tab.p
## X-squared = 134.13, df = 1, p-value < 2.2e-16

## 
##     Fisher's Exact Test for Count Data
## 
## data:  tab.p
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   33.70502 270.81972
## sample estimates:
## odds ratio 
##   89.50586

Regardless of the test used, we can safely Reject $H_{0}$ given that the p-value $< 2.2 \times 10^{-16}$ (effectively 0). The data suggest that predation by falcons is not independent of rump-color
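For a $2 \times 2$ table the Pearson statistic has a shortcut, $χ^{2} = \frac{N(ad - bc)^{2}}{(a + b)(c + d)(a + c)(b + d)}$ , sketched below for the pigeon counts. The value 134.13 printed above matches the statistic without Yates' continuity correction. R applies that correction to $2 \times 2$ tables by default, so the output appears to come from a call such as `chisq.test(tab.p, correct = FALSE)`; that call is an inference, since the slides do not show the code.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] via the shortcut formula."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # Yates' continuity correction (usual 2x2 form)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: killed, survived; columns: blue, white
print(round(chi2_2x2(92, 9, 10, 92), 2))              # 134.13
print(round(chi2_2x2(92, 9, 10, 92, yates=True), 2))  # 130.9
```

The simple cross-product odds ratio is $\frac{92 \times 92}{9 \times 10} \approx 94.0$; Fisher's test reports 89.5 because R's `fisher.test` gives a conditional maximum-likelihood estimate rather than this sample ratio.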

Coffee and Sex at Birth

Sex at Birth	Light	Regular	Dark	Total
Male	20	40	20	80
Female	30	30	10	70
Total	50	70	30	150

Research Question: Are coffee preferences independent of sex at birth (i.e., is there any association between coffee preference and sex at birth)?

$H_{0} :$ Coffee preference is independent of sex at birth
$H_{1} :$ Coffee preference is not independent of sex at birth

For each cell in the contingency table, calculate

$e_{ij} = \frac{\text{Row } i \text{ Total} \times \text{Column } j \text{ Total}}{\text{Sample Size}}$

$e_{11} = \frac{(80) (50)}{150} = \frac{4000}{150} = 26.67$

$e_{12} = \frac{(80) (70)}{150} = \frac{5600}{150} = 37.33$

$e_{13} = \frac{(80) (30)}{150} = \frac{2400}{150} = 16.00$

$e_{21} = \frac{(70) (50)}{150} = \frac{3500}{150} = 23.33$

$e_{22} = \frac{(70) (70)}{150} = \frac{4900}{150} = 32.67$

$e_{23} = \frac{(70) (30)}{150} = \frac{2100}{150} = 14.00$

Sex at birth	Coffee	$f_{i}$	$e_{i}$	$(f_{i} - e_{i})$	$(f_{i} - e_{i})^{2}$	$(f_{i} - e_{i})^{2} / e_{i}$
Male	Light	20	26.67	-6.67	44.49	1.67
Male	Regular	40	37.33	2.67	7.13	0.19
Male	Dark	20	16.00	4.00	16.00	1.00
Female	Light	30	23.33	6.67	44.49	1.91
Female	Regular	30	32.67	-2.67	7.13	0.22
Female	Dark	10	14.00	-4.00	16.00	1.14
$χ^{2}$						6.13

$df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = (1)(2) = 2$

$χ^{2} = 6.13$ exceeds 5.99, the critical value for $df = 2$ and $α = 0.05$ , so the p-value $< 0.05$ ; Reject $H_{0}$

Coffee preferences and sex at birth are not independent
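The expected-count and $χ^{2}$ arithmetic above can be sketched as follows. With $df = 2$ the upper-tail probability of the $χ^{2}$ distribution is exactly $e^{-χ^{2}/2}$ , so no table is needed. Working with unrounded expected counts gives $χ^{2} \approx 6.12$ rather than 6.13 (the hand calculation rounds each expected count to two decimals before squaring); either way the p-value is about 0.047.

```python
from math import exp

def expected_counts(table):
    """e_ij = (row i total x column j total) / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi2_independence(table):
    expected = expected_counts(table)
    stat = sum((f - e) ** 2 / e
               for f_row, e_row in zip(table, expected)
               for f, e in zip(f_row, e_row))
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return stat, df, expected

coffee = [[20, 40, 20],   # Male:   Light, Regular, Dark
          [30, 30, 10]]   # Female: Light, Regular, Dark
stat, df, expected = chi2_independence(coffee)
p_value = exp(-stat / 2)  # exact upper tail, valid only because df == 2
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.3f}")  # chi2 = 6.12, df = 2, p = 0.047
```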

Linear Regression   

Understanding Infant Mortality (2014)

Can we predict infant mortality from income?

$y = α + β (x)$

$\text{Infant Mortality} = α + β (\text{Income})$

$\text{Infant Mortality} = 53.23 - 0.0016 (\text{Income})$

As Income increases by 1 US Dollar, Infant Mortality drops by 0.0016

If Income rises by 1000 US Dollars, Infant Mortality drops by $0.0016 (1000) = 1.6$

What if a country has Income of 20,000? What would be the predicted Infant Mortality?

$\text{Infant Mortality} = 53.23 - 0.0016 (\text{Income}) = 53.23 - 0.0016 (20000) = 53.23 - 32 = 21.23$
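The prediction is a one-line function of the fitted intercept and slope; the sketch simply plugs in the estimates reported above.

```python
def predicted_infant_mortality(income, intercept=53.23, slope=-0.0016):
    """Prediction from the fitted line: Infant Mortality = 53.23 - 0.0016(Income)."""
    return intercept + slope * income

print(round(predicted_infant_mortality(20000), 2))  # 21.23
# Change in the prediction when Income rises by 1000 US Dollars
print(round(predicted_infant_mortality(21000) - predicted_infant_mortality(20000), 2))  # -1.6
```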

How good is this linear regression?

(1) Income is a significant predictor ... p-value $= 0.000000107$

(2) Adjusted $R^{2} = 0.1884$ ... This linear regression model explains about 18.84% of the variation in infant mortality

(3) Root Mean Squared Error $= 35.09$ ... If we use this regression model to predict infant mortality, a typical prediction error would be about $\pm 35.09$ infant deaths

A high Adjusted $R^{2}$ , a small Root Mean Squared Error, and statistically significant predictor variables are desirable
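To see where these summaries come from, here is a from-scratch sketch of simple (one-predictor) least squares. The four data points are made up purely to exercise the formulas; they are not the infant-mortality data. The prediction-error summary is computed as $\sqrt{SSE/(n - k - 1)}$ , which is what R's `summary()` labels "Residual standard error"; some texts divide by $n$ instead, so treat this as an assumption about how the slide's Root Mean Squared Error was defined.

```python
from math import sqrt
from statistics import fmean

def simple_ols(x, y):
    """Least-squares fit of y = alpha + beta * x plus R2, adjusted R2 and RSE."""
    n, k = len(x), 1  # k = number of predictors
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    sse = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - ybar) ** 2 for yi in y)                              # total variation
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra predictors
    rse = sqrt(sse / (n - k - 1))                  # R's "Residual standard error"
    return alpha, beta, r2, adj_r2, rse

# Made-up data, only to exercise the formulas
alpha, beta, r2, adj_r2, rse = simple_ols([1, 2, 3, 4], [2, 4, 5, 8])
print(f"y = {alpha:.2f} + {beta:.2f}x, R2 = {r2:.4f}, adj. R2 = {adj_r2:.4f}, RSE = {rse:.4f}")
```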

Improving the Linear Regression Model

$y = α + β_{1} (x_{1}) + β_{2} (x_{2}) + \dots + β_{k} (x_{k})$

$y = α + β_{1} (\text{Income}) + β_{2} (\text{Female Youth Literacy Rate}) + β_{3} (\text{\% in Urban Areas})$

## 
## Call:
## lm(formula = U5MR ~ Income + FemaleYouthLR + Urban, data = sowc)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.570 -13.036  -4.449   4.948  94.270 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.810e+02  1.011e+01  17.901   <2e-16 ***
## Income        -3.374e-04  2.411e-04  -1.399   0.1642    
## FemaleYouthLR -1.418e+00  1.236e-01 -11.467   <2e-16 ***
## Urban         -2.686e-01  1.213e-01  -2.214   0.0286 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.22 on 129 degrees of freedom
## Multiple R-squared:  0.6525,    Adjusted R-squared:  0.6444 
## F-statistic: 80.74 on 3 and 129 DF,  p-value: < 2.2e-16

Holding all else constant, as Female Youth Literacy Rate rises by a unit amount infant mortality drops by 1.418
Holding all else constant, as the % of the population living in Urban areas rises by a unit amount infant mortality drops by 0.2686
Income does not appear to have a statistically significant impact on infant mortality
The model "explains" 64.44% of the variation in infant mortality
Typical prediction error (the residual standard error) would be about 23.22
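The adjusted $R^{2}$ and the F-statistic in the output can both be recovered from the multiple $R^{2}$ . The output does not print $n$ , but 129 residual degrees of freedom with $k = 3$ predictors implies $n = 129 + 3 + 1 = 133$ .

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

k = 3            # Income, FemaleYouthLR, Urban
n = 129 + k + 1  # residual df = n - k - 1 = 129
print(round(adjusted_r2(0.6525, n, k), 4))  # 0.6444
print(round(overall_f(0.6525, n, k), 2))    # 80.74
```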

Potential Issues

Biased sample (analysis will be worthless)
Incorrect functional form (the true relationship is $y = α + β (\frac{1}{x^{3}})$ but you fit $y = α + β (x)$)
Influential outliers (extremely unusual data point may influence the linear regression line)
Heteroscedastic errors (discretionary spending of poor families will have little variance while that of wealthy families will have more variance)
Correlated errors (people in the same poor neighborhood are more likely to share adverse health outcomes)
Measurement error (especially in the independent variables)
High Multicollinearity (income and discretionary spending tend to be highly correlated, so you cannot hold one constant while increasing the other by a unit amount)
Non-normal distribution of errors (often a sign that you omitted some independent variables, used the wrong functional form, or both)

Binary y variable

* Passenger survives $(y = 1)$ or dies $(y = 0)$
* Drug has an impact $(y = 1)$ or it does not $(y = 0)$
* Birth is of a girl $(y = 1)$ or of a boy $(y = 0)$
* Tumor has shrunk $(y = 1)$ or has not shrunk $(y = 0)$

Logit (Logistic) and Probit Models

The goal is to model the probability of survival, drug impact, birth of a girl child, tumor shrinking, and so on

$Y_{i} \in \{0, 1\}, i = 1, \dots, N$
$P (Y_{i} = 1 | X_{i}) = Φ (\sum_{k} b_{k} X_{ik})$ ... Probit
$P (Y_{i} = 1 | X_{i}) = \frac{\exp (\sum_{k} b_{k} X_{ik})}{1 + \exp (\sum_{k} b_{k} X_{ik})}$ ... Logit
$Y_{1}, Y_{2}, \dots, Y_{N}$ are statistically independent
The $X_{ik}$ are not exactly or nearly linearly dependent
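The two link functions can be written directly with the standard library: the logit uses the exponential, and the probit's $Φ$ (the standard normal CDF) is available through the complementary error function. The linear index $\sum_{k} b_{k} X_{ik}$ includes the intercept by setting $X_{i0} = 1$ . The demo plugs the coefficients from the Fare-only logit output printed below into the first passenger in the table (Fare = 7.25); the helper names are our own.

```python
from math import erfc, exp, sqrt

def linear_index(b, x):
    """eta_i = sum over k of b_k * X_ik (put X_i0 = 1 to include the intercept)."""
    return sum(bk * xk for bk, xk in zip(b, x))

def logit_prob(eta):
    """Logit: exp(eta) / (1 + exp(eta)), arranged to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + exp(-eta))
    z = exp(eta)
    return z / (1.0 + z)

def probit_prob(eta):
    """Probit: Phi(eta), the standard normal CDF, via the complementary error function."""
    return 0.5 * erfc(-eta / sqrt(2.0))

eta = linear_index([-0.911664, 0.014741], [1.0, 7.25])  # intercept, Fare = 7.25
print(round(logit_prob(eta), 3))
```

A probit fit would have its own coefficients, so `probit_prob` should be fed an index built from probit estimates, not the logit ones used here.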


	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
1	1	0	3	Braund, Mr. Owen Harris	male	22	1	A/5 21171	7.25		S
2	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Thayer)	female	38	1	PC 17599	71.2833	C85	C
3	3	1	3	Heikkinen, Miss. Laina	female	26	0	STON/O2. 3101282	7.925		S

Showing 1 to 3 of 891 entries


## 
## Call:
## glm(formula = Survived ~ Fare, family = binomial(link = "logit"), 
##     data = mydf2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4558  -0.8985  -0.8625   1.3461   1.5344  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.911664   0.095817  -9.515  < 2e-16 ***
## Fare         0.014741   0.002219   6.644 3.06e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1171.1  on 875  degrees of freedom
## Residual deviance: 1105.7  on 874  degrees of freedom
## AIC: 1109.7
## 
## Number of Fisher Scoring iterations: 4

## 
## Call:
## glm(formula = Survived ~ Fare + Female, family = binomial(link = "logit"), 
##     data = mydf2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1991  -0.6277  -0.5872   0.8123   1.9237  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.652604   0.148535   4.394 1.11e-05 ***
## Fare         0.011033   0.002289   4.820 1.44e-06 ***
## FemaleMale  -2.408824   0.170896 -14.095  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1171.07  on 875  degrees of freedom
## Residual deviance:  876.04  on 873  degrees of freedom
## AIC: 882.04
## 
## Number of Fisher Scoring iterations: 4

How good are the models?

$\text{Survived} = f(\text{Fare})$

## Bootstrapped (25 reps) Confusion Matrix 
## 
## (entries are percentual average cell counts across resamples)
##  
##           Reference
## Prediction    0    1
##          0 56.5 29.9
##          1  4.1  9.5
##                             
##  Accuracy (average) : 0.6599

$\text{Survived} = f(\text{Fare}, \text{Female})$

## Bootstrapped (25 reps) Confusion Matrix 
## 
## (entries are percentual average cell counts across resamples)
##  
##           Reference
## Prediction    0    1
##          0 51.4 11.7
##          1 10.2 26.8
##                             
##  Accuracy (average) : 0.7814
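Accuracy is the share of cases on the diagonal of each confusion matrix. The sketch below also computes sensitivity and specificity (terms the slides do not use), treating survival (1) as the positive class. Because the cells are averages of percentages across bootstrap resamples, the results match the reported accuracies only to rounding.

```python
def confusion_metrics(cm):
    """Metrics from a 2x2 confusion matrix laid out as cm[prediction][reference].

    Index 0 = did not survive, 1 = survived (the layout of the printed matrices).
    """
    tn, fn = cm[0]  # predicted 0: reference 0, reference 1
    fp, tp = cm[1]  # predicted 1: reference 0, reference 1
    total = tn + fn + fp + tp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # survivors correctly predicted to survive
        "specificity": tn / (tn + fp),  # non-survivors correctly predicted to die
    }

fare_only = [[56.5, 29.9], [4.1, 9.5]]      # Survived = f(Fare)
fare_female = [[51.4, 11.7], [10.2, 26.8]]  # Survived = f(Fare, Female)

for label, cm in (("Fare", fare_only), ("Fare + Female", fare_female)):
    metrics = confusion_metrics(cm)
    print(label, {name: round(value, 3) for name, value in metrics.items()})
```

Adding Female helps mostly by identifying survivors: sensitivity rises from about 0.24 to 0.70, while specificity falls somewhat (from about 0.93 to 0.83).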

Story

Males had a lower probability of survival than females, on average, and holding all else constant.
Wealthier passengers had a higher probability of survival than other passengers, on average, and holding all else constant.

Reference individual is a woman who paid the Median Fare. Compared to this individual,

(a) Probability that the average Male who paid the same Median Fare of 14.4542 survived = 0.1660

(b) Probability that the average Female who paid the same Median Fare of 14.4542 survived = 0.6919

(c) Probability of survival was 0.6760 for the average Female who paid 7.91 $(Q_{1})$ and 0.7300 if she paid 31.00 $(Q_{3})$

(d) Probability of survival was 0.1561 for the average Male who paid $Q_{1}$ Fare and 0.1934 if he paid $Q_{3}$ Fare

Fare	Male Probability	Female Probability
7.91	0.1561	0.6760
14.45	0.1660	0.6910
31.00	0.1934	0.7300
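Predicted probabilities come from pushing the linear index through the logistic function. The sketch uses the coefficients from the `Survived ~ Fare + Female` output, where the reference category is female and `FemaleMale` adds $-2.408824$ for men. The slides do not show how the table was computed, and these coefficients reproduce it only approximately: the Q3 female value (0.7300) matches, and the other entries agree to within about 0.003.

```python
from math import exp

# Coefficients from glm(Survived ~ Fare + Female, family = binomial(link = "logit"))
B0, B_FARE, B_MALE = 0.652604, 0.011033, -2.408824

def p_survive(fare, male):
    """Logit model: P(survived) = 1 / (1 + exp(-eta)); female is the reference group."""
    eta = B0 + B_FARE * fare + (B_MALE if male else 0.0)
    return 1.0 / (1.0 + exp(-eta))

for fare in (7.91, 14.4542, 31.00):
    print(f"Fare {fare:>8}: male {p_survive(fare, True):.4f}, female {p_survive(fare, False):.4f}")
```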

The Chivalry of the Seas

Questions??

Find me at...

@aruhil
aniruhil.org
ruhil@ohio.edu
