
Hypothesis Testing

MPA 6010

Ani Ruhil

1 / 63

Agenda

  1. The Logic of Hypothesis Testing

  2. One-tailed versus Two-tailed hypotheses

  3. One-Sample t-tests

  4. Two-group t-tests

  5. Paired t-tests

  6. Assumptions of t-tests

2 / 63

The Logic of Hypothesis Testing

3 / 63

Hypothesis testing is an inferential procedure that uses sample data to evaluate the credibility of a specific belief about a population parameter. The process involves ...

  1. Stating a hypothesis: an assumption that can neither be fully proven nor fully disproven. For example,

    • Not more than 5% of GM trucks break down in under 10,000 miles

    • Heights of North American adult males are distributed with μ = 72 inches

    • Mean year-round temperature in Athens (OH) is 62

    • 10% of Ohio teachers are Accomplished

    • Mean county unemployment rate in Ohio is 4.1%

  2. Drawing a sample to test the hypothesis

  3. Conducting the test itself to see if the hypothesis should be rejected

4 / 63

The Null and the Alternative Hypotheses

  • Null Hypothesis (H0): the assumption believed to be true

  • Alternative Hypothesis (H1): the statement believed to be true if H0 is rejected

    $H_0: \mu > 72$ inches; $H_1: \mu \le 72$

    $H_0: \mu < 72$ inches; $H_1: \mu \ge 72$

    $H_0: \mu \le 72$ inches; $H_1: \mu > 72$

    $H_0: \mu \ge 72$ inches; $H_1: \mu < 72$

    $H_0: \mu = 72$ inches; $H_1: \mu \ne 72$

H0 and H1 are mutually exclusive and mutually exhaustive

  • Mutually Exclusive: either H0 or H1 is true; both cannot be true at the same time

  • Mutually Exhaustive: H0 and H1 exhaust the sample space; there are no other possibilities that we are unaware of

5 / 63

Type I and Type II Errors

Decision based on Sample    Null is true    Null is false
Reject the Null             Type I error    No error
Do not reject the Null      No error        Type II error

Type I Error: Rejecting the Null hypothesis H0 when H0 is true

i.e., we should not have rejected the Null hypothesis

Decision            H0 is True          H0 is False
Reject H0           Type I error (α)    No error (1 − β)
Do not reject H0    No error (1 − α)    Type II error (β)

Type II Error: Failing to reject the Null hypothesis H0 when H0 is false

i.e., we should have rejected the Null hypothesis

The probability of committing a Type I error = Level of Significance = α

We have to decide how often we are okay with committing a Type I error (i.e., falsely rejecting H0). Conventionally, the Type I error rate is set to one of the following values: α = 0.05 or α = 0.01

Note the very cautious language ... Reject H0 versus Do Not Reject H0

6 / 63

The Process of Hypothesis Testing: An Example

Historically, the standard pediatric vaccination schedule, covering diphtheria–tetanus–pertussis (DTaP), polio, Haemophilus influenzae type b (Hib), measles–mumps–rubella (MMR), pneumococcal, and hepatitis B, became widely normalized and accepted as a standard requirement for children to attend public schools in the late 20th century.

During the COVID-19 pandemic, folks began wondering if vaccine hesitancy would explode to such an extent that more parents would start asking for vaccine exemptions for their children. Has this happened in Ohio? Are more children attending kindergarten indeed exempt from vaccines? Regardless of what we believe, perhaps the Governor would like us to carry out a test and see if exemptions have grown. How would we test this?

The first thing we need is a starting point -- what was the exemption rate in 2018-19? Records show the rate was 2.9%. Now we can write our hypotheses.

$H_0$: Exemption rates are the same or lower ($\mu \le 2.9$)

Since we are expected to test whether exemption rates have grown, the alternative hypothesis would be:

$H_1$: Exemption rates have increased ($\mu > 2.9$)

7 / 63

The Sampling Distribution of $\bar{x}$

We know from the theory of sampling distributions that the distribution of sample means, for all samples of size n, will be normally distributed (as shown below)

Most samples would be in the middle of the distribution but by sheer chance we could end up with a sample mean in the tails. This will happen with a very small probability but it could happen!!

8 / 63

For example, we could get a sample mean that, when converted into its t-score via $t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$, where the standard error is given by $s_{\bar{x}} = \frac{s}{\sqrt{n}}$, could assume values far out in either tail. Here are some t values and the probabilities of seeing them with a sample size of 300 (i.e., n = 300) and hence df = n − 1 = 300 − 1 = 299.

Calculate the probabilities of each shaded region.

9 / 63

Calculate the probabilities of each shaded region.

10 / 63

Calculate the probabilities of each shaded region.

11 / 63

If sample means can fall anywhere in the distribution with varying probabilities, we have to establish some probability cutoff such that if the probability of drawing our sample mean falls at or below that cutoff, we can say there is a very low probability of this occurring if H0 is true, and hence it must be that exemptions have increased.

Conventionally we set this cutoff at a probability of 0.05 or 0.01. These are the areas shaded in green below:

12 / 63

We run an anonymous survey of public schools and, out of a total of 300 responses (n = 300), we see an average exemption rate of 4.7% with a standard deviation of s = 0.212.

What is the t-score here?

First, the standard error: $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{0.212}{\sqrt{300}} = 0.0122$

Second, the t-score: $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{4.7 - 2.9}{0.0122} = \frac{1.8}{0.0122} = 147.541$

If nothing has changed and in reality only 2.9% of the population of kindergartners is exempt, the probability of ending up with t = 147.541 is practically 0 ... aka, highly unlikely to occur by chance.

Hence the only logical conclusion is to say, well, we reject the null hypothesis that exemption rates are 2.9% or less.

Formally, we set the following decision rules:

  • Reject the Null hypothesis if the probability of your calculated t is ≤ α
  • Do not reject the Null hypothesis if the probability of your calculated t is > α

α=0.05 or α=0.01
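To make the decision rule concrete, here is a minimal sketch in Python with scipy (an assumption on my part; the course software is SPSS) of the exemption-example arithmetic:

```python
# A minimal sketch, assuming the survey numbers from this slide:
# n = 300, sample mean 4.7, s = 0.212, and H0: mu <= 2.9 vs H1: mu > 2.9.
from math import sqrt
from scipy import stats

n, xbar, s, mu0, alpha = 300, 4.7, 0.212, 2.9, 0.05
se = s / sqrt(n)                 # standard error of the mean, ~0.0122
t = (xbar - mu0) / se            # t = (xbar - mu0) / SE, a huge value here
p = stats.t.sf(t, df=n - 1)      # one-tailed (upper-tail) p-value
print(f"t = {t:.3f}, p = {p:.3g}")
print("Reject H0" if p <= alpha else "Do not reject H0")
```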

13 / 63

Critical Region vs Region of "Acceptance"

[Figure: a t distribution showing the central region of "acceptance" and the critical region in the tails]

14 / 63

But what if we had no specific guidance and wanted to simply test whether exemption rates have changed? Now we would have to allow our calculated t-score to be positive or negative.

$H_0$: Exemption rates are 2.9% ($\mu = 2.9$)

$H_1$: Exemption rates are not 2.9% ($\mu \ne 2.9$)

15 / 63

The process revisited ...

  1. State the hypotheses

    • If we want to test whether something has changed then H0 must specify that nothing has changed: $H_0: \mu = \mu_0;\ H_1: \mu \ne \mu_0$ (two-tailed)
    • If we want to test whether something is different then H0 must specify that nothing is different: $H_0: \mu = \mu_0;\ H_1: \mu \ne \mu_0$ (two-tailed)
    • If we want to test whether something had an impact then H0 must specify that it had no impact: $H_0: \mu = \mu_0;\ H_1: \mu \ne \mu_0$ (two-tailed)
    • If we want to test whether something has increased then H0 must specify that it has not increased: $H_0: \mu \le \mu_0;\ H_1: \mu > \mu_0$ (one-tailed)
    • If we want to test whether something has decreased then H0 must specify that it has not decreased: $H_0: \mu \ge \mu_0;\ H_1: \mu < \mu_0$ (one-tailed)
  2. Collect the sample and set α=0.05 or α=0.01

  3. Calculate $s_{\bar{x}} = \frac{s}{\sqrt{n}}$, $\bar{x}$, df = n − 1, and then t
  4. Reject H0 if the calculated t falls in the critical region; do not reject H0 otherwise. This is the same as saying: reject H0 if the p-value is ≤ α; do not reject H0 if the p-value is > α
16 / 63

Problem 1

Last year, the motor pool of Normal (IL) maintained the city's fleet of vehicles at an average cost of $346 per car. This year Jack's Crash Shop is doing the maintenance. The City notices that in a random sample of 36 cars fixed by Jack, the mean repair cost is $330 with a standard deviation of $120. Is Jack's Crash Shop saving the City money?

$H_0: \mu \ge 346$ and $H_1: \mu < 346$. Let us choose α = 0.05. Note df = n − 1 = 36 − 1 = 35

$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{120}{\sqrt{36}} = 20$, and hence $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{330 - 346}{20} = \frac{-16}{20} = -0.80$

Fail to reject H0; the data suggest that Jack's prices may not differ from those predicted by the null hypothesis

17 / 63

Problem 2

Kramer's (TX) Police Chief learns that her staff clear 46.2% of all burglaries in the city. She wants to benchmark their performance and to do this she samples 10 other similar cities in Texas. She finds their numbers to be as follows:

Rate Rate
44.2 32.1
40.3 32.9
36.4 29.0
49.4 46.4
51.7 41.0

Is Kramer's clearance rate significantly different from those of other similar Texas cities?

$H_0: \mu = 46.2$ and $H_1: \mu \ne 46.2$. Set α = 0.05

Note df = n − 1 = 10 − 1 = 9, $\bar{x} = 40.34$, and $s_{\bar{x}} = 2.4279$

$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{40.34 - 46.2}{2.4279} = -2.414$

18 / 63

Reject H0; the data suggest that Kramer's clearance rate does not conform with that of other similar Texas cities.
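Because the ten clearance rates are given, we can verify this with a one-sample t-test; a minimal sketch in Python with scipy (an assumption; the course software is SPSS):

```python
# A minimal sketch of the Kramer (TX) benchmark test, using the ten
# clearance rates from the table above and mu0 = 46.2.
from scipy import stats

rates = [44.2, 40.3, 36.4, 49.4, 51.7, 32.1, 32.9, 29.0, 46.4, 41.0]
res = stats.ttest_1samp(rates, popmean=46.2)   # two-tailed by default
print(f"t = {res.statistic:.3f}, p = {res.pvalue:.4f}")   # t ~ -2.414, p < 0.05
```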

19 / 63

Problem 3: Philanthropy

The Director of Philanthropy at the Fleckman Institute of the Arts is curious to assess the impact of this year's changes in federal tax laws on donations. Last year the average donation was $580. A random sample of 50 donors yields an average donation of $623.64 with a standard deviation of $84.27. Did the change in federal tax laws have an impact on donations? solve this on your own

20 / 63

Problem 4: Volunteering

Springdale University is concerned that student volunteer activity has decreased. Last year their students volunteered an average of 7.3 hours of community service per month. This year, a random sample of 75 student volunteers reveals an average of 7.07 hours per month with a standard deviation of 1.29 hours. Should the University be concerned? solve this on your own

21 / 63

Overlap Between Hypothesis Tests and Confidence Intervals

  • Calculate the 95% confidence interval for a sample mean $\bar{x}$

  • Note that in this confidence interval, α=0.05; α/2=0.025

  • Use the Test Statistic with α=0.05 to make a decision

  • Note the similarity? (A quick numerical check follows below.)
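A minimal numerical check in Python (scipy assumed), reusing the Kramer (TX) numbers from Problem 2: a 95% confidence interval that excludes $\mu_0$ corresponds to rejecting H0 at α = 0.05 in the two-tailed test.

```python
# Sketch: the 95% CI and the two-tailed test at alpha = 0.05 agree.
from math import sqrt
from scipy import stats

# s is recovered from the slide's standard error: 2.4279 * sqrt(10) ~ 7.68
xbar, s, n, mu0, alpha = 40.34, 7.68, 10, 46.2, 0.05
se = s / sqrt(n)
tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)
lo, hi = xbar - tcrit * se, xbar + tcrit * se
print(f"95% CI: ({lo:.2f}, {hi:.2f})")   # mu0 = 46.2 falls outside -> reject H0
```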

22 / 63

Assumptions Underlying the t-test

  1. The data are independent within the sample and identically distributed

    • no clustering,
    • no serial dependence,
    • no design quirks that tie observations together
  2. The t-statistic is approximately normally distributed

  3. The model is correctly specified

    • observations come from a population with a constant mean (the parameter you test),
    • there is finite variance
    • and measurements are unbiased and on a continuous scale
23 / 63

Testing Assumptions?

  1. Check your data sampling plan, the study design, and measurement

  2. Visual checks for Normality (Histograms, boxplots, QQ-plots)

  3. Formal tests of Normality

    • Small n: lean on Shapiro–Wilk (H0: Data come from a Normally distributed population) and QQ Plots; be wary of outliers.

    • Medium n (30–200): CLT helps; emphasize QQ Plots and tail fit.

    • Large n: tiny deviations will always “fail” tests; trust visual diagnostics and robustness.

If you see curved QQ lines, that is skew; S-shaped is heavy/light tails. Fix with transformations only if they clarify interpretation; otherwise prefer robust inference.
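A minimal sketch in Python (scipy and matplotlib assumed; the slides themselves use SPSS) of the formal and visual checks just described:

```python
# Sketch: Shapiro-Wilk test plus a QQ plot for an arbitrary sample `x`.
# The data below are simulated placeholders, not from the course files.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=10, scale=2, size=40)   # placeholder sample

W, p = stats.shapiro(x)                    # H0: population is Normal
print(f"Shapiro-Wilk W = {W:.3f}, p = {p:.3f}")

stats.probplot(x, dist="norm", plot=plt)   # QQ plot against the Normal
plt.show()
```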

24 / 63

QQ Data (qqplot-data.sav)

Theoretical Values Sample Values
7.20 7.47
8.24 8.63
8.90 8.88
9.43 9.11
9.92 9.54
10.38 10.14
10.87 10.26
11.40 10.92
12.06 13.12
13.10 13.43

25 / 63

QQ Plots that hint at Non-Normality

26 / 63

QQ Plots that hint at Normality

27 / 63

Shapiro-Wilk Test and QQ Plot for Kramer (TX)

28 / 63

Beware of Normality Tests

p-value = 0.00008903; Reject H0

p-value = 0.2015; Do not Reject H0

29 / 63

What should I do if my data are Non-Normal?

  1. Consider eliminating outliers and then redo tests

  2. Consider transforming your data, starting with taking z-scores, for example (a sketch of the skew-reducing transformations follows this list)

    • Right‑skewed, positive data: try square root, log, or Box–Cox, in that order. Add a small constant only when zeros exist.

    • Proportions near 0 or 1: use logit on bounded rates, or arcsine‑sqrt for binomial proportions; stabilize variance before modeling. [Outside the scope of this course]

    • Left‑skewed: reflect then apply a right‑skew fix (e.g., transform $y = \max(x) - x$, then take the log or the sqrt of y).

    • Heavy tails: consider rank‑based or robust methods instead of forcing normality; Huberization or winsorizing only if defensible. [Outside the scope of this course]

    • Multiplicative errors: log typically linearizes relationships and equalizes spread. [Outside the scope of this course]

  3. Switch to non-parametric statistical tests
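As promised above, a minimal sketch in Python (numpy/scipy assumed) of the right-skew fixes: apply sqrt and log and compare the skewness statistic before and after.

```python
# Sketch: compare skewness of raw, sqrt-, and log-transformed data.
# The lognormal sample is a simulated stand-in for right-skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.lognormal(mean=2.0, sigma=0.8, size=200)   # right-skewed toy data

for name, y in [("raw", x), ("sqrt", np.sqrt(x)), ("log", np.log(x))]:
    print(f"{name:>4}: skewness = {stats.skew(y):+.2f}")
# If x contained zeros, add a small constant first, e.g., np.log(x + 0.5)
```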

30 / 63

Right-Skewed Example (right-skewed-data.sav)

  • Open data in SPSS

  • Look at key descriptive statistics and plots to see if there is any skew and if there is, is the measure right-skewed or left-skewed

Note: For skewness, now you have ...
H0: The data are not skewed (i.e., s=0);
H1: The data are skewed (i.e., s ≠ 0)

  • If they are right-skewed then we need to transform the data via ...
    (a) Take the logarithm of the skewed variable
    (b) Take the square-root of the skewed variable

If your data include 0 values you need to add a very small constant, perhaps k=0.5, before you take the logarithm or the square-root

  • Check the key descriptive statistics and plots of the transformed variables to see if there is any skew.

  • If skewness persists, then try another transformation or switch to non-parametric tests

31 / 63

Left-Skewed Example (left-skewed-data.sav)

  • Open data in SPSS

  • Look at key descriptive statistics and plots to see if there is any skew and if there is, is the measure right-skewed or left-skewed

Note: For skewness, now you have ...
H0: The data are not skewed (i.e., s=0);
H1: The data are skewed (i.e., s ≠ 0)

  • If they are left-skewed then we need to transform the data via ...
    (a) Create a right-skewed version via $y_1 = \max(x_1) - x_1$ and then take the logarithm of the skewed variable, and if this does not work,
    (b) Create a right-skewed version via $y_1 = \max(x_1) - x_1$ and then take the square-root of the skewed variable

If your data include 0 values you need to add a very small constant, perhaps k=0.5, before you take the logarithm or the square-root

  • Check the key descriptive statistics and plots of the transformed variables to see if there is any skew.

  • If skewness persists, then try another transformation or switch to non-parametric tests

32 / 63

Non-Parametric Tests

  • Non-parametric tests are used
    (a) when transformations do not work, or
    (b) the data represent ordinal categories (or are ranked data)

  • Called non-parametric because, unlike, say, the t-test, which requires some distributional assumption to be true (i.e., Normality) and involves parameters (i.e., the mean and the variance), these alternatives make no such assumptions and need no such parameters

  • They are more likely to lead to a Type II error, so if the assumptions of parametric tests are met, use parametric tests

33 / 63

The Sign & Wilcoxon Signed-Rank Tests (length-of-stay-data.sav)

  • Assumption: Random sample from a continuous distribution
  • Tests whether the Median equals a hypothesized value (the H0 value)
  • Scores above the H0 value are marked +; scores below are marked −
  • Scores equal to the hypothesized median are dropped
  • If H0 is correct, 50% of the scores should be “+” and 50% should be “−” ... essentially a Binomial test with $p_0 = 0.5$

H0: The distribution is symmetric around the hypothesized median (p = 0.5)
H1: The distribution is not symmetric around the hypothesized median (p ≠ 0.5)

  • A very weak test, so use the Wilcoxon Signed-Rank Test if you need a non-parametric test

Sign Test: Analyze > Nonparametric Tests > Legacy Dialogs > Binomial...
Wilcoxon Signed-Rank Test: Analyze > Nonparametric Tests > One Sample...

For the Wilcoxon Signed-Rank Test you will have H0: the median of your variable is some value, i.e., $H_0: Md = Md_0$; for example, in our case, we could use $H_0: Md = 5$ or whatever you think is the median length of stay
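A minimal sketch in Python (scipy assumed) of both tests; the length-of-stay values and the hypothesized median of 5 are made up for illustration, not taken from length-of-stay-data.sav:

```python
# Sketch: sign test (via a Binomial test) and Wilcoxon signed-rank test.
import numpy as np
from scipy import stats

los = np.array([3, 5, 7, 2, 9, 6, 4, 8, 5, 10])   # hypothetical stays
md0 = 5                                            # hypothesized median

# Sign test: drop values equal to md0, then a Binomial test with p0 = 0.5
signs = los[los != md0] > md0
res_sign = stats.binomtest(int(signs.sum()), n=signs.size, p=0.5)
print(f"Sign test p = {res_sign.pvalue:.3f}")

# Wilcoxon signed-rank test of H0: Md = md0 (zero differences are dropped)
res_w = stats.wilcoxon(los - md0)
print(f"Wilcoxon p = {res_w.pvalue:.3f}")
```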

34 / 63

Comparing Two Means

35 / 63

Comparisons of Means from Common Parent Population

Common Parent Population

Common Parent Population

36 / 63

Comparisons of Means from Different Parent Populations

Separate Parent Populations

Separate Parent Populations

37 / 63

We often need to compare sample means across two groups. For example, are average earnings the same for men and women in a specific occupation? Perhaps we suspect (a) that women are underpaid, or more generally (b) that their salaries differ from those of men.

Let the population and sample means be $\mu_m, \mu_w$ and $\bar{x}_m, \bar{x}_w$, respectively

(a) $H_0: \mu_m \le \mu_w$ and $H_1: \mu_m > \mu_w$; i.e., $H_0: \mu_m - \mu_w \le 0$ and $H_1: \mu_m - \mu_w > 0$

(b) $H_0: \mu_m = \mu_w$ and $H_1: \mu_m \ne \mu_w$; i.e., $H_0: \mu_m - \mu_w = 0$ and $H_1: \mu_m - \mu_w \ne 0$

Standard Error of the difference in means: $s_{\bar{x}_m - \bar{x}_w} = \sqrt{\frac{s^2_m}{n_m} + \frac{s^2_w}{n_w}}$

Confidence Interval estimate: $(\bar{x}_m - \bar{x}_w) \pm t_{\alpha/2}\left(s_{\bar{x}_m - \bar{x}_w}\right)$

The Test Statistic: $t = \frac{(\bar{x}_m - \bar{x}_w) - (\mu_m - \mu_w)}{\sqrt{\frac{s^2_m}{n_m} + \frac{s^2_w}{n_w}}} = \frac{(\bar{x}_m - \bar{x}_w) - D_0}{\sqrt{\frac{s^2_m}{n_m} + \frac{s^2_w}{n_w}}}$

38 / 63

The degrees of freedom for this test: $df = \frac{\left(\frac{s^2_m}{n_m} + \frac{s^2_w}{n_w}\right)^2}{\frac{1}{n_m - 1}\left(\frac{s^2_m}{n_m}\right)^2 + \frac{1}{n_w - 1}\left(\frac{s^2_w}{n_w}\right)^2}$

Note: We usually round down the df to the nearest integer

We have two ways of calculating the estimated standard error $s_{\bar{x}_1 - \bar{x}_2}$ and the degrees of freedom df

(1) When the population variances are assumed unequal

(2) When the population variances are assumed equal

39 / 63

(1) Unequal Population Variances

Standard Error will be: $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}$

Degrees of Freedom will be: $df = \frac{\left(\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s^2_1}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s^2_2}{n_2}\right)^2}$

Rule of thumb ...

  • Use this when $n_1$ or $n_2$ is < 30, and

  • Either sample has a standard deviation at least twice that of the other sample

40 / 63

(2) Equal Population Variances

Standard Error will be: $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{n_1 + n_2}{n_1 n_2}}\sqrt{\frac{(n_1 - 1)s^2_{x_1} + (n_2 - 1)s^2_{x_2}}{(n_1 + n_2) - 2}}$

Degrees of Freedom will be: $df = (n_1 + n_2) - 2$

Rule of thumb ...

  • Use this when the standard deviations are roughly equal, and

  • $n_1$ and $n_2 \ge 30$ (a sketch of both recipes follows below)
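A minimal sketch in Python of both standard-error/df recipes from the last two slides, written as plain functions over summary statistics (the helper names are my own, not from the course):

```python
# Sketch: SE and df under the unequal- and equal-variance assumptions.
from math import sqrt

def welch_se_df(s1, n1, s2, n2):
    """Unequal variances: Welch SE and df (df rounded down, per the note above)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, int(df)

def pooled_se_df(s1, n1, s2, n2):
    """Equal variances: pooled SE and df = (n1 + n2) - 2."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return se, n1 + n2 - 2

print(pooled_se_df(125, 50, 115, 50))      # (24.0208..., 98) -- Example 1 below
print(welch_se_df(6.1665, 13, 3.8339, 19)) # (1.9232..., 18)
```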

41 / 63

Assumptions and Rules-of-thumb

Assumptions:

(1) Random samples representing each group, drawn from the population(s)

(2) Variables are drawn from normally distributed Populations

Rules-of-thumb:

  • Draw larger samples if you suspect the Population(s) may be skewed

  • Go with assumption of equal variances if both the following are met:
    (a) Assumption theoretically justified, standard deviations fairly close, &
    (b) n130 and n230

  • Go with assumption of unequal variances if both the following are met:
    (a) One standard deviation is at least twice the other standard deviation, &
    (b) n1<30 or n2<30

Of course, some statistical software packages (SPSS, for instance) will run the test under both assumptions so you can choose on the basis of the results (a bad idea in some eyes)

42 / 63

Testing Variances: Levene's Test for Homogeneity of Variances

  • Assumes roughly symmetric frequency distributions within all groups
  • Robust to violations of assumption
  • Can be used with 2 or more groups

$H_0: \sigma^2_1 = \sigma^2_2 = \sigma^2_3 = \cdots = \sigma^2_k$ and $H_A$: for at least one pair $(i, j)$ we have $\sigma^2_i \ne \sigma^2_j$

Test Statistic: $W = \frac{(N - k)\sum_{i=1}^{k} n_i\left(\bar{Z}_i - \bar{Z}\right)^2}{(k - 1)\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Z_{ij} - \bar{Z}_i\right)^2}$

$Z_{ij} = |Y_{ij} - \bar{Y}_i|$; $\bar{Z}_i$ is the mean of the $Z_{ij}$ in the ith group; $\bar{Z}$ is the mean of all the $Z_{ij}$ in the study; k is the number of groups in the study; and $n_i$ is the sample size for group i

If you opt for the more robust version that uses the Median, then $Z_{ij} = |Y_{ij} - \tilde{Y}_i|$, where $\tilde{Y}_i$ is the median of the ith group

$W \sim F_{\alpha,\ k-1,\ N-k}$
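A minimal sketch in Python (scipy assumed): scipy.stats.levene implements this test, and center="median" gives the robust median-based variant mentioned above; the two groups are made-up numbers.

```python
# Sketch: Levene's test for homogeneity of variances across two groups.
from scipy import stats

g1 = [526, 480, 610, 555, 471, 598]   # hypothetical group values
g2 = [475, 502, 433, 498, 460, 489]
W, p = stats.levene(g1, g2, center="median")   # robust variant
print(f"W = {W:.3f}, p = {p:.3f}")   # small p suggests unequal variances
```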

43 / 63

Example 1

The Athens County Public Library is trying to keep its bookmobile alive since it reaches readers who otherwise may not use the library. One of the library employees decides to conduct an experiment, running advertisements in 50 areas served by the bookmobile and not running advertisements in 50 other areas also served by the bookmobile. After one month, circulation counts of books are calculated and mean circulation counts are found to be 526 books for the advertisement group with a standard deviation of 125 books and 475 books for the non-advertisement group with a standard deviation of 115 books. Is there a statistically significant difference in mean book circulation between the two groups?

Group Mean Std. Dev. Sample Size
Advertisement Group 526 125 50
Non-Advertisement Group 475 115 50

Since we are being asked to test for a "difference" it is a two-tailed test, with hypotheses given by:

H0: There is no difference in average circulation counts ($\mu_1 = \mu_2$)
H1: There is a difference in average circulation counts ($\mu_1 \ne \mu_2$)

44 / 63

Since both groups have sample sizes that exceed 30 we can proceed with the assumption of equal variances and calculate the standard error and the degrees of freedom. The degrees of freedom are easy: $df = n_1 + n_2 - 2 = 50 + 50 - 2 = 98$. The standard error is $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{n_1 + n_2}{n_1 n_2}}\sqrt{\frac{(n_1 - 1)s^2_{x_1} + (n_2 - 1)s^2_{x_2}}{(n_1 + n_2) - 2}}$ and plugging in the values we have

$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{50 + 50}{2500}}\sqrt{\frac{(50 - 1)(125^2) + (50 - 1)(115^2)}{(50 + 50) - 2}} = (0.2)(120.1041) = 24.02082$

The test statistic is

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(526 - 475) - 0}{24.02082} = \frac{51}{24.02082} = 2.123158$

45 / 63

Since no α is given let us use the conventional starting point of α=0.05

With df=98 and α=0.05, two-tailed, the critical t value would be ±1.98446745

Since our calculated t=2.1231 exceeds the critical t=1.9844, we can easily reject the null hypothesis of no difference

Conclusion: These data suggest there is a difference in average circulation counts between the advertisement and no advertisement groups

We could have also used the p-value approach, rejecting the null hypothesis of no difference if the p-value was ≤ α

The p-value of our calculated t turns out to be 0.0363 and so we can reject the null hypothesis

Note, in passing, that had we used α = 0.01 we would have been unable to reject the null hypothesis because 0.0363 is > 0.01

46 / 63

The 95% confidence interval is given by

$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\left(s_{\bar{x}_1 - \bar{x}_2}\right) = 51 \pm 1.9844(24.02082) = 51 \pm 47.66692 = (3.33308,\ 98.66692)$

We can be about 95% confident that the true difference between the groups lies in this interval

Had we used the 99% interval for a test with α=0.01 the interval would be

$51 \pm 2.627(24.02082) = (-12.10269,\ 114.1027)$

subsuming the null hypothesis difference of 0 and leaving us unable to reject the null hypothesis.
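A minimal sketch in Python (scipy assumed) reproducing the bookmobile arithmetic from the summary statistics:

```python
# Sketch: pooled two-sample t-test from summary statistics (Example 1).
from math import sqrt
from scipy import stats

x1, s1, n1 = 526, 125, 50   # advertisement group
x2, s2, n2 = 475, 115, 50   # non-advertisement group

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = sqrt(sp2 * (1 / n1 + 1 / n2))      # 24.0208
df = n1 + n2 - 2                        # 98
t = (x1 - x2) / se                      # 2.1232
p = 2 * stats.t.sf(abs(t), df)          # two-tailed p ~ 0.0363
tcrit = stats.t.ppf(0.975, df)          # 1.9845
lo, hi = (x1 - x2) - tcrit * se, (x1 - x2) + tcrit * se
print(f"t = {t:.4f}, p = {p:.4f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```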

47 / 63

Example 2 (mtcars-data.sav)

Say we have a large data-set with a variety of information about several cars, gathered in 1974. One of the questions we have been tasked with testing is whether the miles per gallon yield of manual transmission cars in 1974 was greater than that of automatic transmission cars. Assume they want us to use α=0.05.

We have 13 manual transmission cars and 19 automatic transmission cars, and the means and standard deviations are 24.3923 and 6.1665 for manual, and 17.1473 and 3.8339 for automatic cars, respectively. The hypotheses are:

H0: Mean mpg of manual cars is at most that of automatic cars ($\mu_{manual} \le \mu_{automatic}$)
H1: Mean mpg of manual cars is greater than that of automatic cars ($\mu_{manual} > \mu_{automatic}$)

Group $n_i$ $\bar{x}_i$ $s_i$
Manual 13 24.3923 6.1665
Automatic 19 17.1473 3.8339
48 / 63

The calculated t is 4.1061 and the p-value is 0.0001425, allowing us to reject the null hypothesis

Conclusion: These data suggest that the average mpg of manual cars is not at most that of automatic cars

Note a couple of things here:

(i) We have a one-tailed hypothesis test, and
(ii) we are assuming equal variances, since the standard deviations are within a factor of two of each other (the reported t of 4.1061 is the pooled-variance statistic)

In addition, note that the confidence interval is found to be (3.6415,10.8483), indicating that we can be 95% confident that the average manual mpg is higher than average automatic mpg by anywhere between 3.64 mpg and 10.84 mpg
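A minimal sketch in Python (scipy assumed) of the same test from the group summaries; the pooled recipe reproduces the reported t:

```python
# Sketch: pooled, one-tailed two-sample t-test from summary stats (mtcars).
from math import sqrt
from scipy import stats

x1, s1, n1 = 24.3923, 6.1665, 13   # manual
x2, s2, n2 = 17.1473, 3.8339, 19   # automatic

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = sqrt(sp2 * (1 / n1 + 1 / n2))
t = (x1 - x2) / se                  # ~ 4.1061
p = stats.t.sf(t, n1 + n2 - 2)      # one-tailed p ~ 0.00014
print(f"t = {t:.4f}, p = {p:.6f}")
```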

49 / 63

SPSS Example 01 (TN Project STAR data)

  • Carry out a test for whether there is a difference in reading scores of male versus female kindergarteners

  • Run the appropriate tests of assumptions, stating your conclusions

  • Conduct the t-test assuming equal variances and assuming unequal variances

  • What do you conclude?

  • Carry out a test for whether there is a difference in reading scores of third graders in regular classrooms vs small classrooms

  • Run the appropriate tests of assumptions, stating your conclusions

  • Conduct the t-test assuming equal variances and assuming unequal variances

  • What do you conclude?

50 / 63

SPSS Example 02 (Ohio Schools Data)

  • Carry out a test for whether there is a difference in the average Performance Index score of public versus charter schools

  • Run the appropriate tests of assumptions, stating your conclusions

  • Conduct the t-test assuming equal variances and assuming unequal variances

  • What do you conclude?

  • Identify the Median percent economically disadvantaged and then construct a new variable that takes on the value of 0 if the school's percentage is at or below this Median, and 1 if above. Label the values accordingly.

  • Now test whether the average Performance Index score of schools at or below the Median (in terms of economic disadvantage) is lower than that of schools above the Median

  • Run the appropriate tests of assumptions, stating your conclusions

  • Conduct the t-test assuming equal variances and assuming unequal variances

  • What do you conclude?

51 / 63

Nonparametric Test (Mann-Whitney U Test) assuming equal variances

The assumptions of the Mann-Whitney U test are:

  • The variable of interest is continuous (not discrete). The measurement scale is at least ordinal
  • The probability distributions of the two populations are identical, except for location (i.e., the “center”)
  • The two samples are independent
  • Both are simple random samples from their respective populations

H0: The samples come from populations with similar probability distributions

Test Process and Statistic ...

  • Combine both samples and rank all values in ascending order; if there are ties, assign each tied value the average of the tied ranks
  • Sum the ranks of the smaller group ($R_1$)
  • $U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$; $U_2 = n_1 n_2 - U_1$
  • Choose the larger of $U_1$ or $U_2$ as the test statistic (a short sketch follows the menu path below)

Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples, choose Mann–Whitney U
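A minimal sketch in Python (scipy assumed); the two groups are placeholder values, not course data:

```python
# Sketch: Mann-Whitney U test for two independent samples.
from scipy import stats

g1 = [44.2, 40.3, 36.4, 49.4, 51.7]   # placeholder values
g2 = [32.1, 32.9, 29.0, 46.4, 41.0]
res = stats.mannwhitneyu(g1, g2, alternative="two-sided")
print(f"U = {res.statistic:.1f}, p = {res.pvalue:.3f}")
```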

52 / 63

Nonparametric Test (Kolmogorov-Smirnov Test) assuming unequal variances

  • This (very weak) test is used to compare the distributions of two groups by comparing the empirical cumulative distribution functions (ecdfs) of the two groups and finding the greatest absolute distance between the two

  • The ecdf is $\hat{F}(Y)$ = the fraction of the sample with values $\le Y_i$, where $i = 1, 2, 3, \ldots, n$

  • The K−S statistic is $D_{max} = \max \left|\hat{F}_1(Y) - \hat{F}_2(Y)\right|$

Assumptions:

  • The measurement scale is at least ordinal.
  • The probability distributions are continuous
  • The two samples are mutually independent
  • Both samples are simple random samples from their respective populations

$H_0: F_1(Y) = F_2(Y)$ for all $Y_i$.
$H_1: F_1(Y) \ne F_2(Y)$ for at least one $Y_i$.

Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples
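A minimal sketch in Python (scipy assumed), again with placeholder values:

```python
# Sketch: two-sample Kolmogorov-Smirnov test comparing two ecdfs.
from scipy import stats

g1 = [44.2, 40.3, 36.4, 49.4, 51.7]   # placeholder values
g2 = [32.1, 32.9, 29.0, 46.4, 41.0]
res = stats.ks_2samp(g1, g2)
print(f"D = {res.statistic:.3f}, p = {res.pvalue:.3f}")
```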

53 / 63

Comparing Matched or Paired Means

54 / 63

Matched (aka Dependent or Paired) Samples

Sometimes you may have two sets of measures on the same units. Now

  • $H_0: \mu_d = 0;\ H_1: \mu_d \ne 0$ or
  • $H_0: \mu_d \ge 0;\ H_1: \mu_d < 0$ or
  • $H_0: \mu_d \le 0;\ H_1: \mu_d > 0$

$\bar{d} = \frac{\sum d_i}{n}$

$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$ and $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$

Test Statistic: $t = \frac{\bar{d} - \mu_d}{s_{\bar{d}}}$

With a normally distributed population, $df = n - 1$

Interval estimate: $\bar{d} \pm t_{\alpha/2}\left(s_{\bar{d}}\right)$

55 / 63

The Testing Protocol

Let us see how the test is carried out with reference to a small data-set wherein we have six pre-school children's scores on a vocabulary test before a reading program is introduced into the pre-school (x2) and then again after the reading program has been in place for a few months (x1).

Vocabulary Scores pre- and post-intervention
Child ID Post-intervention score Pre-intervention score Difference = Post - Pre
1 6.0 5.4 0.6
2 5.0 5.2 -0.2
3 7.0 6.5 0.5
4 6.2 5.9 0.3
5 6.0 6.0 0.0
6 6.4 5.8 0.6
56 / 63

Note the column $d_i$ has the difference of the scores such that for Child 1, $6.0 - 5.4 = 0.6$, for Child 2, $5.0 - 5.2 = -0.2$, and so on.

Note also that the mean, variance and standard deviation of d are calculated as follows:

$d_i = x_1 - x_2$

$\bar{d} = \frac{\sum d_i}{n}$

$s^2_d = \frac{\sum (d_i - \bar{d})^2}{n - 1}$

$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$

57 / 63

Say we have no idea what to expect from the program. In that case, our hypotheses would be:

$H_0: \mu_d = 0$
$H_1: \mu_d \ne 0$

The test statistic is given by $t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}};\ df = n - 1$, and the interval estimate is calculated as $\bar{d} \pm t_{\alpha/2}\left(\frac{s_d}{\sqrt{n}}\right)$.

Once we have specified our hypotheses, selected α, and calculated the test statistic, the usual decision rules apply: ...

  • Reject the null hypothesis if the calculated p-value ≤ α
  • Do not reject the null hypothesis if the calculated p-value > α

In this particular example, it turns out that $\bar{d} = 0.30$; $s_d = 0.335$; $t = 2.1958$; $df = 5$; p-value $= 0.07952$; and 95% CI: $0.3 \pm 0.35 = (-0.0512,\ 0.6512)$.

Given the large pvalue we fail to reject H0 and conclude that these data do not suggest a statistically significant impact of the reading program.
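A minimal sketch in Python (scipy assumed) reproducing these numbers from the six pre/post pairs in the table above:

```python
# Sketch: paired t-test on the vocabulary scores (two-tailed).
from scipy import stats

post = [6.0, 5.0, 7.0, 6.2, 6.0, 6.4]
pre  = [5.4, 5.2, 6.5, 5.9, 6.0, 5.8]
res = stats.ttest_rel(post, pre)   # tests H0: mu_d = 0
print(f"t = {res.statistic:.4f}, p = {res.pvalue:.5f}")   # ~ 2.1958, 0.0795
```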

58 / 63

Example 3

Over the last decade, has poverty worsened in Ohio's public school districts? One way to test worsening poverty would be to compare the percent of children living below the poverty line in each school district across two time points. For the sake of convenience I will use two American Community Survey (ACS) data sets that measure Children Characteristics (Table S0901), one the 2011-2015 ACS and the other the 2006-2010 ACS. While a small snippet of the data is shown below for the 35 school districts with data for both years, you can download the full dataset from here.

Percent of Children in Poverty
District 2006-2010 2011-2015
Akron City School District, Ohio 35.3 41.0
Brunswick City School District, Ohio 6.8 7.6
Canton City School District, Ohio 44.1 49.6
Centerville City School District, Ohio 10.5 5.4
Cincinnati City School District, Ohio 39.5 43.0
Cleveland Municipal School District, Ohio 45.8 53.3
59 / 63

$H_0$: Poverty has not worsened ($\mu_d \le 0$)
$H_1$: Poverty has worsened ($\mu_d > 0$)

Subtracting the 2006-2010 poverty rate from the 2011-2015 poverty rate for each district and then calculating the average difference ($\bar{d}$) yields $\bar{d} = 4.328571$ and $s_d = 3.876746$

With n = 35 we have a standard error $s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{3.876746}{\sqrt{35}} = 0.6552897$

The test statistic is $t = \frac{\bar{d}}{s_{\bar{d}}} = \frac{4.328571}{0.6552897} = 6.605584$ and has a p-value $= 0.0000001424$, allowing us to easily reject the null hypothesis

... These data suggest that school district poverty has indeed worsened over the intervening time period. The 95% confidence interval is (2.9968 and 5.6602)

60 / 63

Example 4

A large urban school district in a Midwestern state implemented a reading intervention to boost the district's scores on the state's English Language Arts test. The intervention was motivated by poor performance of the district's 4th grade cohort. Three years had passed before that cohort was tested in the 8th grade. Did the intervention boost ELA scores, on average?

English Language Arts: Scaled scores, grades Three and Eight
Student ID Grade Scaled Score
AA0000001 3 583
AA0000002 3 583
AA0000003 3 583
AA0000004 3 668
AA0000005 3 627
AA0000006 3 617
61 / 63

$H_0$: Intervention did not boost ELA scores ($\mu_d \le 0$)
$H_1$: Intervention did boost ELA scores ($\mu_d > 0$)

We have $\bar{d} = 14.62594$, $s_d = 66.27296$, $df = 12955$, and the standard error is $s_{\bar{d}} = 0.5822609$

The test statistic then is $t = \frac{\bar{d}}{s_{\bar{d}}} = \frac{14.62594}{0.5822609} = 25.11922$ and has a p-value that is very close to 0 and obviously far smaller than α = 0.05 or α = 0.01

Hence we can reject the null hypothesis; these data suggest that the reading intervention did indeed boost English Language Arts scores on average.

62 / 63

Testing Options and the Protocol

  • Data are coming from a paired design: use the two-sample t-test for paired samples

  • Data are coming from two unpaired groups: use the two-sample t-test with

    • the assumption of equal variances if $n_1 \ge 30$ and $n_2 \ge 30$ and $s_1 \approx s_2$
    • the assumption of unequal variances if $n_1 < 30$ or $n_2 < 30$ and one standard deviation is at least twice the other ($s_i \ge 2 s_j$)
  • Use Levene's test for homogeneity of variances if the assumption of normality is not supported

  • Normality is not as big a deal since these tests are robust to small violations of normality, provided you have "large enough samples" and the skew is not extreme

  • How large is "large enough"?

    • If there is similar skewness, even when comparing two groups, having 30 in each group will work
    • If I have groups skewed in opposite directions (one left-skewed, the other right-skewed) then I need samples of a few hundred units each before the test can be trusted
63 / 63
