In the short survey that some of you participated in when we were setting up this workshop, a number of you said you were interested in learning how to conduct the usual battery of statistical tests in the course of evaluating a program. This is a quick overview of the necessary commands and, where feasible, of tests for the assumptions underlying those tests. I use a very specific data set that was part of a pretty famous experiment: Tennessee's Project STAR. Basically, it was a study of the effects of small class size on student achievement in kindergarten through 3rd grade. In time, students' test records for grades 4 through 8, and then their high-school data, were also gathered. More details of the study and the data files are available here. The data we will use are for grades KG through 3 and include only a subset of the variables originally gathered.

We may be interested in correlations between reading/mathematics scores in successive grades, as well as between reading and mathematics scores within a grade. Let us test some of these on a piecemeal basis. First up: reading scores.

```
cor.test(STAR$readk, STAR$read1,
         method = "pearson", alternative = "two.sided", conf.level = 0.95)
```

```
Pearson's product-moment correlation
data: STAR$readk and STAR$read1
t = 47.376, df = 4009, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5788848 0.6185798
sample estimates:
cor
0.5991003
```

```
cor.test(STAR$readk, STAR$mathk,
         method = "pearson", alternative = "greater", conf.level = 0.95)
```

```
Pearson's product-moment correlation
data: STAR$readk and STAR$mathk
t = 77.426, df = 5784, p-value < 2.2e-16
alternative hypothesis: true correlation is greater than 0
95 percent confidence interval:
0.7026181 1.0000000
sample estimates:
cor
0.7134043
```

Say you have quite a few of these types of variables and want to check pairwise correlations. What would be a good way to do that? By using the `GGally` package.
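The `ggpairs()` call itself is not reproduced above; here is a minimal sketch, assuming the four reading scores are stored in columns named `readk`, `read1`, `read2`, and `read3` (only `readk`, `read1`, and `read3` appear elsewhere in this post, so check the names against your copy of the data):

```r
library(GGally)

# pairwise correlations (with significance stars), kernel densities on the
# diagonal, and scatterplots below the diagonal, for the reading scores only
ggpairs(STAR[, c("readk", "read1", "read2", "read3")])
```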

Note that I am only using reading scores here to keep the plot manageable. Note also that the plot shows you pairwise correlations with the usual asterisks denoting significance levels. You also see the unconditional kernel density of each score, and pairwise scatterplots. I like `ggpairs(...)` for exploratory data analysis because it gives you a lot of good information. If you want something fancier but less detailed, like a correlogram, then you can play around with the code below; this might be good for inclusion in the final report to the client.

```
library(corrplot)
STAR[, c(8:15)] -> mydf # keep only reading and math scores, KG through grade 3
na.omit(mydf) -> mydf   # drop any observation with missing data
cor(mydf) -> M

cor.mtest <- function(mat, ...) {
  mat <- as.matrix(mat)
  n <- ncol(mat)
  p.mat <- matrix(NA, n, n)
  diag(p.mat) <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      tmp <- cor.test(mat[, i], mat[, j], ...)
      p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
    }
  }
  colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
  p.mat
}

# matrix of the p-values of the pairwise correlation tests
cor.mtest(mydf) -> p.mat
colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA")) -> col
```

```
corrplot(M, method = "color", col = col(200),
         type = "upper", order = "hclust",
         addCoef.col = "black",         # add correlation coefficients
         tl.col = "black", tl.srt = 45, # text label color and rotation
         # combine with significance: blank out insignificant correlations
         p.mat = p.mat, sig.level = 0.01, insig = "blank",
         # hide correlation coefficients on the principal diagonal
         diag = FALSE)
```

You may be looking to run t-tests, perhaps a one-sample test, a two-sample (aka independent samples) test with/without the assumption of equal variances, or even a paired t-test. These are all very straightforward to run in R. I will focus on 3rd grade mathematics scores.

Say our null hypothesis is that the mean `math3` score is 650, i.e., \(\mu_0 = 650\), and we are running a two-tailed test with \(\alpha = 0.05\).

```
t.test(STAR$math3, mu = 650, alternative = "two.sided", conf.level = 0.95)
```

```
One Sample t-test
data: STAR$math3
t = -62.68, df = 6076, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 650
95 percent confidence interval:
616.9663 618.9699
sample estimates:
mean of x
617.9681
```

Instead, we may be curious about differences between male and female students. Let us stick with a two-tailed test of no difference, i.e., \(\mu_{male} - \mu_{female} = 0\), and with \(\alpha = 0.05\).

```
t.test(STAR$math3 ~ STAR$gender, alternative = "two.sided",
       conf.level = 0.95, paired = FALSE, var.equal = FALSE)
```

```
Welch Two Sample t-test
data: STAR$math3 by STAR$gender
t = -0.43691, df = 6074.7, p-value = 0.6622
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.448014 1.555696
sample estimates:
mean in group male mean in group female
617.7512 618.1974
```

```
t.test(STAR$math3 ~ STAR$gender, alternative = "two.sided",
       conf.level = 0.95, paired = FALSE, var.equal = TRUE)
```

```
Two Sample t-test
data: STAR$math3 by STAR$gender
t = -0.43632, df = 6075, p-value = 0.6626
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.450706 1.558389
sample estimates:
mean in group male mean in group female
617.7512 618.1974
```
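Whether the pooled (`var.equal = TRUE`) version is defensible can itself be checked. One option is the F test of equal variances in base R (a minimal sketch; Levene's test, available in the `car` package, is a more robust alternative when normality is doubtful):

```r
# F test of H0: the variance of math3 is the same for male and female students
var.test(STAR$math3 ~ STAR$gender)
```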

What about a paired test of `mathk` versus `math3`?

```
t.test(STAR$math3, STAR$mathk, alternative = "two.sided",
       conf.level = 0.95, paired = TRUE)
```

```
Paired t-test
data: STAR$math3 and STAR$mathk
t = 158.42, df = 2889, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
126.4175 129.5860
sample estimates:
mean of the differences
128.0017
```

If you are approaching these tests with an eye on the underlying assumptions of normality and/or equal variances, R has some easy ways to run these tests as well.

- Normality tests with small versus large sample sizes.

```
library(dplyr) # needed for %>% and sample_frac()

STAR %>%
  sample_frac(0.1) -> starsm # draw a 10% random sample for Shapiro-Wilk
shapiro.test(starsm$read3)  # shapiro.test() requires 3 <= n <= 5000
```

```
Shapiro-Wilk normality test
data: starsm$read3
W = 0.99065, p-value = 0.0007556
```
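For the full sample, an Anderson-Darling test scales better. The output below presumably comes from the `ad.test()` function; a minimal sketch, assuming the `nortest` package:

```r
library(nortest)    # provides ad.test()

# Anderson-Darling normality test; unlike shapiro.test(), it has no
# upper limit on sample size, so we can use all of read3
ad.test(STAR$read3)
```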

```
Anderson-Darling normality test
data: STAR$read3
A = 4.9323, p-value = 3.37e-12
```

- Quantile-quantile plots
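A basic quantile-quantile plot against the normal distribution is available in base R (a minimal sketch, again using `read3`):

```r
# sample quantiles of read3 plotted against theoretical normal quantiles;
# points falling along the reference line suggest approximate normality
qqnorm(STAR$read3, main = "Normal Q-Q plot of read3")
qqline(STAR$read3, col = "red") # reference line through the quartiles
```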