Chapter 10: The Normal Distribution

Working with the Standard Normal Distribution
Putting the Standard Normal Distribution to Good Use
The Standard Error
The Normal Approximation to the Binomial

Working with the Standard Normal Distribution

A \(z\)-score standardizes a value: each value of \(Y\) is expressed as its distance from the mean \((\mu)\), measured in units of the standard deviation \((\sigma)\). Formally, \(z = \dfrac{Y - \mu}{\sigma}\). Because every distribution of \(z\)-scores has a mean of 0 and a standard deviation of 1, and because a normally distributed \(Y\) yields a standard normal \(z\), we can easily figure out how much of the area under the curve lies above or below a specific \(z\)-score, as well as between two \(z\)-scores. For example, if we want to know the area (i) above \(z=0\), (ii) above \(z=2.11\), or (iii) below \(z=-0.33\), we can calculate these as follows:

pnorm(0, lower.tail=FALSE) # since I need the area above

## [1] 0.5

pnorm(2.11, lower.tail=FALSE) # since I need the area above

## [1] 0.01742918

pnorm(-0.33, lower.tail=TRUE) # since I need the area below

## [1] 0.3707
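The lower.tail argument only selects which side of the curve is returned. The two sides always sum to 1, and the standard normal curve is symmetric about 0. A quick check of both facts, using the same \(z\) values as above (the variable names are ours, chosen for illustration):

```r
# Upper-tail area is the complement of the lower-tail area
upper <- pnorm(2.11, lower.tail = FALSE)
lower <- pnorm(2.11, lower.tail = TRUE)
upper + lower                          # the two tails sum to 1

# Symmetry: the area below -0.33 equals the area above +0.33
below_neg <- pnorm(-0.33, lower.tail = TRUE)
above_pos <- pnorm(0.33, lower.tail = FALSE)
all.equal(below_neg, above_pos)        # TRUE
```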

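What about the area in the range \(-0.33 \leq z \leq 2.11\)? Or in the range \(-2.11 \leq z \leq -0.33\)? In each case we take the area above the lower bound and subtract the area above the upper bound: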

pnorm(-0.33, lower.tail=FALSE) - pnorm(2.11, lower.tail=FALSE)

## [1] 0.6118708

pnorm(-2.11, lower.tail=FALSE) - pnorm(-0.33, lower.tail=FALSE)

## [1] 0.3532708
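The same areas can be obtained from lower-tail areas, since \(P(a \leq z \leq b) = \Phi(b) - \Phi(a)\). The sketch below wraps this in a small helper; area_between() is a hypothetical name introduced here, not a base R function.

```r
# area_between() is a hypothetical helper, not part of base R
area_between <- function(z_low, z_high) {
  stopifnot(z_low <= z_high)
  pnorm(z_high) - pnorm(z_low)   # lower.tail = TRUE is the default
}

area_between(-0.33, 2.11)    # same area as the first subtraction above
area_between(-2.11, -0.33)   # same area as the second subtraction above
```

Working with lower tails throughout avoids having to remember which upper-tail area is the larger one.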

What if, however, we were asked to work backwards, that is, to find the \(z\)-score that leaves a particular area above or below it? For example, suppose we were told to find the \(z\)-score that leaves 0.39 of the area under the curve above it, or the \(z\)-score that leaves 0.11 of the area under the curve below it.

qnorm(0.39, lower.tail=FALSE) # focus on area above

## [1] 0.279319

qnorm(0.11, lower.tail=TRUE) # focus on area below

## [1] -1.226528

So we can easily move from a \(z\)-score to an area with pnorm, and from an area back to a \(z\)-score with qnorm.
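Because qnorm is the inverse of pnorm, converting a \(z\)-score to an area and back should recover the original value. A minimal check, using a few arbitrary \(z\) values chosen for illustration:

```r
z <- c(-1.5, 0, 0.28, 2.11)       # arbitrary illustrative z-scores
areas <- pnorm(z)                 # z-score -> area below
z_back <- qnorm(areas)            # area below -> z-score
all.equal(z, z_back)              # TRUE: the round trip recovers z

# The same holds for upper-tail areas, provided both calls use lower.tail = FALSE
all.equal(z, qnorm(pnorm(z, lower.tail = FALSE), lower.tail = FALSE))
```

The one thing to watch is that the lower.tail setting matches in both directions; mixing them flips the sign of the recovered \(z\)-score.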

Putting the Standard Normal Distribution to Good Use

NASA excludes anyone under 62 inches in height and anyone over 75 inches in height from being an astronaut pilot. In metric units these cutoffs are 157.5 cm and 190.5 cm, respectively. Assume that heights are distributed with means and standard deviations of 177.6 cm and 9.7 cm for 20-29 year-old men, and 163.2 cm and 10.1 cm for 20-29 year-old women. What proportion of men and women in these age groups would be excluded from being NASA astronaut pilots?

First, we convert the qualifying cutoffs into \(z\)-scores for men and women, respectively.

z.LowMen = (157.5 - 177.6)/9.7; z.LowMen

## [1] -2.072165

z.HighMen = (190.5 - 177.6)/9.7; z.HighMen

## [1] 1.329897

z.LowWomen = (157.5 - 163.2)/10.1; z.LowWomen

## [1] -0.5643564

z.HighWomen = (190.5 - 163.2)/10.1; z.HighWomen

## [1] 2.70297

Now we can figure out what proportion of the distribution of men and of women falls outside these cutoffs:

pnorm(z.LowMen, lower.tail=TRUE) + pnorm(z.HighMen, lower.tail=FALSE)

## [1] 0.1109012

pnorm(z.LowWomen, lower.tail=TRUE) + pnorm(z.HighWomen, lower.tail=FALSE)

## [1] 0.2896919

In essence, then, some 11.09% of 20-29 year-old men and 28.97% of 20-29 year-old women will be excluded from being NASA astronaut pilots.
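The manual conversion to \(z\)-scores is not strictly necessary, because pnorm accepts mean and sd arguments and standardizes internally. The sketch below repeats the NASA calculation on the original centimeter scale; excluded_share() is a hypothetical helper introduced here for illustration.

```r
low <- 157.5    # cm, lower cutoff from the text
high <- 190.5   # cm, upper cutoff from the text

# excluded_share() is a hypothetical helper, not part of base R
excluded_share <- function(mu, sigma) {
  pnorm(low, mean = mu, sd = sigma) +
    pnorm(high, mean = mu, sd = sigma, lower.tail = FALSE)
}

excluded_men <- excluded_share(177.6, 9.7)
excluded_women <- excluded_share(163.2, 10.1)
round(100 * c(men = excluded_men, women = excluded_women), 2)  # percentages
```

Both routes give the same proportions; the \(z\)-score route simply makes the standardization step visible.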

Problem 4

It was rumored that MI5 had an upper limit on the height of its spies: you had to be no taller than 180.3 cm if a man and no taller than 172.7 cm if a woman. Assume that the heights of British men and women are \(\sim N(177; 7.1)\) and \(\sim N(163.3; 6.4)\), respectively, where the second value is the standard deviation in cm.

What fraction of men would be excluded?

height.men.mu = 177
height.men.sd = 7.1
max.height.men = 180.3
height.z = (max.height.men - height.men.mu)/height.men.sd
round(height.z, digits=2)

## [1] 0.46

p.value = pnorm(0.46, 0, 1, lower.tail=FALSE)
round(p.value, digits=4)

## [1] 0.3228

Some 32.28% of men would be excluded (using the \(z\)-score rounded to 0.46).
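Rounding the \(z\)-score to two decimals mirrors what one would do with a printed \(z\)-table, but it shifts the answer slightly. The sketch below compares the rounded and unrounded versions; the variable names are ours and do not overwrite those used in the solution.

```r
mu_men <- 177
sd_men <- 7.1
limit_men <- 180.3

z_exact <- (limit_men - mu_men) / sd_men
p_rounded <- pnorm(round(z_exact, 2), lower.tail = FALSE)  # z rounded to 0.46, as in the text
p_exact <- pnorm(z_exact, lower.tail = FALSE)              # z left unrounded
c(rounded = p_rounded, exact = p_exact)
```

Since the unrounded \(z\)-score is a little larger than 0.46, the unrounded proportion excluded is a little smaller than the rounded one.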

What fraction of women meet the height requirement for application to MI5?

height.women.mu = 163.3
height.women.sd = 6.4
max.height.women = 172.7
height.z = (max.height.women - height.women.mu)/height.women.sd
round(height.z, digits=2)

## [1] 1.47

p.value = pnorm(1.47, 0, 1, lower.tail=TRUE)
round(p.value, digits=4)

## [1] 0.9292

Some 92.92% of women meet the height requirement.

Sean Connery is 183.4 cm tall. By how many standard deviations does he exceed the height limit for MI5?

james.bond = 183.4
z.james.bond = (james.bond - max.height.men)/height.men.sd
round(z.james.bond, digits=2)

## [1] 0.44

He exceeds the limit by 0.44 standard deviation units.

The Standard Error

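When the question concerns the mean of a sample of size \(n\) rather than a single observation, we use the standard error \(\left(\sigma_{\bar{Y}} = \dfrac{\sigma}{\sqrt{n}}\right)\) in place of the standard deviation when calculating the \(z\)-score. Problem 6(e), the final part of Problem 6, shows this in action.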
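One way to see why \(\sigma/\sqrt{n}\) is the right spread for a sample mean is to simulate it. The sketch below is our own illustration: it borrows the birth-weight parameters from Problem 6, while the seed, the number of replications, and the variable names are assumptions.

```r
set.seed(1)        # arbitrary seed, only for reproducibility
pop_mean <- 3.296  # kg, from Problem 6
pop_sd <- 0.560    # kg, from Problem 6
n <- 10            # sample size, as in Problem 6(e)

# Draw 10,000 samples of size n and record each sample mean
sample_means <- replicate(10000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

sd(sample_means)    # empirical spread of the sample means
pop_sd / sqrt(n)    # theoretical standard error
```

The two values should agree closely, and both are much smaller than the standard deviation of individual birth weights.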

Problem 6

Babies born in singleton births in the United States have birth weights (in kilograms) that are \(\sim N(3.296; 0.560)\).

What is the probability of a baby weighing more than 5 kg at birth?

mu = 3.296
sd = 0.560
x = 5
baby.z = (x - mu)/sd
round(baby.z, digits=2)

## [1] 3.04

pnorm(3.04, lower.tail=FALSE)

## [1] 0.001182891

About 0.12% of babies will be born with birth weights greater than 5 kg.

What is the probability of a baby weighing between 3 and 4 kg?

z.3 = (3 - mu)/sd
z.4 = (4 - mu)/sd 
p.value = pnorm(z.3, lower.tail=FALSE) - pnorm(z.4,lower.tail=FALSE)
p.value

## [1] 0.5970977

About 59.71% of babies will fall within these limits.

What fraction of babies is more than 1.5 standard deviations from the mean in either direction?

In essence, the question asks for the combined area above \(z=1.5\) and below \(z=-1.5\).

pnorm(1.5, lower.tail=FALSE) + pnorm(-1.5, lower.tail=TRUE)

## [1] 0.1336144

Some 13.36% of babies will have birth weights more than 1.5 standard deviation units either side of the mean.

What fraction of the babies is more than 1.5 kg from the mean in either direction?

z.1.5 = 1.5/sd; z.1.5

## [1] 2.678571

pnorm(z.1.5, lower.tail=FALSE) + pnorm(-z.1.5, lower.tail=TRUE)

## [1] 0.007393696

So about 0.74% of babies will have birth weights more than 1.5 kg from the mean in either direction.

If you took a random sample of 10 babies, what is the probability that their mean weight \((\bar{Y})\) would be greater than 3.5 kg?

Now we’ll have to work with the standard error so let us calculate it first.

n = 10
se = sd/sqrt(n); se # for use in calculating the z-score

## [1] 0.1770875

z = (3.5 - mu)/se; z

## [1] 1.151973

pnorm(z, lower.tail=FALSE)

## [1] 0.1246662

In a random sample of 10 babies, the probability of their mean birth weight exceeding 3.5 kg is about 0.1247.

The Normal Approximation to the Binomial

If \(n\) is large the binomial distribution can be approximated by a normal distribution with \(\mu=np\) and \(\sigma=\sqrt{np\left(1 - p\right)}\). Note that it makes little sense to do this today given the computing power at your fingertips. Nevertheless, the example below should demonstrate the process.
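Before the worked problem, a small hypothetical case may help show how close the approximation gets. The values of n_trials, p_success, and k below are made up for illustration. The continuity correction (evaluating the normal curve at \(k + 0.5\)) is a standard refinement that this section does not otherwise use.

```r
n_trials <- 100    # hypothetical number of trials
p_success <- 0.3   # hypothetical success probability
k <- 25            # we want P(X <= 25)

mu_x <- n_trials * p_success
sigma_x <- sqrt(n_trials * p_success * (1 - p_success))

exact <- pbinom(k, size = n_trials, prob = p_success)    # exact binomial
approx_plain <- pnorm(k, mean = mu_x, sd = sigma_x)      # normal approximation
approx_cc <- pnorm(k + 0.5, mean = mu_x, sd = sigma_x)   # with continuity correction

c(exact = exact, normal = approx_plain, normal_cc = approx_cc)
```

In this case the corrected approximation lands noticeably closer to the exact binomial probability than the uncorrected one.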

Problem 12

This problem uses the same data set on cancer among the cast and crew who worked on The Conqueror, where 91 out of 220 were eventually diagnosed with cancer. The population proportion is 14%, so the null hypothesis is that the cancer rate for this group is also 0.14.

n = 220; p=0.14
np = n*p; np

## [1] 30.8

sigma = sqrt(np * (1 - p)); sigma

## [1] 5.146649

z = (91 - np)/sigma
2 * pnorm(z, lower.tail=FALSE)

## [1] 1.321476e-31

The \(p\)-value is practically 0, so we can reject the null hypothesis; the data suggest that the cancer rate for this cast and crew was different from the expected 14%.
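Since the exact binomial calculation is cheap, it can serve as a cross-check on the approximation. A minimal sketch using binom.test from base R's stats package follows; the variable names are ours.

```r
n_crew <- 220     # cast and crew members, from the problem
x_cancer <- 91    # number diagnosed with cancer, from the problem
p0 <- 0.14        # population proportion under the null hypothesis

exact_test <- binom.test(x_cancer, n_crew, p = p0, alternative = "two.sided")
exact_test$p.value   # exact two-sided p-value
```

The exact \(p\)-value is also vanishingly small, so the conclusion is unchanged. This far into the tail the normal approximation is not numerically accurate, so the two \(p\)-values need not agree in magnitude.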