Contingency Tables

Contingency analysis comes into use when we need to test for the independence of two categorical variables. For example, did Male passengers on the Titanic have the same survival chances as Female passengers? Is lung cancer independent of smoking?

In order to do such tests we often rely upon odds and odds ratios. The odds of an event are calculated as the probability of a “success” divided by the probability of a “failure”. For e.g., on average 51 boys are born in every 100 births, so the probability of any randomly chosen delivery being that of a boy is \(\dfrac{51}{100}=0.51\).

Likewise the probability of any randomly chosen delivery being that of a girl is \(\dfrac{49}{100}=0.49\)

Note: If \(p=\) probability of success; \(1-p\) = probability of failure, then the odds of success: \(O = \dfrac{p}{1-p}\). Since with a sample we estimate the odds we have \(\hat{O} = \dfrac{\hat{p}}{1-\hat{p}}\)

Aspirin and Cancer

The example below is the same one from the text.

Outcome Aspirin Placebo Total
Cancer 1438 1427 2865
No Cancer 18496 18515 37011
Total 19934 19942 39876

What were the odds of (i) not getting cancer, and (ii) getting cancer for the Aspirin group versus the Placebo group? These were:

p.ncAspirin = (18496/19934); p.ncAspirin
## [1] 0.9278619
p.cAspirin = (1438/19934); p.cAspirin
## [1] 0.07213806
Odds.ncAspirin = p.ncAspirin/p.cAspirin; Odds.ncAspirin
## [1] 12.86231
p.ncPlacebo = (18515/19942); p.ncPlacebo
## [1] 0.9284425
p.cPlacebo = (1427/19942); p.cPlacebo
## [1] 0.07155752
Odds.ncPlacebo = p.ncPlacebo/p.cPlacebo; Odds.ncPlacebo
## [1] 12.97477

So the odds-ratio of not getting cancer for the Aspirin versus the Placebo group is:

OR.Aspirin = Odds.ncAspirin/Odds.ncPlacebo; OR.Aspirin
## [1] 0.9913321

In plain words: The odds of not getting cancer were slightly less for the Aspirin group than for the Placebo group.

What is the range of values that the Odds and the Odds Ratios can assume?

Let \(p=\) the probability of success; \(1-p\) = probability of failure. Then, as previously defined, the odds of success are calculated as \(O = \dfrac{p}{1-p}\) and for two groups the odds ratio of success is \(O=\dfrac{O_1}{O_2}\). Table 1 (see below) maps out the minimum probability of success (\(=0\)) and the maximum probability of success (\(=1\)) for a single group. Note \(p=0\) means success never occurs and \(p=1\) means success always occurs, for this particular group. Expressing the probability as odds yields a corresponding range of values that are anchored below at \(0\) and above at \(\infty\) … these are the limits of the odds.

\(p\) \(1-p\) \(odds\)
0.00 1.00 0.00
0.10 0.90 0.11
0.20 0.80 0.25
0.30 0.70 0.43
0.40 0.60 0.67
0.50 0.50 1.00
0.60 0.40 1.50
0.70 0.30 2.33
0.80 0.20 4.00
0.90 0.10 9.00
1.00 0.00 \(\infty\)

The implication is that the odds follow an asymmetric, highly skewed distribution and hence require a transformation in order for us to come up with some estimate of the uncertainty surrounding the odds. The log-transformation achieves this by normalizing the distribution of the odds.

However, what would be the limits of odds ratios? The answer: It depends on what \(p\) is for any group. Let us see this with an example. Assume that the following fictional data represent passenger survival on the Titanic. Let us assume that there were a total of 100 passengers, 50 Male and 50 Female. Let us also assume that all Males and Females survived. What would the contingency table look like?

Sex Survived Died Total
Male 50 0 50
Female 50 0 50
Total 100 0 100

What then is the probability that a Male passenger survived? It is \(= \dfrac{50}{50} = 1\) and hence the probability that a Male passenger died is \(1 - \hat{p}_{male} = 1 - 1 = 0\). What about the Female passengers? Well, the probability that a Female passenger survived is also \(= \dfrac{50}{50} = 1\) and hence the probability that a female passenger died is \(1 - \hat{p}_{female} = 1 - 1 = 0\). Now let us calculate the odds of survival for Male passengers \(= \dfrac{1}{0} = \infty\) and likewise the odds of survival for Female passengers \(= \dfrac{1}{0} = \infty\).

So this tells us that one extreme situation may be where for each group the odds of survival are \(\infty\). When this happens the odds ratio will be \(\dfrac{\infty}{\infty}\), and this is undefined so we would be unable to calculate an odds ratio!! The R code and output below shows you what happens.

M1 = matrix(c(50, 50, 0, 0), nrow=2)
rownames(M1) = c("Male", "Female")
colnames(M1) = c("Survived", "Died")