Basic scatterplots via the plot() command

guppies = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3bGuppyFatherSonAttractiveness.csv")
summary(guppies)
##  fatherOrnamentation sonAttractiveness
##  Min.   :0.0300      Min.   :-0.3200  
##  1st Qu.:0.3275      1st Qu.: 0.1800  
##  Median :0.4550      Median : 0.4500  
##  Mean   :0.4908      Mean   : 0.4872  
##  3rd Qu.:0.6100      3rd Qu.: 0.8025  
##  Max.   :1.1200      Max.   : 1.2300

Is there a relationship between the father’s ornamentation and the son’s attractiveness? Both variables are numeric so we could plot the data and then run a correlation.

plot(guppies$fatherOrnamentation, guppies$sonAttractiveness, pch = 16, xlab = "Father's Ornamentation", ylab = "Son's Attractiveness", col = "tomato4")

library(car)
scatterplot(sonAttractiveness ~ fatherOrnamentation, data = guppies, ellipse = TRUE, levels = c(0.5, 0.95), reg.line = FALSE, smoother = FALSE, xlim = c(0, 1.2), ylim = c(-0.4, 1.5), xlab = "Father's Ornamentation", ylab = "Son's Attractiveness")
## Warning in plot.window(...): "levels" is not a graphical parameter
## Warning in plot.window(...): "reg.line" is not a graphical parameter
## Warning in plot.window(...): "smoother" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "levels" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "reg.line" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "smoother" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "levels" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "reg.line" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "smoother" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "levels" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "reg.line" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "smoother" is
## not a graphical parameter
## Warning in box(...): "levels" is not a graphical parameter
## Warning in box(...): "reg.line" is not a graphical parameter
## Warning in box(...): "smoother" is not a graphical parameter
## Warning in title(...): "levels" is not a graphical parameter
## Warning in title(...): "reg.line" is not a graphical parameter
## Warning in title(...): "smoother" is not a graphical parameter

Note the presence of outliers. Is this sufficient to reject the assumption of bivariate normality? Let us test this.

library(MVN)
mardiaTest(guppies, qqplot = FALSE)
## Warning: 'mardiaTest' is deprecated.
## Use 'mvn' instead.
## See help("Deprecated")

In this case we can stick with the Pearson correlation but if the p-value were small enough Spearman would have been appropriate.

cor(guppies, use = "complete.obs", method = "pearson")
##                     fatherOrnamentation sonAttractiveness
## fatherOrnamentation           1.0000000         0.6141043
## sonAttractiveness             0.6141043         1.0000000

As father’s ornamentation increases, so does the son’s attractiveness. The Spearman correlation is positive and estimated to be 0.6141.

Hypothesis Testing for the correlation coefficient

H0: Son’s attractiveness is not correlated with father’s ornamentation \((\rho = 0)\)
HA: Son’s attractiveness is correlated with father’s ornamentation \((\rho \neq 0)\)

cor.test(guppies$sonAttractiveness, guppies$fatherOrnamentation)
## 
##  Pearson's product-moment correlation
## 
## data:  guppies$sonAttractiveness and guppies$fatherOrnamentation
## t = 4.5371, df = 34, p-value = 6.784e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3577455 0.7843860
## sample estimates:
##       cor 
## 0.6141043

The p-value is quite small so we can reject H0; there appears to be a significant positive correlation between father’s ornamentation and son’s attractiveness.

Rank Order Correlations

When one or both measures is not numeric and/or one or more measures is non-normally distributed, Pearson’s correlation coefficient is inappropriate. Rather, Spearman’s rank-order correlation is used. We see this in action here with the Indian Rope trick.

rope = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e5IndianRopeTrick.csv")
summary(rope)
##      years       impressivenessScore
##  Min.   : 2.00   Min.   :1.000      
##  1st Qu.:17.00   1st Qu.:2.000      
##  Median :28.00   Median :4.000      
##  Mean   :27.29   Mean   :3.429      
##  3rd Qu.:39.00   3rd Qu.:4.000      
##  Max.   :50.00   Max.   :5.000
plot(rope$years, rope$impressivenessScore, pch=16, xlab = "Time Lapsed (in years) ", ylab = "Impressiveness Score", col = "cornflowerblue")

mardiaTest(rope, qqplot = FALSE)
## Warning: 'mardiaTest' is deprecated.
## Use 'mvn' instead.
## See help("Deprecated")

The multivariate normality test shows no violations so we could stick with Pearson if we wanted to. Since the text uses these data for Spearman’s I’ll show you that as well.

cor.test(rope$impressivenessScore, rope$years)
## 
##  Pearson's product-moment correlation
## 
## data:  rope$impressivenessScore and rope$years
## t = 6.9679, df = 19, p-value = 1.223e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6562731 0.9366690
## sample estimates:
##     cor 
## 0.84778
cor.test(rope$impressivenessScore, rope$years, method = "spearm")
## Warning in cor.test.default(rope$impressivenessScore, rope$years, method =
## "spearm"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  rope$impressivenessScore and rope$years
## S = 332.12, p-value = 2.571e-05
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.7843363

Ties prevent the calculation of exact p-values so like in the past, we can switch to another package. Here I’ll use the coin library.

library(coin)
spearman_test(impressivenessScore ~ years, data = rope)
## 
##  Asymptotic Spearman Correlation Test
## 
## data:  impressivenessScore by years
## Z = 3.5077, p-value = 0.0004521
## alternative hypothesis: true rho is not equal to 0

The p-value is 0.0004 so we can reject the NULL; there appears to be a positive association between the impressiveness of eyewitness accounts and the years lapsed since the account was penned. Note also that regardless of whether we go with Pearson’s correlation or Spearman’s, we see a significant positive correlation between time lapsed and the impressiveness score.

Practice Problems

Problem 1

Birds of many species retain the same breeding partner year after year. In some of these species, male and female partners migrate separately and spend the winter in different places, often thousands of kilometers apart. Yet they manage to find one another again each spring. In a field study of individually banded pairs of black-tailed godwits, Gunnarsson et al. (2004) recorded spring arrival dates of males and females on the breeding grounds in the year after they were observed breeding together. The data for 10 pairs are provided in the data, with arrival date measured in days since March 31.

birds = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16q05GodwitArrivalDates.csv")
summary(birds)
##    femaleDate       maleDate    
##  Min.   :24.00   Min.   :22.00  
##  1st Qu.:35.25   1st Qu.:37.25  
##  Median :44.00   Median :48.00  
##  Mean   :45.50   Mean   :45.80  
##  3rd Qu.:55.75   3rd Qu.:55.75  
##  Max.   :69.00   Max.   :59.00
  1. Plot these data. Does the plot indicate the presence of any relationship between arrival dates of males and females?
plot(birds$femaleDate, birds$maleDate, pch = 16, xlab = "Female Arrival Date", ylab = "Male Arrival Date")

library(MVN)
mardiaTest(birds, qqplot = TRUE)
## Warning: 'mardiaTest' is deprecated.
## Use 'mvn' instead.
## See help("Deprecated")
mardiaTest(birds, qqplot = FALSE)
## Warning: 'mardiaTest' is deprecated.
## Use 'mvn' instead.
## See help("Deprecated")
library(car)

scatterplot(femaleDate ~ maleDate, data = birds, ellipse = TRUE, levels = c(0.5, 0.95), reg.line = TRUE, smoother = TRUE, xlim = c(0, 100), ylim = c(0, 100))
## Warning in plot.window(...): "levels" is not a graphical parameter
## Warning in plot.window(...): "reg.line" is not a graphical parameter
## Warning in plot.window(...): "smoother" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "levels" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "reg.line" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "smoother" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "levels" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "reg.line" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "smoother" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "levels" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "reg.line" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "smoother" is
## not a graphical parameter
## Warning in box(...): "levels" is not a graphical parameter
## Warning in box(...): "reg.line" is not a graphical parameter
## Warning in box(...): "smoother" is not a graphical parameter
## Warning in title(...): "levels" is not a graphical parameter
## Warning in title(...): "reg.line" is not a graphical parameter
## Warning in title(...): "smoother" is not a graphical parameter

  1. Test for a correlation, report your conclusion, as well as the 95% confidence interval for the correlation coefficient.
cor.test(birds$femaleDate, birds$maleDate)
## 
##  Pearson's product-moment correlation
## 
## data:  birds$femaleDate and birds$maleDate
## t = 7.0144, df = 8, p-value = 0.000111
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7157944 0.9830331
## sample estimates:
##       cor 
## 0.9274395

Problem 2

Left-handed people have an advantage in many sports, and it has been suggested that left-handedness might have been an advantage in hand-to-hand combat in early societies. To explore this potential advantage, reearchers compared the frequency of left-handed individuals in societies with measures of the level of violence in these societies. These data are given below:

left = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16q13LefthandednessAndViolence.csv")
summary(left)
##   percentLeft        murder      
##  Min.   : 3.50   Min.   :0.0100  
##  1st Qu.: 6.55   1st Qu.:0.0275  
##  Median : 9.20   Median :0.3350  
##  Mean   :11.38   Mean   :1.6375  
##  3rd Qu.:14.85   3rd Qu.:3.2600  
##  Max.   :22.70   Max.   :5.3700
mardiaTest(left)
## Warning: 'mardiaTest' is deprecated.
## Use 'mvn' instead.
## See help("Deprecated")
  1. Plot these data. Does the plot indicate any association between left-handedness and propensity for violence?

  2. Use an appropriate method to test for a correlation and report your conclusion.