Installing and Loading packages

Use the “Tools” menu to select “Install Packages…” and enter the following package name(s) or run the command as shown below:

packs = c("vcd", "vcdExtra", "dplyr")
# install.packages(packs, dependencies = TRUE, repos="https://cran.rstudio.com")

Remember to removed the # symbol in line 25 before running the install.packages command. Then once you are done installing these packages go ahead and reinsert the # symbol.


Problem (a)

More than 10% of people carry the parasite Taxoplasma gondii. The data (see below) are from Prague and provide information on 15-29 year-old drivers who were in an accident. Some of these drivers were infected with the parasite and others were not. Other drivers represent those who were not in an accident, and some of these drivers are infected while others are not infected.

taxo = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02q32ToxoplasmaAccidents.csv")
(i) What variable types are these? Numeric or categorical? If categorical, are they nominal or ordinal?  


(ii) The goal here is to analyze the data to see if there is any association (any relationship) between being infected and being involved in a car accident. Use an appropriate table to carry out this analysis.
tab.1 = table(taxo$driverType, taxo$infectionStatus)
tab.1
##            
##             infected uninfected
##   accidents       21         38
##   control         38        211
tab.2 = prop.table(tab.1, 2)  # row proportions
tab.2
##            
##              infected uninfected
##   accidents 0.3559322  0.1526104
##   control   0.6440678  0.8473896
tab.3 = tab.2 * 100
tab.3 
##            
##             infected uninfected
##   accidents 35.59322   15.26104
##   control   64.40678   84.73896

What conclusions can we draw? Why?

(iii) Depict this association with an appropriate graphic. 
barplot(tab.3, ylim=c(0, 100), beside=TRUE, xlab="Infection Status", ylab="Relative Frequencies (%)")

mosaicplot(tab.3, main="Mosaicplot of Infection Status & Accidents")

Both plots show a higher rate of accidents for the infected group.

Problem (b)

Mattison et al. (2012) carried out a study with rhesus monkeys to see if a reduction in food intake extends life span (measured in years). The data 9see below) are the life spans of 19 Male and 15 Female monkeys. Monkeys were randomly assigned either to a normal nutritious diet group or to a similar diet but reduced by 30% in terms of the quantity of food.

rhesus = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02q35FoodReductionLifespan.csv")
(i) What types of variables are these? Numeric? Categorical? If categorical, are they nominal or ordinal?



(ii) Calculate the Mean, Median, Variance, Standard Deviation for each group. 
library(dplyr)

grouped = group_by(rhesus, sex, foodTreatment)

tab.A = summarise(grouped,
        Mean = mean(lifespan, na.rm=TRUE),
        Median = median(lifespan, na.rm=TRUE),
        Variance = var(lifespan, na.rm=TRUE),
        Std.Dev = sd(lifespan, na.rm=TRUE)
        )

tab.A
## Source: local data frame [4 x 6]
## Groups: sex [?]
## 
##      sex foodTreatment     Mean Median Variance  Std.Dev
##   (fctr)        (fctr)    (dbl)  (dbl)    (dbl)    (dbl)
## 1 female       control 28.67500   27.1 22.15643 4.707062
## 2 female       reduced 26.08571   27.8 48.85810 6.989857
## 3   male       control 33.33333   34.1 32.87000 5.733236
## 4   male       reduced 34.69000   37.0 36.40767 6.033877

You could also do this with another package (see below):

library(psych)

describeBy(rhesus$lifespan, list(rhesus$sex, rhesus$foodTreatment))
## : female
## : control
##   vars n  mean   sd median trimmed  mad  min  max range skew kurtosis   se
## 1    1 8 28.68 4.71   27.1   28.68 4.45 23.7 35.2  11.5 0.28    -1.94 1.66
## -------------------------------------------------------- 
## : male
## : control
##   vars n  mean   sd median trimmed  mad  min  max range skew kurtosis   se
## 1    1 9 33.33 5.73   34.1   33.33 6.67 24.9 40.7  15.8 -0.3    -1.56 1.91
## -------------------------------------------------------- 
## : female
## : reduced
##   vars n  mean   sd median trimmed  mad  min  max range  skew kurtosis
## 1    1 7 26.09 6.99   27.8   26.09 7.71 16.5 35.9  19.4 -0.07    -1.73
##     se
## 1 2.64
## -------------------------------------------------------- 
## : male
## : reduced
##   vars  n  mean   sd median trimmed  mad  min  max range  skew kurtosis
## 1    1 10 34.69 6.03     37   35.38 4.74 23.7 40.2  16.5 -0.51    -1.44
##     se
## 1 1.91

Here you are getting a lot more statistics than we need to deal with but if you find this easier to use, by all means do so.

(iii) Which group has higher average life spans, regardless of gender?



(iv) Within gender, which group has the higher average life span?



(v) Looking at the picture as a whole, what differences in life spans do you see between the male and female rhesus monkeys in this experiment? 



(vi) If we had to display these data, what would be some appropriate ways to do so? 



(vii) Let us go ahead and generate some of these plots and then compare them to see if the distribution of life spans is symmetric or skewed, and if it is skewed, the direction of the skew.
library(lattice)

# Histogram
histogram(~ lifespan | sex:foodTreatment, data=rhesus, layout=c(1,4), xlab="Life span (in years)")

# Box-plot
boxplot(lifespan ~ sex:foodTreatment, data=rhesus, horizontal=TRUE, 
        xlab="Life span (in years)",
        ylab = "Group",
        cex.axis = 0.75)

# Strip Chart 
par(bty = "l") # plot x and y axes only, not a complete box

stripchart(lifespan ~ sex:foodTreatment, data = rhesus, vertical = TRUE, method = "jitter", pch = 16, col = "forestgreen", cex = 1, las = 1, ylab = "Life span (in years)", xlab = "Group", ylim = c(0, 50))

# calculate mean life span for each group
means = tapply(rhesus$lifespan, list(rhesus$sex, rhesus$foodTreatment), mean)

# "segments" draws draws lines to indicate the means
segments(x0 = c(1, 2, 3, 4) - 0.1, y0 = means, x1 = c(1, 2, 3, 4) + 0.1,
         y1 = means, lwd = 2, col = "brown")

What can you say about symmetry versus skew (and the direction of the skew)?

(viii) What measure would be most appropriate, then, to describe the average lifespan for any of these groups?

Practice Problem

Chapter 3, #6 – Body Mass (in grams) and beak length (in mm) of finches captured in mist nets in Kenya (Schluter 1988)

finches = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter03/chap03q04KenyaFinches.csv")
(1) Calculate mean, median, variance, standard deviation of (a) body mass, and beak length

(2) Depict (a) body mass and (b) beak length with an appropriate graphic

(3) Calculate how many finches do we have of each species and express in a table as proportions, percentages

(4) Depict species' proportions with an appropriate graphic