Use the “Tools” menu to select “Install Packages…” and enter the following package name(s) or run the command as shown below:
packs = c("vcd", "vcdExtra", "dplyr")
# install.packages(packs, dependencies = TRUE, repos="https://cran.rstudio.com")
Remember to removed the #
symbol in line 25 before running the install.packages
command. Then once you are done installing these packages go ahead and reinsert the #
symbol.
More than 10% of people carry the parasite Taxoplasma gondii. The data (see below) are from Prague and provide information on 15-29 year-old drivers who were in an accident. Some of these drivers were infected with the parasite and others were not. Other drivers represent those who were not in an accident, and some of these drivers are infected while others are not infected.
taxo = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02q32ToxoplasmaAccidents.csv")
(i) What variable types are these? Numeric or categorical? If categorical, are they nominal or ordinal?
(ii) The goal here is to analyze the data to see if there is any association (any relationship) between being infected and being involved in a car accident. Use an appropriate table to carry out this analysis.
tab.1 = table(taxo$driverType, taxo$infectionStatus)
tab.1
##
## infected uninfected
## accidents 21 38
## control 38 211
tab.2 = prop.table(tab.1, 2) # row proportions
tab.2
##
## infected uninfected
## accidents 0.3559322 0.1526104
## control 0.6440678 0.8473896
tab.3 = tab.2 * 100
tab.3
##
## infected uninfected
## accidents 35.59322 15.26104
## control 64.40678 84.73896
What conclusions can we draw? Why?
(iii) Depict this association with an appropriate graphic.
barplot(tab.3, ylim=c(0, 100), beside=TRUE, xlab="Infection Status", ylab="Relative Frequencies (%)")
mosaicplot(tab.3, main="Mosaicplot of Infection Status & Accidents")
Both plots show a higher rate of accidents for the infected group.
Mattison et al. (2012) carried out a study with rhesus monkeys to see if a reduction in food intake extends life span (measured in years). The data 9see below) are the life spans of 19 Male and 15 Female monkeys. Monkeys were randomly assigned either to a normal nutritious diet group or to a similar diet but reduced by 30% in terms of the quantity of food.
rhesus = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02q35FoodReductionLifespan.csv")
(i) What types of variables are these? Numeric? Categorical? If categorical, are they nominal or ordinal?
(ii) Calculate the Mean, Median, Variance, Standard Deviation for each group.
library(dplyr)
grouped = group_by(rhesus, sex, foodTreatment)
tab.A = summarise(grouped,
Mean = mean(lifespan, na.rm=TRUE),
Median = median(lifespan, na.rm=TRUE),
Variance = var(lifespan, na.rm=TRUE),
Std.Dev = sd(lifespan, na.rm=TRUE)
)
tab.A
## Source: local data frame [4 x 6]
## Groups: sex [?]
##
## sex foodTreatment Mean Median Variance Std.Dev
## (fctr) (fctr) (dbl) (dbl) (dbl) (dbl)
## 1 female control 28.67500 27.1 22.15643 4.707062
## 2 female reduced 26.08571 27.8 48.85810 6.989857
## 3 male control 33.33333 34.1 32.87000 5.733236
## 4 male reduced 34.69000 37.0 36.40767 6.033877
You could also do this with another package (see below):
library(psych)
describeBy(rhesus$lifespan, list(rhesus$sex, rhesus$foodTreatment))
## : female
## : control
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 8 28.68 4.71 27.1 28.68 4.45 23.7 35.2 11.5 0.28 -1.94 1.66
## --------------------------------------------------------
## : male
## : control
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 9 33.33 5.73 34.1 33.33 6.67 24.9 40.7 15.8 -0.3 -1.56 1.91
## --------------------------------------------------------
## : female
## : reduced
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 7 26.09 6.99 27.8 26.09 7.71 16.5 35.9 19.4 -0.07 -1.73
## se
## 1 2.64
## --------------------------------------------------------
## : male
## : reduced
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 10 34.69 6.03 37 35.38 4.74 23.7 40.2 16.5 -0.51 -1.44
## se
## 1 1.91
Here you are getting a lot more statistics than we need to deal with but if you find this easier to use, by all means do so.
(iii) Which group has higher average life spans, regardless of gender?
(iv) Within gender, which group has the higher average life span?
(v) Looking at the picture as a whole, what differences in life spans do you see between the male and female rhesus monkeys in this experiment?
(vi) If we had to display these data, what would be some appropriate ways to do so?
(vii) Let us go ahead and generate some of these plots and then compare them to see if the distribution of life spans is symmetric or skewed, and if it is skewed, the direction of the skew.
library(lattice)
# Histogram
histogram(~ lifespan | sex:foodTreatment, data=rhesus, layout=c(1,4), xlab="Life span (in years)")
# Box-plot
boxplot(lifespan ~ sex:foodTreatment, data=rhesus, horizontal=TRUE,
xlab="Life span (in years)",
ylab = "Group",
cex.axis = 0.75)