Installing and Loading Packages

You can either use the “Tools” menu to select “Install Packages…” and enter the package name(s) or run a command as shown below:

packs = c("epitools", "psych", "lattice", "binom", "car", "multcomp", "ggplot2", "ggmap", "plotly")
# install.packages(packs, dependencies = TRUE, repos="https://cran.rstudio.com")

Note: “repos=” specifies the URL of the repository from which packages should be downloaded. “dependencies = TRUE” tells R to also download any other packages that the package you want to install depends on.

If any of these packages is updated by its author I will let you know how to update them.

Once a package has been installed, we can disable the R command “install.packages()” by adding a “#” symbol before it. This way, the next time you run this file to create a Word document, RStudio skips re-installing the packages. Note also that a “#” symbol before any R command disables it, so you can use this trick more generally.
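
As an alternative to commenting the command out, the check can be left to R itself. The sketch below is only one way to do it: `install_if_missing` is a hypothetical helper name, not something supplied by R or RStudio, and it relies on the base function `installed.packages()` to see what is already present.

```r
# Hypothetical helper: install only the packages that are not yet present
install_if_missing = function(pkgs, repos = "https://cran.rstudio.com") {
  # Names of every package already installed in this R library
  already_installed = rownames(installed.packages())
  # Keep only the requested names that are not installed yet
  to_install = setdiff(pkgs, already_installed)
  if (length(to_install) > 0) {
    install.packages(to_install, dependencies = TRUE, repos = repos)
  }
  # Return (invisibly) the names that needed installing
  invisible(to_install)
}

# Example call, left commented out like the install command above:
# install_if_missing(packs)
```

With a helper like this the file can be re-run as often as you like, because nothing is downloaded once every package in the list is installed.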

We can load installed packages with the following command:

library(lattice)

Reading Data into R

R is very flexible and can read data in Excel formats (.xls, .xlsx), text formats (.txt), CSV (comma-separated values) formats (.csv), etc.
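
To make the difference between the text formats concrete, here is a small sketch that reads the same two-row table once as comma-separated text and once as tab-separated text. The `id` and `value` columns and their numbers are invented for illustration, and the `text =` argument is used only so the example runs without any file on disk. Excel files are not covered here; they typically need an add-on package (readxl is one option, though it is not in the package list above).

```r
# Comma-separated text: read.csv() assumes a header row and "," as the separator
csv_text = "id,value\n1,3.5\n2,4.1"
from_csv = read.csv(text = csv_text)

# Tab-separated text: read.table() needs the header and separator spelled out
tab_text = "id\tvalue\n1\t3.5\n2\t4.1"
from_tab = read.table(text = tab_text, header = TRUE, sep = "\t")

# Both data frames should hold the same columns and values
print(from_csv)
print(from_tab)
```

With a real file you would pass the file name (or URL) as the first argument instead of `text =`.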

Let us start with a simple data set: serotonin levels in the central nervous system of desert locusts that were experimentally crowded for 0 (the control group), 1, and 2 hours. Presumably locusts are shy and solitary as a rule but become gregarious and social in a drought or when they see or smell other locusts, which leads to locust swarms. An NPR story covers this line of research.

locusts = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_2locustSerotonin.csv")

Now look at the upper-right window: you will see the locusts data you just read in. If you click the “play” button you’ll see the contents of the data set. If you click the data set itself, it opens in the form of a spreadsheet. Do this now and notice that the first column contains each locust’s serotonin level (titled “serotoninLevel” and measured in pmoles) and the second contains the treatment it was assigned to (titled “treatmentTime”).
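
The same inspection can be done from the console instead of by clicking. The sketch below uses a tiny stand-in data frame called `toy` so that it runs on its own; its values are made up and are not the real locust measurements. On the real data you would replace `toy` with `locusts`.

```r
# Made-up values for illustration only; not the real locust measurements
toy = data.frame(serotoninLevel = c(5.1, 4.8, 9.3, 12.0),
                 treatmentTime  = c(0, 0, 1, 2))

str(toy)      # column names, types, and the first few values
head(toy, 3)  # the first three rows, as in the spreadsheet view
names(toy)    # just the column titles
dim(toy)      # number of rows, then number of columns
```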

We can label the values of treatmentTime (see below):

locusts$treatmentTime = factor(locusts$treatmentTime, levels = c(0, 1, 2), labels = c("Control", "1 Hour", "2 Hours"))

Watch how treatmentTime is now showing up as a “Factor” … R-speak for a categorical variable. R also tells you how many unique groups you have and the label each group carries.
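
If you would rather confirm the conversion in the console, the following sketch shows the idea on a short stand-in vector. The six values in `treatment` are illustrative, not the real data; the `factor()` call mirrors the one used on `locusts$treatmentTime`.

```r
# Illustrative treatment codes (two locusts per group), not the real data
treatment = c(0, 1, 2, 0, 1, 2)

# Same labelling scheme as applied to locusts$treatmentTime
treatment_f = factor(treatment, levels = c(0, 1, 2),
                     labels = c("Control", "1 Hour", "2 Hours"))

is.factor(treatment_f)  # TRUE once the variable is categorical
levels(treatment_f)     # the label each group carries
table(treatment_f)      # how many observations fall in each group
```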

Now that we have some data to work with, how should we describe it visually? If we have a numeric variable, it is appropriate to use a histogram to depict the variable’s distribution. Here we have three groups (treatmentTime = 0, 1, and 2), so we will have to set up a histogram for each group.

histogram( ~ serotoninLevel | treatmentTime, data=locusts, layout=c(1,3))

Note the key commands here. The “~ serotoninLevel” piece tells R to use serotoninLevel as the variable of interest. “| treatmentTime” tells R to construct a histogram for each unique group it finds in a variable called treatmentTime. We also indicate the data we wish to use via “data = locusts”. The “layout=c(1,3)” tells R to squeeze the histograms into 1 column and 3 rows.

We could have also run the same plot as shown below; either way would be fine.

histogram( ~ locusts$serotoninLevel | locusts$treatmentTime, layout=c(1,3))
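
To try the formula pieces without downloading anything, here is a minimal sketch on a made-up data frame. The nine values in `toy` are invented for illustration; only the structure (one numeric column, one factor column) matches the locust data. It assumes the lattice package is installed.

```r
library(lattice)

# Made-up values for illustration only; three observations per group
toy = data.frame(
  serotoninLevel = c(4.2, 5.0, 5.5, 6.1, 7.8, 9.4, 8.8, 12.5, 15.0),
  treatmentTime  = factor(rep(c("Control", "1 Hour", "2 Hours"), each = 3),
                          levels = c("Control", "1 Hour", "2 Hours"))
)

# "~ variable | group": one histogram panel per level of treatmentTime,
# stacked in 1 column and 3 rows
p = histogram(~ serotoninLevel | treatmentTime, data = toy, layout = c(1, 3))
print(p)  # lattice plots are objects; printing draws them
```

Storing the plot in `p` is optional; typing the `histogram()` call directly at the console draws it right away.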

We may not like the default number of bins. Instead, we may want to specify the width of each group. How should we set the bin-width? One way to do this would be to see the minimum and maximum values. We do this (below) with the “summary()” command and find Min = 3.20 and Max = 21.30. The gap (Max - Min) is 21.30 - 3.20 = 18.10. If we want to break a distribution that has a range of 18.10 into 10 groups, that would mean each group would be about 18.10 divided by 10 = 1.81 pmoles wide. So let us set this as the bin-width.

summary(locusts)
##  serotoninLevel   treatmentTime
##  Min.   : 3.200   Control:10   
##  1st Qu.: 4.675   1 Hour :10   
##  Median : 5.900   2 Hours:10   
##  Mean   : 8.407                
##  3rd Qu.:11.475                
##  Max.   :21.300

histogram(~ serotoninLevel | treatmentTime, data=locusts, breaks=seq(3.20, 21.30, by=1.81), layout=c(1,3))

We have some new commands here. The “breaks=seq(3.20, 21.30, by=1.81)” segment tells R to start making the groups from a minimum value of 3.20 to a maximum value of 21.30, and to make each group 1.81 units (here pmoles) wide.
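
The bin-width arithmetic and the resulting break points can also be computed in R rather than by hand. This is a small sketch using the minimum and maximum reported by summary(); the variable names (`min_value`, `max_value`, `n_bins`, `bin_width`, `bin_breaks`) are my own choices for illustration.

```r
# Minimum and maximum serotonin levels, taken from the summary() output
min_value = 3.20
max_value = 21.30

# Number of groups we want
n_bins = 10

# Width of each group: the range divided by the number of groups
bin_width = (max_value - min_value) / n_bins
print(bin_width)

# The cut points that seq() hands to the breaks argument
bin_breaks = seq(min_value, max_value, by = bin_width)
print(bin_breaks)

# There is always one more cut point than there are groups
print(length(bin_breaks) - 1)
```

Changing `n_bins` and re-running shows how the cut points move, without redoing the division manually.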

What if we wanted only 5 groups? That would mean a bin-width of 18.10/5 = 3.62 pmoles. What would this distribution look like?

histogram(~ serotoninLevel | treatmentTime, data=locusts, breaks=seq(3.20, 21.30, by=3.62), layout=c(1,3))

Hmm, too much compression of the data here. So we would have to figure out some happy medium between 5 and 10 groups but we’ll leave it for now.
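
If you want to experiment with an in-between choice later, one option is to ask seq() for a number of cut points instead of a width. The sketch below assumes 7 groups, which is an arbitrary pick between 5 and 10 and not a recommendation; the histogram call is left commented out because it needs the locusts data and the lattice package loaded.

```r
# Arbitrary in-between choice, for illustration only
n_bins = 7

# length.out asks for a fixed number of evenly spaced cut points;
# n_bins groups need n_bins + 1 cut points
bin_breaks = seq(3.20, 21.30, length.out = n_bins + 1)

# The implied width of each group
print(round(diff(bin_breaks), 3))

# With the locusts data loaded, the plot would be:
# histogram(~ serotoninLevel | treatmentTime, data = locusts,
#           breaks = bin_breaks, layout = c(1, 3))
```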

Hemoglobin Concentration Example

hemoglobinData = read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3cHumanHemoglobinElevation.csv")

histogram(~ hemoglobin | population, data=hemoglobinData)