These are materials from a workshop I taught in May of 2019. The goal was to introduce R and RStudio, basic data manipulation with the tidyverse, and then provide an overview of literate programming for report generation (automated plus manual).
We started as usual by getting to know R and RStudio a bit: how to install them, the RStudio IDE’s panes, installing and updating packages, CRAN repositories, and compiling packages from source. We then read in data files in various formats and ways.
tidyverse
“Yet far too much handcrafted work — what data scientists call ‘data wrangling,’ ‘data munging’ and ‘data janitor work’ — is still required. Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.”
As the quote underscores, cleaning data takes the bulk of our time, so it is well worth the effort to learn how to handle tasks you will likely face weekly, if not daily. In this session we covered some basic dplyr and tidyr verbs. I threw in some lubridate as well, but we had to skip that section because of time constraints.
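To give a flavor of what "verbs" means here, the following is a minimal sketch and not the actual workshop material. The `scores` data frame and its column names are made up for illustration. It also uses `pivot_longer()`, which needs tidyr 1.0.0 or later; that release came out after the workshop, so the original session would more likely have used `gather()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per student, one column per exam (wide format)
scores <- tibble(
  student = c("Ana", "Ben", "Cy"),
  exam1   = c(80, 65, 90),
  exam2   = c(85, 70, NA)
)

# tidyr: reshape from wide to long so each row is one student-exam pair
scores_long <- scores %>%
  pivot_longer(cols = c(exam1, exam2), names_to = "exam", values_to = "score")

# dplyr: drop missing scores, summarise per student, sort by mean score
summary_tbl <- scores_long %>%
  filter(!is.na(score)) %>%
  group_by(student) %>%
  summarise(mean_score = mean(score), n_exams = n()) %>%
  arrange(desc(mean_score))

print(summary_tbl)
```

Each step takes a data frame and returns a data frame, which is why the verbs chain so naturally with the pipe.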
ggplot2
In the final session of the workshop we spent a little time going over the basics of ggplot2. Another half-day would have been useful, because we did not get very far, as is obvious from the material in this module.
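For readers who have not seen ggplot2 before, here is a small sketch of the kind of basics involved. It is an illustration under my own assumptions, not the module's code: it uses the built-in `mtcars` data as a stand-in, and the choice of variables and labels is arbitrary.

```r
library(ggplot2)

# mtcars ships with R; it is used here only as stand-in data
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                       # one layer: a scatterplot
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders"
  )

# Printing the object draws the plot
print(p)
```

The pattern is always the same: data plus aesthetic mappings in `ggplot()`, then one or more `geom_*()` layers added with `+`.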