MPA 6020 is an advanced course in data management, analysis, and visualization with R. Students work with multisectoral national and international data available in diverse formats, and learn how to use the Census and other APIs and how to build static, interactive, and animated visualizations (including maps).
Please see the Syllabus and a couple of free online texts that may be helpful beyond the materials that follow.
This module introduces you to R and RStudio, both by walking you through installing the needed software on your computer and by explaining the very basics of working within RStudio. By the end of the module you will understand the fundamentals of R Markdown; how to create HTML, PDF, and Word files that incorporate code, interpretive text, and tables/figures; how to read local and web-based data files stored in a variety of formats; and how to save data in R’s native format.
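As a small illustration of the reading and saving steps mentioned above, here is a minimal sketch that uses only base R. The data frame and its values are made up for the example, and the files are written to temporary paths; the module itself may well use other functions or packages.

```r
# A small illustrative data frame (made-up values, not course data)
scores <- data.frame(
  student = c("A", "B", "C"),
  score   = c(88, 92, 79)
)

# Write a CSV to a temporary location, then read it back.
# read.csv() also accepts a URL in place of a local path.
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)
scores_csv <- read.csv(csv_path)

# Save a single object in R's native format (.rds) and restore it
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores, file = rds_path)
scores_rds <- readRDS(rds_path)

# The restored object matches the original, column types included
identical(scores, scores_rds)
```

`saveRDS()` stores one object per file and lets you choose the name when you read it back; `save()` and `load()` are the base R alternative when you want to store several objects under their original names.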
This module introduces you to {ggplot2}, “a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.” You will not only learn how to choose and build the appropriate graphic, but also how to make the final product an effective tool for communicating the key narrative.
Our goal in this module is to understand some basic operations in R. Specifically, we will look at combining two or more data-sets, modifying variables, creating variables, converting quantitative variables into qualitative variables, and so on. There is a lot more we could learn here but I will focus on the essential tasks you are likely to run into.
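The operations listed above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the two data frames, the `fips` codes, and every value in them are invented for the example, and the 50,000 cut point is arbitrary.

```r
# Two made-up data-sets that share a key column (hypothetical values)
counties <- data.frame(
  fips       = c("39001", "39003", "39005"),
  population = c(27000, 102000, 52000)
)
income <- data.frame(
  fips          = c("39001", "39003", "39007"),
  median_income = c(41000, 52000, 47000)
)

# Combine the data-sets: by default merge() keeps only rows whose
# key appears in both (use all = TRUE to keep every row instead)
merged <- merge(counties, income, by = "fips")

# Create a new variable from an existing one
merged$pop_thousands <- merged$population / 1000

# Convert a quantitative variable into a qualitative one.
# Intervals are (0, 50000] and (50000, Inf]; the threshold is arbitrary.
merged$size <- cut(
  merged$population,
  breaks = c(0, 50000, Inf),
  labels = c("Small", "Large")
)

merged
```

Only the two counties present in both data-sets survive the merge, which is worth checking whenever the row count after a merge is smaller than expected.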
In this module you will learn how three very powerful packages work their magic. {dplyr} and {tidyr} have quickly gained a following because they allow you to manage seemingly unwieldy data, calculate quantities of interest with relative ease, and flip data from “long” to “wide” formats or vice versa. {stringr} is a sister package that works well with strings and is often useful for key data-cleaning tasks.
If you want to do anything with dates, it helps to have them formatted correctly, and that is not always the case with the secondary data we grab for analysis. Consequently, in this module we will see ways to work with date and time variables via base R and {lubridate}, a package designed for working with date/time data. We’ll start by creating some data values.
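Creating a few date values might look like the sketch below. It uses base R only so that it runs without extra packages, and the date strings are made up for the example; {lubridate} offers helpers such as `ymd()` and `mdy()` for the same parsing step.

```r
# The same two dates written in two common styles (made-up values)
iso_text <- c("2021-03-15", "2021-12-01")
us_text  <- c("03/15/2021", "12/01/2021")

# ISO-style strings parse without a format argument
d_iso <- as.Date(iso_text)

# Other layouts need an explicit format string
d_us <- as.Date(us_text, format = "%m/%d/%Y")

# Both now hold identical Date values
identical(d_iso, d_us)

# Pull out components and do arithmetic once the class is Date
yr  <- as.integer(format(d_iso, "%Y"))
mon <- as.integer(format(d_iso, "%m"))
gap <- as.numeric(d_iso[2] - d_iso[1])  # days between the two dates
```

Until a column has class `Date`, R treats it as plain text, so sorting and subtraction will not behave like calendar operations.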
Most of us end up having to work with Census data on a regular basis. In the past, we used American FactFinder (since replaced by data.census.gov) or the ICPSR databases to grab the data we needed. Mercifully, with the opening up of a lot of government data, grabbing Census data has become very easy. There are several packages that allow you to do so, but I will initially focus on two – {tidycensus} and {censusapi} – given their ease of use. In addition to these packages, I will show you how to work with APIs to get data from the World Bank, the U.S. Geological Survey (USGS), and other sources.
In this module you will learn how to build maps with {ggplot2} and {leaflet}. There are other packages – see {choroplethr}, {tmap}, and {sf} – that we could use, but I will leave it to you to explore these on your own if mapping interests you. {leaflet} and {mapview} are especially fun and useful for creating interactive maps. We will also look at two other packages – {highcharter}, to generate interactive plots/maps, and {gganimate}, to animate plots built with {ggplot2}. Finally, we will see how to use {patchwork} to arrange multiple graphics on a single canvas.