In the last module we started building some visualizations in order to answer specific substantive questions. In this module we will generate more advanced graphics: we will annotate plots, combine several graphics into one, build some interactive graphics, and also do some mapping.

1 Combining Plots with {patchwork}

Often you have to combine and place multiple graphics into a single canvas. There are a few ways to do this but the easiest way is that offered by the {patchwork} package. Let us use the diamonds data for this section, a data frame with 53940 rows and 10 variables:

Variable   Description
price      price in US dollars ($326–$18,823)
carat      weight of the diamond (0.2–5.01)
cut        quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color      diamond colour, from D (best) to J (worst)
clarity    a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x          length in mm (0–10.74)
y          width in mm (0–58.9)
z          depth in mm (0–31.8)
depth      total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)
table      width of top of diamond relative to widest point (43–95)

1.1 The Basics

To combine multiple plots, we need to save each plot with a unique name. I am calling them p1, p2, etc.
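As a minimal sketch of this step, the chunk below saves three plots of the diamonds data and adds them together. The specific plot types (a bar chart, a histogram, and a scatterplot as p3) are assumptions made for illustration; any saved {ggplot2} objects would work the same way.

```r
library(ggplot2)
library(patchwork)

# Three illustrative plots of the diamonds data (plot types are assumptions)
p1 <- ggplot(diamonds, aes(x = cut)) +
  geom_bar() +
  labs(title = "p1: diamonds by cut")

p2 <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(bins = 30) +
  labs(title = "p2: distribution of price")

p3 <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  labs(title = "p3: price versus carat")

# Adding saved plots with + places them on a single canvas
row_layout <- p1 + p2 + p3
row_layout
```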

Notice the default layout here: p1 + p2 + p3 gives us the plots all in a row. But you may have other plans, for example, to put the scatterplot in a row all its own.

Now we have p3 in the second row, all by itself. Note that this was achieved via the / operator. What if we used | instead?
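The two operators can be compared with a short sketch. The plots are rebuilt here so the chunk runs on its own; their types are assumptions, with p3 standing in for the scatterplot.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(diamonds, aes(cut)) + geom_bar()
p2 <- ggplot(diamonds, aes(price)) + geom_histogram(bins = 30)
p3 <- ggplot(diamonds, aes(carat, price)) + geom_point(alpha = 0.1)

# / stacks vertically: p1 and p2 share the first row, p3 fills the second row
stacked <- (p1 + p2) / p3

# | places side by side: p1 is the first column, p2 over p3 is the second column
side_by_side <- p1 | (p2 / p3)

stacked
side_by_side
```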

You ended up with two columns, the first containing only p1 and the second containing p2 and p3. So make a note of the difference between | and /. For example, note the following setup:

What if we wanted to squeeze in the fourth plot?
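One possible arrangement, sketched under the assumption that the fourth plot (p4) is a boxplot of price by cut, is a two-by-two grid built from | and /.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(diamonds, aes(cut)) + geom_bar()
p2 <- ggplot(diamonds, aes(price)) + geom_histogram(bins = 30)
p3 <- ggplot(diamonds, aes(carat, price)) + geom_point(alpha = 0.1)
p4 <- ggplot(diamonds, aes(cut, price)) + geom_boxplot()  # hypothetical fourth plot

# Two plots per row, two rows
four_plots <- (p1 | p2) / (p3 | p4)
four_plots
```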

1.3 Spacing and Sizing

We can also tweak the sizes of individual rows and columns, control the space between plots, and so on. First up, spacing the plots with plot_spacer().
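A minimal sketch of plot_spacer(), assuming two simple plots of the diamonds data: the spacer occupies a cell of the layout just like a plot would, leaving an empty gap between its neighbors.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(diamonds, aes(cut)) + geom_bar()
p2 <- ggplot(diamonds, aes(price)) + geom_histogram(bins = 30)

# The middle cell is left empty
spaced <- p1 + plot_spacer() + p2
spaced
```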

Sizing the plots with relative sizes?

Alternatively, we could specify size with unit vectors, as shown below.
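Both sizing approaches can be sketched with plot_layout(). The particular widths chosen here (a 2:1 ratio, and a fixed 5 cm first column) are arbitrary illustrations.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(diamonds, aes(cut)) + geom_bar()
p2 <- ggplot(diamonds, aes(price)) + geom_histogram(bins = 30)

# Relative sizes: the first column is twice as wide as the second
relative_sizes <- p1 + p2 + plot_layout(widths = c(2, 1))

# Unit vectors: the first column is exactly 5 cm wide,
# the second takes whatever space is left ("null" units)
unit_sizes <- p1 + p2 +
  plot_layout(widths = grid::unit(c(5, 1), c("cm", "null")))

relative_sizes
unit_sizes
```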

1.4 Moving Beyond the Grid

We can use a layout design to get a little more flexibility but still retain full control over the result. Layout designs can be done in two ways so let us see the easiest route – as a text setup. “When using the textual representation it is your responsibility to make sure that each area is rectangular. The only exception is # which denotes empty areas and can thus be of any shape.”
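A textual design might look like the sketch below. Each letter marks the cells a plot occupies (A is the first plot, B the second, and so on) and # marks an empty cell; the particular arrangement is an assumption chosen so that every lettered area is rectangular.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(diamonds, aes(cut)) + geom_bar()
p2 <- ggplot(diamonds, aes(price)) + geom_histogram(bins = 30)
p3 <- ggplot(diamonds, aes(carat, price)) + geom_point(alpha = 0.1)

# A spans two columns in row 1, B spans both rows in column 3,
# C sits in row 2, column 1, and the # cell stays empty
design <- "
  AAB
  C#B
"

text_layout <- p1 + p2 + p3 + plot_layout(design = design)
text_layout
```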

The other path is using area() inside layout, as shown below.
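As a sketch, the same kind of arrangement can be described with area(), which takes the top, left, bottom, and right cell of each plot's rectangle. The coordinates below are illustrative assumptions.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(diamonds, aes(cut)) + geom_bar()
p2 <- ggplot(diamonds, aes(price)) + geom_histogram(bins = 30)
p3 <- ggplot(diamonds, aes(carat, price)) + geom_point(alpha = 0.1)

# One area() per plot, in the order the plots are added
layout <- c(
  area(t = 1, l = 1, b = 1, r = 2),  # p1: row 1, columns 1 to 2
  area(t = 1, l = 3, b = 2, r = 3),  # p2: rows 1 to 2, column 3
  area(t = 2, l = 1, b = 2, r = 1)   # p3: row 2, column 1
)

area_layout <- p1 + p2 + p3 + plot_layout(design = layout)
area_layout
```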

Watch the specification here with wrap_plots()
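One way to write this, sketched with assumed plots and an assumed design: when plots are passed to wrap_plots() by name, each name is matched to the same letter in the design, so the order of the arguments no longer matters.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(diamonds, aes(cut)) + geom_bar()
p2 <- ggplot(diamonds, aes(price)) + geom_histogram(bins = 30)
p3 <- ggplot(diamonds, aes(carat, price)) + geom_point(alpha = 0.1)

design <- "
  AAB
  C#B
"

# Names (A, B, C) tie each plot to its area in the design
wrapped <- wrap_plots(B = p2, A = p1, C = p3, design = design)
wrapped
```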

2 Mapping

Maps are very powerful visualizations because they allow you to highlight patterns and clusters with relative ease. For example, is poverty really higher in Appalachian counties? What about the percent of the population without health insurance? Literacy? Opioid deaths; do they follow transportation routes? What about COVID-19 cases? Maps to the rescue!

Building a map requires a few elements. First and foremost, you need some data to show on a map. Second, you need to have the geographic coordinates needed to build a map, basically the latitude and longitude of the geographies (states, cities, school districts, etc.) that you want to map. Third, you want a column that contains the names of the geographies you want to map, and these should be properly formatted (i.e., in titlecase) for displaying on the map.

Let us start by building a simple state map with the {urbnmapr} package. It comes with the necessary data for states and counties, respectively, and works well with {ggplot2}. Note the reliance on geom_polygon() now.
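A minimal sketch of this first map follows. It assumes {urbnmapr} has been installed (to my knowledge it is distributed via GitHub rather than CRAN) and that {mapproj} is available for coord_map(); the grey fill and the Albers projection settings are illustrative choices.

```r
library(urbnmapr)
library(ggplot2)
library(dplyr)

# Peek at the bundled state and county coordinate files
glimpse(states)
glimpse(counties)

# One polygon per group; group keeps each state's pieces separate
state_map <- ggplot(states, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey", color = "white") +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45)  # needs {mapproj}

state_map
```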

## Rows: 83,933
## Columns: 9
## $ long       <dbl> -88.47323, -88.46888, -88.46866, -88.45504, -88.45496, -88…
## $ lat        <dbl> 31.89386, 31.93026, 31.93317, 32.03972, 32.04058, 32.05305…
## $ order      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ hole       <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ piece      <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ group      <fct> 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1…
## $ state_fips <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", "01"…
## $ state_abbv <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
## $ state_name <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Al…
## Rows: 208,874
## Columns: 12
## $ long        <dbl> -86.91760, -86.81657, -86.71339, -86.71422, -86.41312, -8…
## $ lat         <dbl> 32.66417, 32.66012, 32.66173, 32.70569, 32.70739, 32.4099…
## $ order       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
## $ hole        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ piece       <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ group       <fct> 01001.1, 01001.1, 01001.1, 01001.1, 01001.1, 01001.1, 010…
## $ county_fips <chr> "01001", "01001", "01001", "01001", "01001", "01001", "01…
## $ state_abbv  <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL…
## $ state_fips  <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", "01…
## $ county_name <chr> "Autauga County", "Autauga County", "Autauga County", "Au…
## $ fips_class  <chr> "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1…
## $ state_name  <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "A…

Note that this is just an empty map with the shapes of the states, and also that Alaska and Hawaii have been moved so that they can be displayed on the map. We could build a much better map by removing the x and y axis labels and tick marks, and setting a white background using theme_map() from the {ggthemes} package. We could also fill with some colors, say on the basis of the state_name.
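These refinements might be sketched as follows. Hiding the legend is an added assumption, since a legend with one entry per state would crowd out the map.

```r
library(urbnmapr)
library(ggplot2)
library(ggthemes)

# Fill by state name, drop axes and gridlines with theme_map()
state_fill_map <- ggplot(states,
                         aes(x = long, y = lat, group = group, fill = state_name)) +
  geom_polygon(color = "white") +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +  # needs {mapproj}
  theme_map() +
  theme(legend.position = "none")  # assumption: hide the very long legend

state_fill_map
```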

Okay, this is not very useful because it would be much better to color the map on the basis of some substantive variable. Let us see what lurks in the statedata file.

## Rows: 51
## Columns: 6
## $ year        <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 201…
## $ state_fips  <chr> "01", "02", "04", "05", "06", "08", "09", "10", "11", "12…
## $ state_name  <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California",…
## $ hhpop       <int> 1846380, 250183, 2463012, 1144657, 12895471, 2074517, 134…
## $ horate      <dbl> 0.6814329, 0.6311860, 0.6206178, 0.6546031, 0.5372219, 0.…
## $ medhhincome <int> 44700, 70600, 51000, 42000, 64600, 63500, 71700, 61200, 7…

Okay, two things stand out – horate (the homeownership rate), and medhhincome (the median household income). Let us fill with median household income but to do so, we will need to join statedata to our states file. Why? Because we need coordinates to map anything and statedata does not contain coordinates. Then we can specify fill = medhhincome inside the aes(...) command.
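The join and the filled map can be sketched as below. Joining on both state_fips and state_name is an assumption, chosen because both columns appear in each file and the joined result then has 13 columns; the title and legend label are also illustrative.

```r
library(urbnmapr)
library(ggplot2)
library(ggthemes)
library(dplyr)

# Attach the state-level measures to every coordinate row
state_income <- left_join(states, statedata,
                          by = c("state_fips", "state_name"))
glimpse(state_income)

income_map <- ggplot(state_income,
                     aes(x = long, y = lat, group = group, fill = medhhincome)) +
  geom_polygon(color = "white") +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +  # needs {mapproj}
  theme_map() +
  theme(legend.position = "bottom") +
  labs(fill = "Median household income",
       title = "Median household income by state")

income_map
```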

## Rows: 83,933
## Columns: 13
## $ long        <dbl> -88.47323, -88.46888, -88.46866, -88.45504, -88.45496, -8…
## $ lat         <dbl> 31.89386, 31.93026, 31.93317, 32.03972, 32.04058, 32.0530…
## $ order       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
## $ hole        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ piece       <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ group       <fct> 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.…
## $ state_fips  <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", "01…
## $ state_abbv  <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL…
## $ state_name  <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "A…
## $ year        <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 201…
## $ hhpop       <int> 1846380, 1846380, 1846380, 1846380, 1846380, 1846380, 184…
## $ horate      <dbl> 0.6814329, 0.6814329, 0.6814329, 0.6814329, 0.6814329, 0.…
## $ medhhincome <int> 44700, 44700, 44700, 44700, 44700, 44700, 44700, 44700, 4…

What about working with counties? Sure, let us merge countydata with the counties file and map.
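A sketch of the county version, assuming countydata carries the same medhhincome column as statedata and that county_fips is the shared key:

```r
library(urbnmapr)
library(ggplot2)
library(ggthemes)
library(dplyr)

# county_fips is assumed to be the common key between the two files
county_income <- left_join(countydata, counties, by = "county_fips")

county_map <- ggplot(county_income,
                     aes(x = long, y = lat, group = group, fill = medhhincome)) +
  geom_polygon(color = NA) +  # no borders; county outlines would swamp the fill
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +  # needs {mapproj}
  theme_map() +
  theme(legend.position = "bottom") +
  labs(fill = "Median household income")

county_map
```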

Maybe you are only interested in Florida?
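Restricting the map to one state is a matter of filtering before plotting. A sketch, again assuming county_fips is the join key:

```r
library(urbnmapr)
library(ggplot2)
library(ggthemes)
library(dplyr)

# Keep only the Florida rows of the joined county file
florida <- left_join(countydata, counties, by = "county_fips") %>%
  filter(state_name == "Florida")

florida_map <- ggplot(florida,
                      aes(x = long, y = lat, group = group, fill = medhhincome)) +
  geom_polygon(color = "white") +
  coord_map() +  # needs {mapproj}
  theme_map() +
  theme(legend.position = "bottom") +
  labs(fill = "Median household income", title = "Florida counties")

florida_map
```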

Hmm, so far so good but what if the data were for some geography not bundled with {urbnmapr}, school districts or places, for example? Not a problem, we just have to go the extra mile. First we would have to find, download, and read in the shapefile. Say I am looking for places (loosely described as municipalities) in New Hampshire. Well, the {tigris} package comes in handy because it allows you to get whatever geography’s shapefiles you want. Below I am getting the shapefile for New Hampshire.

Okay, so now that I have the shapefile, how can I use it? I need to fortify it so that it looks like a regular dataframe rather than the native SpatialPolygonsDataFrame format it comes in. When I go to make the map I am going to add the state shapefile too since otherwise the state’s boundary will not show up.
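The sketch below uses a different route from the fortify approach described above. Recent versions of {tigris} return {sf} objects by default, and those can be drawn directly with geom_sf() without converting to a regular data frame first. It needs an internet connection and the {sf} package; the colors are arbitrary.

```r
library(tigris)
library(ggplot2)

options(tigris_use_cache = TRUE)  # keep downloaded shapefiles between sessions

# Places (municipalities and CDPs) in New Hampshire
nh_places <- places(state = "NH", cb = TRUE)

# State outline, so the boundary shows up behind the places
us_states <- states(cb = TRUE)
nh_state <- us_states[us_states$STUSPS == "NH", ]

nh_map <- ggplot() +
  geom_sf(data = nh_state, fill = NA, color = "black") +
  geom_sf(data = nh_places, fill = "steelblue", color = "white") +
  theme_void()

nh_map
```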

Of course, the fill is superficial here. But say we had some data for places in New Hampshire, maybe the size of the population, as in nh.data.RData. Now we could join nh.data with nh.df to create nh and then map.

## Rows: 3,939
## Columns: 9
## $ long       <dbl> -71.22967, -71.22858, -71.22532, -71.22349, -71.22136, -71…
## $ lat        <dbl> 43.46519, 43.46392, 43.46597, 43.46411, 43.46168, 43.46163…
## $ order      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ hole       <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ piece      <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ id         <chr> "3300980", "3300980", "3300980", "3300980", "3300980", "33…
## $ group      <fct> 3300980.1, 3300980.1, 3300980.1, 3300980.1, 3300980.1, 330…
## $ NAME       <chr> "Alton CDP, New Hampshire", "Alton CDP, New Hampshire", "A…
## $ population <dbl> 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168…

Now we plot:

You could have also filled by creating quartiles, etc., using {santoku}, so do not forget that option.
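As a sketch of that option, chop_quantiles() from {santoku} cuts a numeric column at the requested quantiles and returns a factor that can be used as the fill. It is shown here on the statedata file from {urbnmapr} rather than the New Hampshire data, which is an assumption made to keep the example runnable.

```r
library(santoku)
library(urbnmapr)
library(dplyr)

# Cut median household income into quartile groups
state_quartiles <- statedata %>%
  mutate(income_quartile = chop_quantiles(medhhincome, c(0.25, 0.5, 0.75)))

# How many states fall into each group
count(state_quartiles, income_quartile)
```

The income_quartile column could then be mapped with fill = income_quartile after joining to the coordinates.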

Before we move on, one more map, to show you the possibilities. Here, I am plotting the locations of the parking tickets issued in Philadelphia, this time with the {leaflet} package. First the tickets data-set, reduced to a random sample of 20% of the tickets issued in the month of December in 2017 (to keep the data size manageable). I have also created a popup that will display specific information if someone clicks on a point in the map.
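The tickets file is not reproduced here, so the sketch below uses a tiny hypothetical data frame in its place. The column names (lat, lon, violation, fine) and the three rows are invented for illustration only; the pattern of building a popup string and handing it to addCircleMarkers() is the point.

```r
library(leaflet)

# Hypothetical stand-in for the tickets data (values are made up)
tickets <- data.frame(
  lat = c(39.9526, 39.9496, 39.9560),
  lon = c(-75.1652, -75.1503, -75.1820),
  violation = c("METER EXPIRED", "OVER TIME LIMIT", "STOP PROHIBITED"),
  fine = c(36, 26, 51)
)

# Popup text shown when a point is clicked (HTML is allowed)
tickets$popup <- paste0("Violation: ", tickets$violation,
                        "<br>Fine: $", tickets$fine)

ticket_map <- leaflet(tickets) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lon, lat = ~lat, popup = ~popup,
                   radius = 3, stroke = FALSE, fillOpacity = 0.6)

ticket_map
```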

Voila! A few lines of code and we have an interactive map that can be used to display whatever evidence we want to display. Note that you need geographic coordinates since without them the data cannot be attached to a physical location.

Let us see another twist on this. Say I am trying to map the total number of COVID-19 cases in Ohio. I know the county and I know the number of cases that occurred, as well as the latitude and longitude of each county. Well, I can build a similar plot except making the size of the circle conditional upon the number of cases. The more the cases, the larger the radius of the circle.

But there are no coordinates in cov19. Well, I can get those from the {housingData} package, and then merge that with cov19.

## Rows: 88
## Columns: 14
## $ fips.x     <fct> 39001, 39003, 39005, 39007, 39009, 39011, 39013, 39015, 39…
## $ display    <chr> "Adams: 3", "Allen: 65", "Ashland: 5", "Ashtabula: 54", "A…
## $ countyname <chr> "Adams", "Allen", "Ashland", "Ashtabula", "Athens", "Augla…
## $ extra      <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""…
## $ state.x    <fct> OH, OH, OH, OH, OH, OH, OH, OH, OH, OH, OH, OH, OH, OH, OH…
## $ lon        <dbl> -83.46359, -84.10825, -82.26922, -80.75647, -82.04053, -84…
## $ lat        <dbl> 38.85662, 40.77675, 40.86122, 41.71017, 39.34576, 40.56421…
## $ rMapState  <fct> ohio, ohio, ohio, ohio, ohio, ohio, ohio, ohio, ohio, ohio…
## $ rMapCounty <fct> adams, allen, ashland, ashtabula, athens, auglaize, belmon…
## $ date       <date> 2020-04-17, 2020-04-17, 2020-04-17, 2020-04-17, 2020-04-1…
## $ state.y    <chr> "Ohio", "Ohio", "Ohio", "Ohio", "Ohio", "Ohio", "Ohio", "O…
## $ fips.y     <chr> "39001", "39003", "39005", "39007", "39009", "39011", "390…
## $ cases      <dbl> 3, 65, 5, 54, 3, 21, 63, 8, 155, 15, 6, 25, 66, 26, 154, 1…
## $ deaths     <dbl> 0, 9, 0, 4, 1, 1, 3, 1, 2, 0, 1, 0, 1, 0, 11, 0, 0, 48, 10…

Okay, now we have everything we need to build the map.
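A sketch of this map, using only the first four counties visible in the printed output above so that the chunk stands alone. Scaling the radius by the square root of cases is an assumption (it keeps the largest circles from swamping the map); any increasing function of cases would satisfy "more cases, larger circle".

```r
library(leaflet)

# First four Ohio counties, copied from the printed output above
oh_cases <- data.frame(
  countyname = c("Adams", "Allen", "Ashland", "Ashtabula"),
  lon = c(-83.46359, -84.10825, -82.26922, -80.75647),
  lat = c(38.85662, 40.77675, 40.86122, 41.71017),
  cases = c(3, 65, 5, 54)
)
oh_cases$display <- paste0(oh_cases$countyname, ": ", oh_cases$cases)

oh_map <- leaflet(oh_cases) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lon, lat = ~lat,
                   radius = ~sqrt(cases),  # assumption: square-root scaling
                   popup = ~display,
                   color = "red", stroke = FALSE, fillOpacity = 0.5)

oh_map
```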

3 Interactive Graphics with Plotly and Highcharter

Interactive graphics are useful in situations where you would like the user/viewer to see the data values or other details. Say, for example, I have a plot and want to make it interactive. How can I do that? By saving my regular plot and then using {plotly} to add a ggplotly() wrapper around the plot.
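A minimal sketch of the wrapper, assuming a scatterplot of a small slice of the diamonds data as the plot to convert:

```r
library(ggplot2)
library(plotly)

# Save the regular plot first (the first 500 rows keep the widget light)
p <- ggplot(diamonds[1:500, ], aes(x = carat, y = price, color = cut)) +
  geom_point() +
  labs(x = "Carat", y = "Price (US dollars)")

# Wrap it: hovering over a point now shows its values
interactive_p <- ggplotly(p)
interactive_p
```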

These plots are useful when presenting data to a live audience (in a talk, or on the web). But I prefer {highcharter} since it does a lot of things well and its charts are aesthetically pleasing. Let us stay with the COVID-19 example. Say I want a bar-chart of the total number of cases by state.

Notice the key elements here: The basic function call is hchart() and we are specifying that we want a bar-chart, and we are also providing the quantities that should go on the x and y axis, respectively. Note that x actually ends up as the y when you specify a “bar” chart.
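The call might look like the sketch below. The data frame, its column names (state, total_cases), and the counts are hypothetical placeholders, not real figures.

```r
library(highcharter)

# Hypothetical summary table (counts are placeholders, not real data)
tab1 <- data.frame(
  state = c("State A", "State B", "State C"),
  total_cases = c(120, 85, 40)
)

# "bar" draws horizontal bars, so the x variable ends up on the vertical axis
bar_chart <- hchart(tab1, "bar", hcaes(x = state, y = total_cases))
bar_chart
```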

What if I wanted a line-chart, maybe of the number of cases over time? And I wanted this just for a few states? We could do that too, as shown below. Note that I am creating tab2, a frequency table of the number of cases by state and date, and then converting total_cases into a logarithmic form (saved as log_cases) so that we can compare the rate of change from one date to the next on a common scale.
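A sketch of the line chart, using a small hypothetical tab2. The states, dates, and counts are placeholders; only the log transform and the group aesthetic mirror the description above.

```r
library(highcharter)

# Hypothetical cases by state and date (placeholder values)
tab2 <- data.frame(
  state = rep(c("State A", "State B"), each = 3),
  date = rep(as.Date(c("2020-04-01", "2020-04-02", "2020-04-03")), times = 2),
  total_cases = c(100, 150, 225, 10, 20, 40)
)

# Log scale puts rates of change on a common footing across states
tab2$log_cases <- log(tab2$total_cases)

# group = state draws one line per state
line_chart <- hchart(tab2, "line",
                     hcaes(x = date, y = log_cases, group = state))
line_chart
```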

Now here is a county-level chart that shows the total number of cases as of April 14, 2020. This is stored in tab3 created as shown below. Pay attention to this creation because we are not just creating a frequency table but also adding in a specific key we are calling code because we will need to join these data to the map data.

Here comes the map!

Note that countries/us/us-all-all indicates that we want counties. If we wanted the states instead it would have been countries/us/us-all. The value = flags the column that has to be used to populate the color scheme. The joinBy connects our data (tab4) to hcmap() so that total_cases gets attached to the correct location in the map.
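A sketch of the pieces involved follows. It assumes that the county map identifies each county by an hc-key of the form us-oh-001 (state abbreviation plus the last three FIPS digits), which is why the code column is built that way; the four rows are taken from the Ohio output printed earlier, purely for illustration. hcmap() downloads the map data, so an internet connection is needed.

```r
library(highcharter)

# Four Ohio counties from the earlier printed output (illustration only)
tab <- data.frame(
  fips = c("39001", "39003", "39005", "39007"),
  state_abbv = "OH",
  total_cases = c(3, 65, 5, 54)
)

# Key that matches the map's hc-key (format is an assumption, see above)
tab$code <- paste("us", tolower(tab$state_abbv),
                  substr(tab$fips, 3, 5), sep = "-")

county_cases_map <- hcmap("countries/us/us-all-all",
                          data = tab,
                          value = "total_cases",
                          joinBy = c("hc-key", "code"))
county_cases_map
```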

What if we wanted only Ohio? Well, in that case we could subset as shown below:

4 Practice Exercises

Exercise (1)

Create a map of all the counties in New York. Be sure to title the map and to fill in each county with the total number of COVID19 cases they have seen to date. In addition, draw county borders in white. Use theme_map() and make sure the legend is at the bottom. [Hint: You will need to calculate the total number of cases per county and then join the resulting file with the counties data file to get the latitude/longitudes for the counties.]

Exercise (2)

Run the following code chunk to load data on the murder, assault, and rape arrest rates per 100,000 persons. UrbanPop is the percent of the state population that lives in an urban area.
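A minimal version of such a chunk, assuming the data in question is the USArrests data frame that ships with base R:

```r
# USArrests is part of the datasets package that loads with R
data(USArrests)

# Column names of the data frame
names(USArrests)
```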

## [1] "Murder"   "Assault"  "UrbanPop" "Rape"

Now create a state-level map of the 50 states making sure to use UrbanPop to fill each state. Title the map and place the legend at the bottom.

Exercise (3)

Use the USArrests data to draw scatterplots of (a) Murder versus UrbanPop, (b) Assault versus UrbanPop, and (c) Rape versus UrbanPop. Save each of these scatterplots by name and then use patchwork to create a single canvas that includes all three plots. Make sure you label the x-axis, y-axis, and title each plot.

Exercise (4)

Now create highcharter versions of each of the three scatterplots you created in Exercise (3) above. You should end up with three scatterplots, each on its own canvas.

Exercise (5)

The sftrees dataset includes various pieces of data on trees in San Francisco. In turn, sf.df is a 10% random sample of these trees.

Use leaflet to create a map that shows, as circles, the location of each tree, and make sure when someone clicks on the circle the popup shows the species. Also ensure that the circles are green in color.

