Maps, Interactive, and Animated Graphics

Ani Ruhil
2022-02-16

Agenda

In this module you will learn how to build maps with {ggplot2} and {ggmap}, and then with {leaflet}. There are other packages – see {choroplethr}, {tmap}, and {sf} – that we could use but I will leave it to you to explore these on your own if mapping interests you. {leaflet} is especially fun and useful for creating interactive maps. We will also look at two other packages – {highcharter}, to generate interactive plots/maps, and {gganimate} to animate plots built with {ggplot2}. We will also see how to use {patchwork} to arrange multiple graphics on a single canvas.

Mapping with {ggplot2} and {ggmap}

Having worked with {ggplot2} in the previous module we might as well stick to it since the coding syntax will be relatively familiar. Start by loading the following packages; remember to install them if you get a “package not found” error.

library(ggplot2)
library(ggmap)
library(maps)
library(mapdata)
library(maptools)
library(ggthemes)

It is pretty easy to build a state or county map. I’ll focus on building a county-level map of the USA, just to show you how easy it is to build one. Then we can focus on Ohio per se. The code below shows you how to download the dataframe for all counties and then how to subset it to counties in Ohio.

map_data("county") -> usa # get basic map data for all USA counties 

subset(usa, region == "ohio") -> oh # subset to counties in Ohio 

names(oh)
[1] "long"      "lat"       "group"     "order"     "region"   
[6] "subregion"

You see six columns/variables in the oh data-frame (the usa data-frame has the same six). Pay attention to the contents of each, described below:

| Parameter | Description |
|-----------|-------------|
| long | longitude, a measure of east-west position. The prime meridian is assigned the value of 0 degrees, and runs through Greenwich (England). Athens, Ohio has a longitude of -82.101255 |
| lat | latitude, a measure of north-south position. The equator is defined as 0 degrees, the North Pole as 90 degrees north, and the South Pole as 90 degrees south. Athens, Ohio has a latitude of 39.329240 |
| group | an identifier that is unique for each subregion (here the counties) |
| order | an identifier that indicates the order in which the boundary lines should be drawn |
| region | string indicator for regions (here the states) |
| subregion | string indicator for sub-regions (here the county names) |

At this point, go ahead and open Google Maps in your browser and search for two places that have some meaning, some value to you – hometown, your actual residence, favorite vacation spot, and so on. When you find each place on the map, right-click on it and copy the latitude/longitude you see reported. Save these latitudes/longitudes since you will need them for some mapping exercises at the end of this module.

The actual map can be built as shown below.

ggplot() + 
  geom_polygon(
    data = usa, 
    aes(x = long, y = lat), 
    fill = "white",
    color = "black"
    ) + 
  coord_fixed(1.3) + 
  labs(
    title = "A Disastrous Map", 
    subtitle = "(County borders are all messed up)"
    ) 

But this is not very good because the county borders are incorrectly drawn. Why is that? Because we forgot to specify the grouping structure for drawing these lines (i.e., should all the latitudes/longitudes we see for each county be connected in order by county or should they be connected just by order?). Yes of course, they should be connected in order by county, and this can be specified via the group = argument inside aes(), resulting in the county borders being drawn correctly.

ggplot() + 
  geom_polygon(
    data = usa, 
    aes(x = long, y = lat, group = group), 
    fill = "white", 
    color = "black"
    ) + 
  coord_fixed(1.3) +   
  labs(
    title = "A Better Map", 
    subtitle = "(County borders seem okay)"
    )
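
To see on a small scale what group = is doing, here is a minimal sketch with a made-up data frame (two_squares is a hypothetical name, not part of the module's data) that holds the corner points of two separate squares. Without a grouping variable all eight points are treated as one shape; with it, each square is drawn on its own.

```r
library(ggplot2)

# Hypothetical toy data: the corners of two separate unit squares
two_squares <- data.frame(
  id = rep(c("a", "b"), each = 4),
  x  = c(0, 1, 1, 0,  2, 3, 3, 2),
  y  = c(0, 0, 1, 1,  0, 0, 1, 1)
)

# No grouping: all eight points are joined into a single polygon
p_bad <- ggplot(two_squares, aes(x = x, y = y)) +
  geom_polygon(fill = "white", color = "black")

# Grouped by id: one polygon per square
p_good <- ggplot(two_squares, aes(x = x, y = y, group = id)) +
  geom_polygon(fill = "white", color = "black")

# Count how many distinct shapes each version will draw
n_bad  <- length(unique(layer_data(p_bad)$group))
n_good <- length(unique(layer_data(p_good)$group))
c(ungrouped = n_bad, grouped = n_good)
```

Printing p_bad and p_good side by side shows the stray line connecting the two squares in the ungrouped version, which is the same problem seen in the county map.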

We could improve upon this map by filling in each county; there are 3000+ counties so the colors will not be unique but that is okay for now. I have also switched the county borders to be inked in white.

ggplot() + 
  geom_polygon(
    data = usa, 
    aes(x = long, y = lat, group = group), 
    fill = usa$group, 
    color = "white"
    ) + 
  coord_fixed(1.3) +   
  labs(
    title = "An Even Better Map", 
    subtitle = "(Counties are filled in with a color)"
    )

What if we want to also draw the state borders?

map_data("state") -> usast 

ggplot() + 
  geom_polygon(
    data = usa, 
    aes(x = long, y = lat, group = group, fill = region),
    color = "black"
    ) + 
  geom_polygon(
    data = usast, 
    aes(x = long, y = lat, group = group), 
    fill  = NA, 
    color = "blue"
    ) +   
  coord_fixed(1.3) + 
  labs(
    title = "An Even Better Map", 
    subtitle = "(Counties are filled in by a common color for each state)"
    ) + 
  theme(legend.position = "none")

Now, you could also do this with another package, {urbnmapr} so let us see it very briefly. Here is a map with state borders and then with county borders. Note that Alaska and Hawaii are visible in these maps but were conspicuously absent from the earlier maps drawn via {ggplot2} and {ggmap}. Note that you may have to install it with the command shown below:

devtools::install_github("UrbanInstitute/urbnmapr")

library(tidyverse)
library(urbnmapr)

states %>%
  ggplot(aes(long, lat, group = group)) +
    geom_polygon(
      fill = "grey", color = "black", size = 0.25
      ) +
    coord_map(
      projection = "albers", lat0 = 39, lat1 = 45
      ) + 
  labs(title = "Map of the 50 States")

counties %>%
  ggplot(aes(long, lat, group = group)) +
    geom_polygon(
      fill = "grey", color = "darkred", size = 0.05
      ) +
    coord_map(
      projection = "albers", lat0 = 39, lat1 = 45
      ) + 
  labs(title = "Map of States and Counties")

If you want to learn more about the {urbnmapr} package, see its GitHub repository (UrbanInstitute/urbnmapr).

Labeling points and other features

Okay, so much for basic maps. Now we drill down to Ohio counties alone and see how we can label them, add points to them, and then also use a color scheme to color them on the basis of some measure such as percent of children in poverty in the county, median household income, and so on.

In order to label counties we will need to find the center of each county and then use the county names to show up at this latitude/longitude. Note that taking the mean/median of latitude/longitude will not work so we use specific code to find the centroids. Unfortunately, county names will have to be formatted into titlecase since they show up as subregion with all lowercase characters.

We start by leaning on the {stringr} package to correct how names will be displayed. We then calculate the mid-point of each county and call the result centroids2.

library(stringr)

str_to_title(oh$subregion) -> oh$county # Create a new variable called county with the correct format

library(sp) 
# Helper: returns the label point (centroid) of one county's boundary polygon
getLabelPoint <- function(county) {
  Polygon(county[c('long', 'lat')])@labpt
}
centroids <- by(oh, oh$county, getLabelPoint)          # one label point per county
centroids2 <- do.call("rbind.data.frame", centroids)  # convert to a data frame
centroids2$county <- rownames(centroids)              # county names from the by() result
names(centroids2) <- c('clong', 'clat', "county")     # appropriate header
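
Why does the mean of the latitudes/longitudes not work? Boundary points are not evenly spread along a border, so a stretch with many points pulls the average toward it. The sketch below, in base R only, computes an area-weighted centroid with the shoelace formula and compares it with the plain mean of the vertices. The function polygon_centroid() and the toy square are hypothetical, and coordinates are treated as flat x/y values. I am assuming the labpt slot used above holds this same kind of centroid; check the {sp} documentation to confirm.

```r
# Area-weighted centroid of a simple polygon (shoelace formula)
polygon_centroid <- function(x, y) {
  # Close the ring if the last vertex does not repeat the first
  n <- length(x)
  if (x[1] != x[n] || y[1] != y[n]) {
    x <- c(x, x[1])
    y <- c(y, y[1])
  }
  n <- length(x)
  i <- 1:(n - 1)   # start of each edge
  j <- 2:n         # end of each edge
  cross <- x[i] * y[j] - x[j] * y[i]
  area  <- sum(cross) / 2
  c(
    x = sum((x[i] + x[j]) * cross) / (6 * area),
    y = sum((y[i] + y[j]) * cross) / (6 * area)
  )
}

# A 2 x 2 square whose left edge carries three extra boundary points
sq_x <- c(0, 2, 2, 0, 0,   0, 0)
sq_y <- c(0, 0, 2, 2, 1.5, 1, 0.5)

vertex_mean <- c(x = mean(sq_x), y = mean(sq_y))  # pulled toward the left edge
centroid    <- polygon_centroid(sq_x, sq_y)       # the true middle of the square

rbind(vertex_mean, centroid)
```

The vertex mean lands left of center because four of the seven points sit on the left edge, while the area-weighted centroid stays in the middle of the square.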

Now we are ready to plot …

ggplot() + 
  geom_polygon(
    data = oh, 
    aes(x = long, y = lat, group = group), 
    fill = "white", 
    color = "gray"
    ) + 
  coord_fixed(1.3) + 
  geom_text(
    data = centroids2, 
    aes(x = clong, y = clat, label = county), 
    color = "darkblue", 
    size = 2.25
    )  + 
  theme_map()

Note that geom_text() is used to add the county names to the map. The color = and size = arguments tweak the font color and size. If I relied on geom_label() instead, I would have a less effective map with a needless box around each county name.

ggplot() + 
  geom_polygon(
    data = oh, 
    aes(x = long, y = lat, group = group), 
    fill = "white", 
    color = "gray"
    ) + 
  coord_fixed(1.3) + 
  geom_label(
    data = centroids2, 
    aes(x = clong, y = clat, label = county), 
    color = "darkblue", 
    size = 2.25
    )  + 
  theme_map()

I would also like to highlight where Ohio University’s Athens and Lancaster campuses are located. I know their latitudes/longitudes – Athens is 39.324391, -82.101443 and Lancaster is 39.738743, -82.586373. Let me use a red dot for each.

data.frame(
  place = c("Athens", "Lancaster"), 
  lat = c(39.324391, 39.738743), 
  long = c(-82.101443, -82.586373)
  ) -> campuses 

ggplot() + 
  geom_polygon(
    data = oh, 
    aes(x = long, y = lat, group = group), 
    fill = "white", 
    color = "gray"
    ) + 
  coord_fixed(1.3) + 
  geom_text(
    data = centroids2, 
    aes(x = clong, y = clat, label = county), 
    color = "darkblue", size = 2.25
    )  +  
  geom_point(
    data = campuses, 
    aes(x = long, y = lat), color = "red"
    ) + 
  theme_map()

Using another variable to fill the map

Now, often you will see maps that use a color scheme to represent the spatial distribution of some measure: median household income, poverty, and so on. We will build one such map by using the percent of children in poverty in each county in Ohio. The data were sent to you via Slack so make sure the file is in your data folder. We will read it in and then build the map.

library(readxl)

read_excel(
  "~/Documents/Teaching/mpa5830/data/acpovertyOH.xlsx", 
  sheet = "counties"
  ) -> acpovertyOH 

c(
  "ranking", "county", "child1216", "child0711", "all1216", "all0711"
  ) -> colnames(acpovertyOH) 

Now we merge this file with oh, noting that the merge key will be county since both files have this variable.

merge(
  oh, 
  acpovertyOH[, c(2:3)], 
  by = "county", 
  all.x = TRUE, 
  sort = FALSE
  ) -> my.df 

my.df[order(my.df$order), ] -> my.df

merge(
  oh, 
  acpovertyOH, 
  by = "county", 
  all.x = TRUE, 
  sort = FALSE
  ) -> my.df2 

my.df2[order(my.df2$order), ] -> my.df2
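
The two steps above (merge, then re-sort by order) matter because merge() does not promise to keep the rows in their original sequence, and polygons drawn out of order look scrambled. Here is a minimal sketch with made-up data; the county names and values are hypothetical. It also shows how all.x = TRUE keeps a county that has no match in the second file, leaving NA for its value.

```r
# Hypothetical boundary points for three made-up counties
shapes <- data.frame(
  county = rep(c("Alpha", "Beta", "Gamma"), each = 3),
  order  = 1:9
)

# Hypothetical county-level measure; note that Gamma is missing
poverty <- data.frame(
  county = c("Beta", "Alpha"),
  child  = c(30, 20)
)

# Keep every boundary point, matched or not
merged <- merge(shapes, poverty, by = "county", all.x = TRUE, sort = FALSE)

# Restore the drawing order before plotting
merged <- merged[order(merged$order), ]

# Counties that found no match (these would plot as grey/NA)
unmatched <- unique(as.character(merged$county[is.na(merged$child)]))
unmatched
```

Checking for unmatched counties right after a merge is a cheap way to catch spelling or capitalization differences in the merge key.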

Now we start building the map.

ggplot() + 
  geom_polygon(
    data = my.df, 
    aes(x = long, y = lat, group = group, fill = child1216),
    color = "black"
    ) + 
  coord_fixed(1.3) +
  geom_text(
    data = centroids2, 
    aes(x = clong, y = clat, label = county), 
    color = "black", 
    size = 2.25
    ) + 
  scale_fill_distiller(palette = "Spectral") + 
  labs(fill = "Child Poverty %") + 
  theme_map() + 
  theme(legend.position = "bottom")

That isn’t a bad map but we could do better, by creating quartiles (4 groups) or quintiles (5 groups) so that it is easier to pinpoint which county falls into the top 25%, bottom 20%, and so on of whatever measure it is that we grouped. The code below shows you how to generate the quintiles and map with them.

library(dplyr)
my.df %>% 
  mutate(
    grouped_poverty = cut(
      child1216, 
      breaks = c(quantile(my.df$child1216, probs = seq(0, 1, by = 0.2))),
      labels = c("0-20", "20-40", "40-60", "60-80", "80-100"), 
      include.lowest = TRUE
      )
    ) -> my.df 

ggplot() + 
  geom_polygon(
    data = my.df, 
    aes(x = long, y = lat, group = group, fill = grouped_poverty),
    color = "black"
    ) + 
  coord_fixed(1.3) + 
  geom_text(
    data = centroids2, 
    aes(x = clong, y = clat, label = county),
    color = "white", 
    size = 2.25
    ) + 
  scale_fill_brewer(palette = "Set1", direction = -1) + 
  labs(fill = "Poverty Quintiles") + 
  theme_map() + 
  theme(legend.position = "bottom")
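
To see what quantile() and cut() are doing in the quintile step, here is a small sketch on a made-up vector of ten values (toy_rates is hypothetical). It also shows why include.lowest = TRUE is needed: without it the smallest value falls outside the first interval and becomes NA.

```r
# Hypothetical values standing in for a county-level measure
toy_rates <- c(5, 12, 18, 21, 25, 30, 34, 41, 47, 52)

# Six cut points: the minimum, the 20th/40th/60th/80th percentiles, the maximum
brks <- quantile(toy_rates, probs = seq(0, 1, by = 0.2), na.rm = TRUE)

# Five labelled groups; include.lowest keeps the minimum in the first group
quintile <- cut(
  toy_rates,
  breaks = brks,
  labels = c("0-20", "20-40", "40-60", "60-80", "80-100"),
  include.lowest = TRUE
)
table(quintile)

# Without include.lowest the smallest value is dropped to NA
no_lowest <- cut(toy_rates, breaks = brks)
sum(is.na(no_lowest))
```

The na.rm = TRUE in quantile() is my addition; it is there because a merge with all.x = TRUE can leave missing values, and quantile() stops with an error if it meets them.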

What if we wanted only three groups, say the top one-third, middle one-third, and then the lowest one-third?

my.df %>% 
  mutate(
    grouped_poverty = cut(
      child1216, breaks = c(quantile(
        my.df$child1216, 
        probs = seq(0, 1, by = 1/3))),
      labels = c("Bottom-third", "Middle-third", "Top-third"),
      include.lowest = TRUE
      )
    ) -> my.df 

ggplot() + 
  geom_polygon(
    data = my.df, 
    aes(x = long, y = lat, group = group, fill = grouped_poverty),
    color = "black"
    ) + 
    coord_fixed(1.3) + 
  geom_text(
    data = centroids2, 
    aes(x = clong, y = clat, label = county), 
    color = "white", 
    size = 2.25
    ) + 
    scale_fill_brewer(palette = "Set1", direction = -1) + 
  labs(fill = "Poverty Groups") + 
  theme_map() + 
  theme(legend.position = "bottom")

Now, the maps built with map_data() have been limited to just the 48 contiguous states; Alaska and Hawaii are absent. Can we fold these in? Yes we can, with the {fiftystater} package.

devtools::install_github("wmurphyrd/fiftystater")

library(fiftystater)
data("fifty_states")
names(fifty_states)
[1] "long"  "lat"   "order" "hole"  "piece" "id"    "group"

ggplot() + 
  geom_polygon(
    data = fifty_states, 
    aes(x = long, y = lat, group = group, fill = id), 
    color = "white"
    ) + 
  theme_map()  + 
  theme(legend.position = "none")

{leaflet} maps

{leaflet} is the R interface to Leaflet, an easy-to-learn JavaScript library that generates interactive maps. It is fun and the possibilities are endless. You will need three libraries so make sure you install {leaflet}, {leaflet.extras}, and {widgetframe}. The basic command structure is as follows: you call leaflet via leaflet(), then use setView() to specify the latitude/longitude on which to center the map and the zoom factor to be applied. The higher the zoom number the more zoomed-in the view will be and the smaller the zoom number the more zoomed-out the view will be. The addTiles() command adds default tiles but you can tweak this (see the book chapter for examples). The setMapWidgetStyle() and the other commands that follow allow you to customize how the map will look in the knitted html document.

The map that shows up has been centered on Athens, Ohio and uses the default map tiles. Note that you can zoom in/out with the map, as well as move the map around so that you end up in some other place.

library(leaflet)
library(leaflet.extras)
library(widgetframe)

leaflet() %>% 
  setView(lat = 39.322577, lng = -82.106336, zoom = 14) %>% 
  addTiles() %>% 
  setMapWidgetStyle() %>%  
  frameWidget(width = '1200', height = '500') -> m1 

m1
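
To get a feel for what the zoom number means, the sketch below estimates how many metres of ground one screen pixel covers at a few zoom levels. It assumes the default tiles are standard 256-pixel Web Mercator tiles, for which the usual approximation is 156543.03392 × cos(latitude) / 2^zoom. The function name metres_per_pixel() is hypothetical.

```r
# Approximate ground resolution (metres per pixel) for standard
# 256-pixel Web Mercator tiles at a given latitude and zoom level
metres_per_pixel <- function(lat, zoom) {
  156543.03392 * cos(lat * pi / 180) / 2^zoom
}

# Latitude used to center the Athens, Ohio map above
zooms <- c(11, 14, 15)
res <- metres_per_pixel(lat = 39.322577, zoom = zooms)

data.frame(zoom = zooms, metres_per_pixel = round(res, 2))
```

Each step up in zoom halves the ground distance per pixel, which is why moving from zoom = 14 to zoom = 15 doubles the apparent size of everything on the map.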

So far so good. Now how about dropping a pin on Building 21 on The Ridges, the main administrative building of the Voinovich School? This is done with addMarkers(), with the popup = argument indicating what text should be displayed if the marker is clicked.

leaflet() %>% 
  setView(lat = 39.322577, lng = -82.106336, zoom = 15) %>% 
  addMarkers(
    lat = 39.319984, lng = -82.107084,
    popup = c("The Ridges, Building 21")
    ) %>%
  addTiles() %>% 
  setMapWidgetStyle() %>%  
  frameWidget(width = '1200', height = '500') -> m2 

m2

This, in a nutshell, is the basic setup of a leaflet map. There is tons more you could do so if interested, check out the several examples out there on the web, starting with the {leaflet} documentation. For now we see a few more extensions. Here, for instance, is how one might take a large data-set and display specific features. In particular, let us map some bike-share stations in New York City. The actual data-frame is rather large so we draw a random sample of 30 rows with the sample_n() command from {dplyr}. The map will show markers that, if clicked, will pop up the name of the bike-share station.

load(
  here::here("data", "citibike.RData")
  )


citibike %>% 
  sample_n(30) -> citibike2 

leaflet(data = citibike2, width = "100%") %>% 
  setView(lat = 40.74, lng = -73.99, zoom = 11) %>% 
  addTiles() %>% 
  addMarkers(
    data = citibike2, lat = ~start.station.latitude, 
    lng = ~start.station.longitude, 
    label = ~start.station.id, 
    popup = ~start.station.name
    ) %>% 
  setMapWidgetStyle() %>%  
  frameWidget(width = '1200', height = '500') -> m3 

m3
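
If you do not have citibike.RData at hand, the same pattern (a data frame passed to leaflet(), with columns referenced through the ~ formula notation) can be tried on a data frame built inline. The sketch below reuses the two campus coordinates from earlier in this module; it skips the {leaflet.extras} and {widgetframe} styling steps, which are optional for viewing the map in RStudio.

```r
library(leaflet)

# The two campus locations given earlier in this module
campuses <- data.frame(
  place = c("Athens", "Lancaster"),
  lat   = c(39.324391, 39.738743),
  long  = c(-82.101443, -82.586373)
)

# One marker per row; ~ tells leaflet to look the column up in the data
leaflet(data = campuses) %>%
  addTiles() %>%
  addMarkers(
    lng = ~long,
    lat = ~lat,
    label = ~place,   # shown on hover
    popup = ~place    # shown on click
  ) -> m_campuses

m_campuses
```

No setView() is used here, so the map should size itself to fit both markers.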

Multiple plots on one canvas with {patchwork}

When you are building a visualization you often end up needing to squeeze multiple graphics into a single canvas. There are several ways to do it in R but I am showing you what may be the easiest way to do it – with {patchwork}. You may have to install it via {devtools} as shown below:

devtools::install_github("thomasp85/patchwork")

Start by loading {patchwork} and {ggplot2} (and any other libraries you plan to use for the plots). Then name each plot; most of us end up naming them p1, p2, and so on (why? because those were the earliest examples on the web). Then decide on how you want the plots to show up. That is, how many plots do you have? Should they be side-by-side? If you have an odd number of plots, should two be side-by-side and the third in a row below these two? Or the other way around? What plot should come first? What plot should be last? Let us create three plots and see how the package works.

library(ggplot2)
library(patchwork)
data(mtcars)
ggplot(
  mtcars, 
  aes(
    x = factor(am, labels = c("Automatic", "Manual")), 
    y = mpg)
  )  + 
  geom_boxplot() + 
  labs(x = "Automatic/Manual", y = "Miles per gallon") -> p1

ggplot(
  mtcars, 
  aes(x = wt, y = mpg)
  )  + 
  geom_point() + 
  labs(x = "Curb Weight", y = "Miles per gallon") -> p2

ggplot(
  mtcars, 
  aes(x = qsec, y = mpg)
  )  + 
  geom_point() + 
  labs(x = "Quarter Mile times", y = "Miles per gallon") +
  facet_wrap(~gear) -> p3

Say I want two plots, each in its own column.

p1 + p2

Hmm, maybe one per row?

p1 + p2  + plot_layout(nrow = 2)

p1 + p2  + plot_layout(ncol = 1)

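You see plot_layout() used in the second and third plot commands. It has several options, the key ones being nrow and ncol (the grid dimensions), byrow (the fill order), and heights and widths (the relative size of each row or column).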

Say I want to fill row 1 with p1, p2, then row 2 with p1, p2.

p1 + p2 + p1 + p2 + plot_layout(ncol = 2, byrow = TRUE)

What if I want to fill column 1 with p1, p2, then column 2 with p1, p2?

p1 + p2 + p1 + p2 + plot_layout(ncol = 2, byrow = FALSE)

I can also use () to group sub-plots.

(p1 + (p2 + p3) + plot_layout(ncol = 1))

((p2 + p3) + p1 + plot_layout(ncol = 1)) # Not  the same thing!

Note that | places plots side by side (in columns) while / stacks them on top of one another (in rows).

(p1 | p2 | p1) / p3 

One can also specify the heights/widths of each plot.

(p1 + (p2 + p3) + plot_layout(ncol = 1, heights = c(1,2)))

(p1 + (p2 + p3) + plot_layout(ncol = 1, heights = c(2,1)))
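
The heights = examples have a direct counterpart in widths =. The sketch below is self-contained (it rebuilds two small plots from mtcars under the hypothetical names pa and pb) and shows one layout of each kind.

```r
library(ggplot2)
library(patchwork)

# Two simple plots to arrange
pa <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
pb <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Side by side, with the left panel twice as wide as the right one
wide_left <- pa + pb + plot_layout(ncol = 2, widths = c(2, 1))

# The same two plots stacked, with the top panel twice as tall
tall_top <- pa / pb + plot_layout(heights = c(2, 1))

wide_left
tall_top
```

The numbers in widths and heights are relative, so c(2, 1) and c(4, 2) give the same layout.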

You can explore other settings in the {patchwork} documentation, and if you want to see another package that tries to achieve similar results, explore {cowplot}.

Interactive graphics with {highcharter}

{highcharter} is one of my favorite packages for dynamic plots because it builds them with ease and yet they are visually stunning (see below). This is advanced material so be warned.

The first plot is a choropleth (a map-based heatmap) of unemployment rates (value) by county. The county map is countries/us/us-all-all, and each row of the data is matched to its county on the map via the code column.

library(highcharter)
data(unemployment)

hcmap(
  map = "countries/us/us-all-all", 
  data = unemployment,
  name = "Unemployment", 
  value = "value", 
  joinBy = c("hc-key", "code"),
  borderColor = "transparent"
  ) %>%
  hc_colorAxis(
    dataClasses = color_classes(c(seq(0, 10, by = 2), 50))
    ) %>% 
  hc_legend(
    layout = "vertical", 
    align = "right",
    floating = TRUE, 
    valueDecimals = 0, 
    valueSuffix = "%"
    ) 

Here is a scatterplot built with epa.RData, keeping only the rows for 2019 and then only 100 randomly sampled observations (to keep things manageable). Notice the specification "scatter" for chart-type, and the use of hcaes() in place of aes().

load(
  here::here("data", "epa.RData")
  )

library(dplyr)
epa %>% 
  filter(year == 2019) %>% 
  sample_n(100) -> epa2 

hchart(
  epa2, 
  "scatter", 
  hcaes(x = city08, y = highway08, group = make)
  )
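
If epa.RData is not at hand, the same hchart() pattern can be tried on a built-in data set. The sketch below uses mtcars; the cyl_label column and the axis titles are my own additions for illustration.

```r
library(highcharter)

# Copy of a built-in data set, with a text label to group the points by
cars <- mtcars
cars$cyl_label <- paste(cars$cyl, "cylinders")

# Same pattern as above: data, chart type, then hcaes() for the mapping
hchart(
  cars,
  "scatter",
  hcaes(x = wt, y = mpg, group = cyl_label)
  ) %>%
  hc_xAxis(
    title = list(text = "Weight (1000 lbs)")
    ) %>%
  hc_yAxis(
    title = list(text = "Miles per gallon")
    ) -> hc_cars

hc_cars
```

Clicking a group name in the legend hides or shows that group's points, which is handy when a scatterplot has many groups.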

Here is a line chart using the unemployment rate data. I am extracting year so I can use it for the x-axis, and then calculating the average unemployment rate by year and education group (educ_group).

load(
  here::here("data", "urate.RData")
  )

library(lubridate)
year(urate$yearmonth) -> urate$year

urate %>% 
  group_by(educ_group, year) %>% 
  summarise(
    avg.urate = mean(rate, na.rm = TRUE)
    ) -> urate2 

hchart(
  urate2, 
  "line", 
  hcaes(x = year, 
        y = avg.urate, 
        group = educ_group)
  ) %>%
   hc_title(
     text = "<span style=\"color:#e88e88\"> Unemployment Rates by Educational Group</span>", 
     useHTML = TRUE
     ) %>% 
  hc_xAxis(
    title = list(text = "Year")
    ) %>%
  hc_yAxis(
    title = list(text = "Average Unemployment Rate (%)")
    ) %>%
  hc_tooltip(
    table = TRUE, 
    sort = TRUE, 
    valueDecimals = 1,
    valueSuffix = "%"
    )
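
The summarising step that feeds this chart (extract the year, then average within each education group and year) can be checked on a toy data set. The values below are made up, and format() stands in for lubridate's year() so that the sketch needs only {dplyr}.

```r
library(dplyr)

# Hypothetical monthly rates for two groups over two years
toy <- data.frame(
  yearmonth  = as.Date(rep(
    c("2019-01-01", "2019-02-01", "2020-01-01", "2020-02-01"), times = 2
  )),
  educ_group = rep(c("College", "High school"), each = 4),
  rate       = c(2, 4, 6, 8, 5, 7, 9, NA)
)

# Base-R stand-in for lubridate::year()
toy$year <- as.integer(format(toy$yearmonth, "%Y"))

# One row per group and year; na.rm = TRUE skips the missing month
toy %>%
  group_by(educ_group, year) %>%
  summarise(
    avg.urate = mean(rate, na.rm = TRUE),
    .groups = "drop"
    ) -> toy2

toy2
```

The .groups = "drop" argument is optional; it returns an ungrouped result and silences the message that summarise() otherwise prints about the remaining grouping.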

We can then dress up the {highcharter} plot with a theme; the only change from the previous chunk is the hc_add_theme() call at the end.

hchart(
  urate2, 
  "line", 
  hcaes(
    x = year, y = avg.urate, group = educ_group)
  ) %>% 
   hc_title(
     text = "<span style=\"color:#e88e88\"> Unemployment Rates by Educational Group</span>", 
     useHTML = TRUE
     ) %>% 
  hc_xAxis(
    title = list(text = "Year")
    ) %>%
  hc_yAxis(
    title = list(text = "Average Unemployment Rate (%)")
    ) %>%
  hc_tooltip(
    table = TRUE, 
    sort = TRUE, 
    valueDecimals = 1,
    valueSuffix = "%"
    ) %>% 
  hc_add_theme(hc_theme_flatdark())

Animated graphics with {gganimate}

This is advanced material too so you have been twice warned! The {gganimate} library can be tricky to run without errors and hiccups because it needs other packages to be installed and configured (rendering to a GIF, for example, relies on a renderer package such as {gifski}); see the package documentation for details. To complicate matters, {gganimate} was completely overhauled for its 1.0 release, which is the syntax used below (transition_time(), ease_aes()), so older examples you find on the web may no longer run; be sure to check the current documentation.

library(gapminder)
library(gganimate)

ggplot(
  gapminder, 
  aes(gdpPercap, lifeExp, size = pop, colour = country)
  ) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(
    title = 'Year: {frame_time}', 
    x = 'GDP per capita', 
    y = 'life expectancy'
    ) +
  transition_time(year) +
  ease_aes('linear') -> p1 

animate(p1) 
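
Before rendering a large animation it can help to try the same building blocks on a tiny data set. The sketch below uses made-up data (toy_anim is hypothetical) and only renders when run interactively. The nframes and fps arguments control the length and smoothness of the animation, and anim_save() writes the most recent animation to a file. I am assuming a GIF renderer such as {gifski} is installed.

```r
library(ggplot2)
library(gganimate)

# Hypothetical data: two points moving over five years
toy_anim <- data.frame(
  year = rep(2000:2004, times = 2),
  id   = rep(c("a", "b"), each = 5),
  x    = c(1, 2, 3, 4, 5,  5, 4, 3, 2, 1),
  y    = c(1, 2, 3, 4, 5,  1, 2, 3, 4, 5)
)

# A static plot plus a transition over year makes an animation specification
ggplot(toy_anim, aes(x = x, y = y, colour = id)) +
  geom_point(size = 4, show.legend = FALSE) +
  labs(title = "Year: {frame_time}") +
  transition_time(year) +
  ease_aes("linear") -> anim

# Rendering is the slow step, so it is only done in an interactive session
if (interactive()) {
  animate(anim, nframes = 50, fps = 10)
  anim_save("toy_animation.gif")
}
```

Fewer frames render faster, so a small nframes is convenient while you are still adjusting the plot.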

This code rebuilds a famous visualization a la Hans Rosling, coming close to at least capturing the spirit of Hans. Life expectancy is plotted against gross domestic product per capita, with one panel per continent and one frame per year. Each color represents a country within the continent, and the size of the bubbles reflects the country’s population size.

Practice Tasks

Ex. 1

Create a map of the 48 contiguous states in the United States. Be sure to title the map and to fill in each state with colors while drawing state borders in white. Make sure you add state names by first calculating the centroids of each state and then merging these latitudes and longitudes with the map data. Use theme_map() and make sure the legend is not visible.

Ex. 2

Run the following code chunk to load data on the murder, assault and rape rates per 100,000 persons. UrbanPop is the percent of the state population that lives in an urban area.

data(USArrests)
names(USArrests)
rownames(USArrests) -> USArrests$statename 

library(ggplot2)
library(ggmap)
library(maps)
library(mapdata)
library(maptools)
library(ggthemes)

map_data("state") -> usast 

library(stringr)
str_to_title(usast$region) -> usast$statename 

library(dplyr)
usast %>% 
  filter(statename != "District Of Columbia") -> usast 

Ex. 3

Use the original USArrests data to draw scatterplots of (a) Murder versus UrbanPop, (b) Assault versus UrbanPop, and (c) Rape versus UrbanPop. Save each of these scatterplots by name and then use {patchwork} to create a single canvas that includes all three plots. Make sure you label the x-axis, y-axis, and title each plot.

Ex. 4

Now create {highcharter} versions of each of the three scatterplots you created in Ex. 3 above. You should end up with three scatterplots, each on its own canvas.

Ex. 5

Use {leaflet} to create a map that includes a popup for your place of birth. You will need to use Google Maps to find the latitude/longitude for this place. The popup should display the name of this place.

Citation

For attribution, please cite this work as

Ruhil (2022, Feb. 16). Maps, Interactive, and Animated Graphics. Retrieved from https://aniruhil.org/courses/mpa6020/handouts/module07.html

BibTeX citation

@misc{ruhil2022maps,
  author = {Ruhil, Ani},
  title = {Maps, Interactive, and Animated Graphics},
  url = {https://aniruhil.org/courses/mpa6020/handouts/module07.html},
  year = {2022}
}