
Mapping and Interactive Graphics in R

Ani Ruhil

1 / 35

Agenda

  • building maps in R
  • leaflet maps
  • combining plots with patchwork
  • interactive/animated graphics with highcharter
2 / 35

maps with ggplot2

3 / 35

We need a few libraries to get us started on our way ...

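library(ggplot2)  # map_data(), geom_polygon(), coord_fixed()
library(ggmap)    # not strictly needed for the code below
library(maps)     # the county boundary database read by map_data("county")
library(mapdata)  # higher-resolution map databases (not used below)
# library(maptools)  # retired from CRAN in 2023; nothing below needs it
library(ggthemes) # theme_map()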

How about a county map of the USA? What about Ohio's 88 counties?

map_data("county") -> usa # get basic map data for all USA counties
subset(usa, region == "ohio") -> oh # subset to counties in Ohio
names(oh)
## [1] "long" "lat" "group" "order" "region" "subregion"
  • long = longitude, which measures east-west position. The prime meridian, which runs through Greenwich (England), is assigned 0 degrees, and points west of it have negative longitudes. Athens, Ohio has a longitude of -82.101255
  • lat = latitude, which measures north-south position. The equator is 0 degrees, the North Pole 90 degrees north, and the South Pole 90 degrees south. Athens, Ohio has a latitude of 39.329240
  • group = an identifier for each polygon; most counties are a single polygon, but a county drawn in several pieces gets one group per piece
  • order = the order in which the boundary points should be joined when the lines are drawn
  • region = string indicator for regions (here the states)
  • subregion = string indicator for sub-regions (here the county names)
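A quick way to check this structure is to count distinct values. This is a sketch that assumes the maps package is installed; the `region` argument of `map_data()` limits the result to Ohio, and `oh_check`, `n_counties`, and `n_polygons` are illustrative names, not objects used elsewhere in these slides.

```r
library(ggplot2) # map_data() lives in ggplot2 and reads the maps databases
library(maps)

# Boundary points for Ohio's counties only
oh_check <- map_data("county", region = "ohio")

n_counties <- length(unique(oh_check$subregion)) # one subregion per county
n_polygons <- length(unique(oh_check$group))     # one group per polygon piece

n_counties
n_polygons
# Sorting by `order` gives the sequence in which boundary points are joined
head(oh_check[order(oh_check$order), c("long", "lat", "group", "order", "subregion")])
```

If `n_polygons` is larger than `n_counties`, some county is drawn in more than one piece, which is exactly why `group` (not `subregion`) is what we hand to ggplot2.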
4 / 35
ggplot() +
  geom_polygon(data = oh, aes(x = long, y = lat),
               fill = "white", color = "black") +
  ggtitle("a") # bad: without group, every point is joined into one tangled polygon
ggplot() +
  geom_polygon(data = oh, aes(x = long, y = lat, group = group),
               fill = "white", color = "black") +
  ggtitle("b") # a slightly better basic map: one polygon per group

5 / 35
ggplot() +
  geom_polygon(data = oh, aes(x = long, y = lat, group = group),
               fill = "white", color = "black") +
  coord_fixed(1.3) + # one degree of latitude drawn 1.3 times as long as one of longitude
  ggtitle("c") # a better map
ggplot() +
  geom_polygon(data = oh, aes(x = long, y = lat, group = group, fill = subregion),
               color = "black", alpha = 0.3) +
  coord_fixed(1.3) +
  guides(fill = "none") + # hide the 88-entry legend (guides(fill = FALSE) is deprecated)
  ggtitle("d") # a colored map
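Where does 1.3 come from? At latitude φ a degree of longitude covers only about cos(φ) times the ground distance of a degree of latitude, so a map that keeps distances roughly comparable needs a y/x ratio near 1/cos(φ). A small sketch, assuming Ohio spans roughly 38.4°N to 42.0°N (approximate values, not taken from the slides):

```r
# Approximate southern and northern limits of Ohio, in degrees north (assumed values)
ohio_lat <- c(38.4, 42.0)
mid_lat <- mean(ohio_lat) # 40.2 degrees

# Length of a degree of latitude relative to a degree of longitude at mid_lat
ratio <- 1 / cos(mid_lat * pi / 180)
round(ratio, 2) # about 1.31
```

ggplot2's `coord_quickmap()` applies a similar correction automatically, based on the middle of the plotted area, so it is an alternative to hard-coding the ratio.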

6 / 35

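Labeling the counties

  • to label counties we need a point inside each county (its centroid) where we can place the county's name
  • county names will have to be converted to title case ("athens" becomes "Athens")
  • simply taking the mean/median of a county's boundary latitudes/longitudes does not work well, because the result is pulled toward stretches of the boundary that happen to have many points; instead we use the centroid that the sp package computes for each polygon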
library(stringr)
str_to_title(oh$subregion) -> oh$county # e.g., "athens" -> "Athens"
library(sp)
# Returns the label point (the polygon's centroid) for one county's boundary points
getLabelPoint <- function(county) {
  Polygon(county[c("long", "lat")])@labpt
}
by(oh, oh$county, getLabelPoint) -> centroids # one (long, lat) pair per county
do.call("rbind.data.frame", centroids) -> centroids2 # stack the pairs into a data frame
rownames(centroids) -> centroids2$county # county names come from the by() labels
names(centroids2) <- c("clong", "clat", "county") # readable column names
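Why not just average the boundary points? A toy polygon makes the problem visible. This sketch computes the area-weighted centroid with the shoelace formula in base R; it is not the sp package's own code, only the same idea, and `polygon_centroid()` and the square are hypothetical.

```r
# Area-weighted centroid of a simple (non-self-intersecting) polygon,
# using the shoelace formula. Vertices are given in drawing order.
polygon_centroid <- function(x, y) {
  x_next <- c(x[-1], x[1]) # next vertex, wrapping around to close the ring
  y_next <- c(y[-1], y[1])
  cross <- x * y_next - x_next * y
  area <- sum(cross) / 2   # signed area (positive if counter-clockwise)
  c(x = sum((x + x_next) * cross) / (6 * area),
    y = sum((y + y_next) * cross) / (6 * area))
}

# Hypothetical 2 x 2 square whose bottom edge carries 21 boundary points
bottom_x <- seq(0, 2, length.out = 21)
x <- c(bottom_x, 2, 0)
y <- c(rep(0, 21), 2, 2)

polygon_centroid(x, y) # x = 1, y = 1: the true center of the square
c(mean(x), mean(y))    # y is about 0.17: pulled toward the densely sampled bottom edge
```

County boundaries are digitized unevenly (rivers and coastlines get many points), so the same bias shows up on real maps.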
7 / 35

Now the code for the labeled plot ...

ggplot() +
  geom_polygon(
    data = oh,
    aes(x = long, y = lat, group = group),
    fill = "white",
    color = "gray") +
  coord_fixed(1.3) +
  geom_text(
    data = centroids2,
    aes(x = clong, y = clat, label = county),
    color = "darkblue",
    size = 1) +
  theme_map()
8 / 35

... and the plot itself

9 / 35

Using fill

What if we want to fill each county according to the value of some variable, such as population density, percent in poverty, or median educational attainment?

(1) find and prepare the variable we want to use
(2) merge this variable with the data used to generate the map
(3) generate the map

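library(readxl)
read_excel("data/acpovertyOH.xlsx", sheet = "counties") -> acpovertyOH
c("ranking", "county", "child1216",
  "child0711", "all1216", "all0711") -> colnames(acpovertyOH)
# attach county and child1216 (columns 2 and 3) to every boundary point of that county
merge(oh, acpovertyOH[, c(2:3)], by = "county", all.x = TRUE, sort = FALSE) -> my.df
# merge() does not preserve row order; restore the drawing order of the boundary points
my.df[order(my.df$order), ] -> my.df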
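The re-sort matters because `merge()` returns rows in an unspecified order when `sort = FALSE`, while `geom_polygon()` joins points in row order. A toy illustration with hypothetical counties and values (`boundary` and `poverty` are made-up stand-ins for `oh` and `acpovertyOH`):

```r
# Toy boundary data: two "counties", points stored in drawing order
boundary <- data.frame(
  county = c("Athens", "Athens", "Athens", "Meigs", "Meigs", "Meigs"),
  order  = 1:6,
  long   = c(0, 1, 1, 2, 3, 3),
  lat    = c(0, 0, 1, 0, 0, 1)
)
# Toy attribute table (hypothetical values), one row per county
poverty <- data.frame(county = c("Meigs", "Athens"), child1216 = c(30, 25))

merged <- merge(boundary, poverty, by = "county", all.x = TRUE, sort = FALSE)
merged <- merged[order(merged$order), ] # restore the drawing order
merged
```

Whatever order `merge()` happens to produce, sorting on `order` puts every point back where the boundary expects it.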
10 / 35

Now the code for the map ...

ggplot() +
  geom_polygon(
    data = my.df,
    aes(x = long, y = lat, group = group, fill = child1216),
    color = "black"
  ) +
  coord_fixed(1.3) +
  geom_text(
    data = centroids2,
    aes(x = clong, y = clat, label = county),
    color = "black",
    size = 2.25
  ) +
  scale_fill_distiller(palette = "Spectral") +
  labs(fill = "Child Poverty %") +
  theme_map() +
  theme(legend.position = "bottom")
11 / 35

... and now the map itself

12 / 35

That isn't a bad map, but we can do better by binning counties into quartiles (4 groups) or quintiles (5 groups), which makes it easier to see which group each county falls into. The code below uses quintiles.

library(dplyr)
# Compute quintile breaks from one value per county (acpovertyOH), not from my.df,
# where each county is repeated once per boundary point
quintile_breaks <- quantile(acpovertyOH$child1216,
                            probs = seq(0, 1, by = 0.2), na.rm = TRUE)
my.df %>%
  mutate(
    grouped_poverty = cut(
      child1216,
      breaks = quintile_breaks,
      labels = c("0-20", "20-40", "40-60", "60-80", "80-100"),
      include.lowest = TRUE)
  ) -> my.df
ggplot() +
  geom_polygon(data = my.df, aes(x = long, y = lat,
               group = group, fill = grouped_poverty), color = "black") +
  coord_fixed(1.3) +
  geom_text(data = centroids2, aes(x = clong,
            y = clat, label = county), color = "white", size = 2.25) +
  scale_fill_brewer(palette = "Set1", direction = -1) +
  labs(fill = "Poverty Quintiles") +
  theme_map()

map is on the following slide ...
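Quintile breaks should come from one value per county. If they are computed on the merged boundary data, each county is weighted by how many boundary points it has. A sketch with hypothetical counties A to E, where county E has far more boundary points than the rest:

```r
# Hypothetical county-level values: one row per county
county_vals <- data.frame(county = c("A", "B", "C", "D", "E"),
                          child1216 = c(10, 20, 30, 40, 50))

# Hypothetical boundary data: county E has many more boundary points than the others
n_points <- c(A = 5, B = 5, C = 5, D = 5, E = 80)
boundary <- data.frame(county = rep(names(n_points), times = n_points))
boundary$child1216 <- county_vals$child1216[match(boundary$county, county_vals$county)]

probs <- seq(0, 1, by = 0.2)
point_breaks  <- quantile(boundary$child1216, probs)    # weighted by point counts
county_breaks <- quantile(county_vals$child1216, probs) # one value per county

point_breaks  # 10, 48, 50, 50, 50, 50: duplicated breaks, so cut() would stop with an error
county_breaks # 10, 18, 26, 34, 42, 50

groups <- cut(county_vals$child1216, breaks = county_breaks,
              labels = c("0-20", "20-40", "40-60", "60-80", "80-100"),
              include.lowest = TRUE)
data.frame(county_vals, groups)
```

With county-level breaks each of the five toy counties lands in its own quintile, which is what the legend "Poverty Quintiles" promises.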

13 / 35

14 / 35

Using urbnmapr for maps

15 / 35
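library(tidyverse)
library(urbnmapr) # not on CRAN: remotes::install_github("UrbanInstitute/urbnmapr")
states %>%
  ggplot(aes(long, lat, group = group)) +
  geom_polygon(fill = "grey", color = "#ffffff", linewidth = 0.25) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) # Albers projection; needs the mapproj package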

16 / 35
counties %>%
  ggplot(aes(long, lat, group = group)) +
  geom_polygon(fill = "grey", color = "#ffffff", linewidth = 0.05) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45)

17 / 35

Mapping with leaflet

18 / 35

Leaflet is an easy-to-learn JavaScript library for interactive maps, and the leaflet R package lets us build these maps directly from R

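library(leaflet)
library(leaflet.extras) # setMapWidgetStyle()
library(widgetframe)    # frameWidget() embeds the widget in an iframe for the slides
leaflet() %>%
  setView(lat = 39.322577, lng = -82.106336, zoom = 14) %>% # center on Athens, Ohio
  addTiles() %>% # default OpenStreetMap tiles
  setMapWidgetStyle() %>%
  frameWidget(width = '1000', height = '320') -> m1
m1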
19 / 35

drop a pin on Building 21

leaflet() %>%
  setView(lat = 39.322577, lng = -82.106336, zoom = 15) %>%
  addMarkers(lat = 39.319984, lng = -82.107084,
             popup = "The Ridges, Building 21") %>%
  addTiles() %>%
  setMapWidgetStyle() %>%
  frameWidget(width = '1000', height = '320') -> m2
m2
  • addMarkers() draws the default pin marker; popup = sets the text that appears when the marker is clicked
20 / 35

NYC Bike data

Let us map some bike-share stations in New York City. The full data frame is large, so we draw a random sample of 30 rows with dplyr's sample_n() function (newer dplyr code would use slice_sample(n = 30))

load("data/citibike.RData")
library(dplyr)
citibike %>%
  sample_n(30) -> citibike2
leaflet(data = citibike2, width = "100%") %>%
  setView(lat = 40.74, lng = -73.99, zoom = 12) %>%
  addTiles() %>%
  addMarkers(data = citibike2,
             lat = ~start.station.latitude, lng = ~start.station.longitude,
             label = ~start.station.id, popup = ~start.station.name) %>%
  setMapWidgetStyle() %>%
  frameWidget(width = '1000', height = '320') -> m3
m3
21 / 35
  • If you click on a marker you will see the station's name
  • One can do a lot more in terms of customizing the markers but I leave that to you to explore
  • leaflet will do more than just draw markers but unfortunately we do not have time to explore its other features
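As one example of the marker customization mentioned above, circle markers can be sized by a variable and clustered when zoomed out. This is a sketch with hypothetical stations (made-up names, coordinates, and trip counts), not the citibike data:

```r
library(leaflet)

# Hypothetical stations (made-up names and coordinates near midtown Manhattan)
stations <- data.frame(
  name  = c("Station A", "Station B", "Station C"),
  lat   = c(40.750, 40.742, 40.735),
  lng   = c(-73.990, -73.985, -73.995),
  trips = c(120, 45, 80)
)

m_custom <- leaflet(data = stations) %>%
  addTiles() %>%
  addCircleMarkers(
    lat = ~lat, lng = ~lng,
    radius = ~sqrt(trips), # circle radius grows with the (made-up) trip count
    color = "darkred", fillOpacity = 0.6, stroke = FALSE,
    label = ~name, popup = ~paste(name, "<br>Trips:", trips),
    clusterOptions = markerClusterOptions() # nearby markers merge when zoomed out
  )
m_custom
```

Using `sqrt()` for the radius keeps the circle *area* roughly proportional to the count, so large values do not visually swamp the map.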
22 / 35

patchwork ... Multiple graphics on one canvas

23 / 35

Often when you are building a visualization you end up needing to squeeze multiple graphics into a single canvas, like the example below

24 / 35

We can do this in many ways but the easiest library to use might be patchwork

  • load ggplot2 and patchwork (and any other libraries you plan to use for the plots)
  • start by naming each plot; most of us end up naming them p1, p2, and so on (why? because those were the earliest examples on the web)
  • then decide on how you want the plots ... how many do you have? should they be side-by-side? two side-by-side and the third in a row below these two?

Let us create three plots

library(ggplot2)
library(patchwork)
data(mtcars)
ggplot(mtcars, aes(x = factor(am, labels = c("Automatic", "Manual")), y = mpg)) +
  geom_boxplot() +
  labs(x = "Automatic/Manual", y = "Miles per gallon") -> p1
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Curb Weight", y = "Miles per gallon") -> p2
ggplot(mtcars, aes(x = qsec, y = mpg)) +
  geom_point() +
  labs(x = "Quarter Mile times", y = "Miles per gallon") +
  facet_wrap(~gear) -> p3
25 / 35
p1 + p2

p1 + p2 + plot_layout(ncol = 1)

plot_layout() has several options; the key ones are

  • ncol, nrow: number of columns/rows
  • byrow: whether the plots fill the grid row by row (TRUE) or column by column (FALSE)
  • widths, heights: relative widths/heights of each column and row in the grid; these are recycled to match the dimensions of the grid
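The widths option, for instance, can make one panel dominate. A sketch using mtcars, where `pa`, `pb`, and `pw` are illustrative names:

```r
library(ggplot2)
library(patchwork)

pa <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
pb <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()

# Two columns; the first column is twice as wide as the second
pw <- pa + pb + plot_layout(ncol = 2, widths = c(2, 1))
pw
```

The values are relative, so `widths = c(2, 1)` and `widths = c(4, 2)` give the same layout.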
26 / 35

fill row 1 with p1, p2, then row 2 with p1, p2

p1 + p2 + p1 + p2 +
plot_layout(ncol = 2, byrow = TRUE)

fill column 1 with p1, p2, then column 2 with p1, p2

p1 + p2 + p1 + p2 +
plot_layout(ncol = 2, byrow = FALSE)

27 / 35
(p1 + (p2 + p3) +
plot_layout(ncol = 1))

(p1 | p2 | p1) / p3

the | operator places plots side by side (in a row), and the / operator stacks them on top of each other
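The two operators nest with parentheses. A sketch with three mtcars plots (`pa`, `pb`, `pc`, and `nested` are illustrative names) that puts one plot on the left and stacks two on the right:

```r
library(ggplot2)
library(patchwork)

pa <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
pb <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()
pc <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# pa on the left; pb stacked on top of pc on the right
nested <- pa | (pb / pc)
nested
```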

28 / 35
(p1 + (p2 + p3) + plot_layout(ncol = 1, heights = c(1,2)))

29 / 35

highcharter -- Interactive graphics

30 / 35

highcharter is one of my favorite packages for dynamic plots because it builds them with ease and yet they are visually stunning (see below)

(Interactive Highcharts map of the USA, shaded in bins of 0-2%, 2-4%, 4-6%, 6-8%, 8-10%, and 10-50%; based on data from the United States Census Bureau)
31 / 35

Scatterplot

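library(highcharter)
load("data/epa.RData")
epa %>%
  filter(year == 2019) %>%
  sample_n(100) -> epa2 # 100 randomly chosen 2019 models
# city vs. highway miles per gallon, one color per make
hchart(epa2, "scatter", hcaes(x = city08, y = highway08, group = make)) -> hc1
frameWidget(hc1, width = 1000, height = 350)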
32 / 35

Line chart

load("data/unemprate.RData")
library(lubridate)
year(urate$yearmonth) -> urate$year # extract the year from the date column
urate %>%
  group_by(educ_group, year) %>%
  summarise(avg.urate = mean(rate, na.rm = TRUE), .groups = "drop") -> urate2
hchart(urate2, "line", hcaes(x = year, y = avg.urate, group = educ_group)) -> hc2
frameWidget(hc2, width = 1000, height = 375)
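The aggregation step can be checked on a small hypothetical data set whose columns mimic `urate` (`urate_toy` and `urate_toy2` are made-up names and values):

```r
library(dplyr)
library(lubridate)

# Hypothetical monthly unemployment rates for two education groups
urate_toy <- data.frame(
  yearmonth  = as.Date(c("2019-01-01", "2019-02-01", "2020-01-01", "2020-02-01",
                         "2019-01-01", "2019-02-01", "2020-01-01", "2020-02-01")),
  educ_group = rep(c("HS", "BA"), each = 4),
  rate       = c(4, 6, 8, 10, 2, 2, 3, 5)
)

urate_toy %>%
  mutate(year = year(yearmonth)) %>% # keep only the year of each date
  group_by(educ_group, year) %>%
  summarise(avg.urate = mean(rate, na.rm = TRUE), .groups = "drop") -> urate_toy2
urate_toy2 # BA: 2, 4; HS: 5, 9
```

`.groups = "drop"` returns an ungrouped result and silences dplyr's message about regrouping.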
33 / 35

Dressing up the highcharter plot

hchart(urate2, "line", hcaes(x = year, y = avg.urate, group = educ_group)) %>%
  hc_title(text = "<span style=\"color:#e88e88\"> Unemployment Rates by Educational Group</span>",
           useHTML = TRUE) %>%
  hc_tooltip(table = TRUE, sort = TRUE, valueDecimals = 2) %>% # valueDecimals is the Highcharts rounding option
  hc_add_theme(hc_theme_flatdark()) -> hc3
frameWidget(hc3, width = 1050, height = 400)
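Axis titles are another common bit of dressing up. A sketch on hypothetical annual averages (`toy` and `hc_axes` are made-up names; the numbers are invented for illustration):

```r
library(highcharter)

# Hypothetical annual averages for two groups
toy <- data.frame(
  year       = rep(2018:2020, times = 2),
  avg.urate  = c(4.1, 3.9, 7.5, 2.1, 2.0, 4.3),
  educ_group = rep(c("High school", "Bachelor's"), each = 3)
)

hc_axes <- hchart(toy, "line", hcaes(x = year, y = avg.urate, group = educ_group)) %>%
  hc_xAxis(title = list(text = "Year")) %>%
  hc_yAxis(title = list(text = "Average unemployment rate")) %>%
  hc_tooltip(valueDecimals = 1) # round tooltip values to one decimal place
hc_axes
```

The `title = list(text = ...)` form mirrors the nested option structure of the underlying Highcharts JavaScript API, which highcharter passes through.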
34 / 35
