In this module you will learn how to build maps with {ggplot2} and {ggmap}, and then with {leaflet}. There are other packages (see {choroplethr}, {tmap}, and {sf}) that we could use but I will leave it to you to explore these on your own if mapping interests you. {leaflet} is especially fun and useful for creating interactive maps. We will also look at two other packages: {highcharter}, to generate interactive plots/maps, and {gganimate}, to animate plots built with {ggplot2}. We will also see how to use {patchwork} to arrange multiple graphics on a single canvas.
{ggplot2} and {ggmap}
Having worked with {ggplot2} in the previous module, we might as well stick with it since the coding syntax will be relatively familiar. Start by loading the following packages; remember to install them if you get a “package not found” error.
library(ggplot2)
library(ggmap)
library(maps)
library(mapdata)
library(ggthemes)
library(sf)
library(terra)
It is pretty easy to build a state or county map. I’ll start with a county-level map of the entire USA, just to show you how easy it is, and then we can focus on Ohio. The code below shows you how to pull the data frame for all counties (the boundaries come from the {maps} package) and then how to subset it to counties in Ohio.
map_data("county") -> usa # get basic map data for all USA counties
subset(usa, region == "ohio") -> oh # subset to counties in Ohio
names(oh)
[1] "long" "lat" "group" "order" "region"
[6] "subregion"
You see six columns/variables in the oh data frame (usa has the same six). Pay attention to the contents of each, described below:
Variable | Description |
---|---|
long | longitude, a measure of east-west position. The prime meridian is assigned the value of 0 degrees and runs through Greenwich (England); places west of it have negative longitudes. Athens, Ohio has a longitude of -82.101255 |
lat | latitude, a measure of north-south position. The equator is defined as 0 degrees, the North Pole as 90 degrees north, and the South Pole as 90 degrees south. Athens, Ohio has a latitude of 39.329240 |
group | an identifier that is unique for each polygon to be drawn (here the counties; a county drawn in several pieces gets one group per piece) |
order | an identifier that indicates the order in which the boundary points should be connected |
region | string indicator for regions (here the states) |
subregion | string indicator for sub-regions (here the county names) |
At this point, go ahead and open Google Maps in your browser and search for two places that have some meaning, some value to you (hometown, your actual residence, favorite vacation spot, and so on). When you find each place on the map, right-click on it and copy the latitude/longitude that is reported. Save these latitudes/longitudes since you will need them for some mapping exercises at the end of this module.
The actual map can be built as shown below.
ggplot() +
geom_polygon(
data = usa,
aes(x = long, y = lat),
fill = "white",
color = "black"
) +
coord_fixed(1.3) +
labs(
title = "A Disastrous Map",
subtitle = "(County borders are all messed up)"
)
But this is not very good because the county borders are incorrectly drawn. Why is that? Because we forgot to specify the grouping structure for drawing these lines: should the latitudes/longitudes be connected in order within each county, or simply in order across the whole data frame? Of course they should be connected in order within each county, and this is specified via the group = aesthetic, which results in the county borders being drawn correctly.
ggplot() +
geom_polygon(
data = usa,
aes(x = long, y = lat, group = group),
fill = "white",
color = "black"
) +
coord_fixed(1.3) +
labs(
title = "A Better Map",
subtitle = "(County borders seem okay)"
)
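To see exactly what group = changes, here is a toy sketch with two squares stored in one data frame, the way map_data() stacks counties. The data are made up for illustration; layer_data() is used to report how {ggplot2} grouped the rows. Without group = all eight points form a single polygon (with a stray edge joining the two squares); with it, each square is closed on its own.

```r
library(ggplot2)

# Two made-up squares stored in one data frame, as map_data() stores counties
squares <- data.frame(
  id   = rep(c(1, 2), each = 4),
  long = c(0, 1, 1, 0,   2, 3, 3, 2),
  lat  = c(0, 0, 1, 1,   0, 0, 1, 1)
)

# Without group: all points are joined into one polygon
p_bad <- ggplot(squares, aes(x = long, y = lat)) +
  geom_polygon(fill = "white", color = "black")

# With group: each id is drawn and closed as its own polygon
p_good <- ggplot(squares, aes(x = long, y = lat, group = id)) +
  geom_polygon(fill = "white", color = "black")

# layer_data() shows the groups ggplot2 actually used
length(unique(layer_data(p_bad)$group))   # 1
length(unique(layer_data(p_good)$group))  # 2
```

The same logic is why group = group in the county map matters: each county's boundary points carry their own group id.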
We could improve upon this map by filling in each county; there are 3000+ counties so the colors will not be unique but that is okay for now. I have also switched the county borders to be inked in white.
ggplot() +
geom_polygon(
data = usa,
aes(x = long, y = lat, group = group),
fill = usa$group,
color = "white"
) +
coord_fixed(1.3) +
labs(
title = "An Even Better Map",
subtitle = "(Counties are filled in with a color)"
)
What if we want to also draw the state borders?
map_data("state") -> usast
ggplot() +
geom_polygon(
data = usa,
aes(x = long, y = lat, group = group, fill = region),
color = "black"
) +
geom_polygon(
data = usast,
aes(x = long, y = lat, group = group),
fill = NA,
color = "blue"
) +
coord_fixed(1.3) +
labs(
title = "An Even Better Map",
subtitle = "(Counties are filled in by a common color for each state)"
) +
theme(legend.position = "none")
Now, you could also do this with another package, {urbnmapr}, so let us see it very briefly. Here is a map with state borders and then one with county borders. Note that Alaska and Hawaii are visible in these maps but were conspicuously absent from the earlier maps drawn from the {maps} data. You may have to install {urbnmapr} with the command shown below:
remotes::install_github("UrbanInstitute/urbnmapr")
library(tidyverse)
library(urbnmapr)
states %>%
ggplot(aes(long, lat, group = group)) +
geom_polygon(
fill = "grey", color = "black", linewidth = 0.25
) +
coord_map(
projection = "albers", lat0 = 39, lat1 = 45
) +
labs(title = "Map of the 50 States")
counties %>%
ggplot(aes(long, lat, group = group)) +
geom_polygon(
fill = "grey", color = "darkred", linewidth = 0.05
) +
coord_map(
projection = "albers", lat0 = 39, lat1 = 45
) +
labs(title = "Map of States and Counties")
If you want to learn more about the {urbnmapr} package you should read about it here.
Okay, so much for basic maps. Now we drill down to Ohio counties alone and see how we can label them, add points to them, and then also use a color scheme to color them on the basis of some measure such as percent of children in poverty in the county, median household income, and so on.
In order to label counties we will need to find the center of each county and then place the county name at that latitude/longitude. Taking the simple mean/median of a county's boundary latitudes/longitudes will not work well, because some stretches of a border are traced with many more points than others and the average drifts toward them, so we use specific code to find the centroids. Also, county names will have to be formatted into title case since they show up in subregion in all lowercase characters.
We start by leaning on the {stringr} package to correct how names will be displayed. We then calculate the centroid of each county with the {sp} package and call the result centroids2.
library(stringr)
str_to_title(oh$subregion) -> oh$county # Create a new variable called county with the correct format
library(sp)
getLabelPoint <- # Returns a county-named list of label points
function(county){Polygon(county[c('long', 'lat')])@labpt}
centroids = by(oh, oh$county, getLabelPoint) # Returns list
centroids2 <- do.call("rbind.data.frame", centroids) # Convert to Data Frame
centroids2$county = rownames(centroids)
names(centroids2) <- c('clong', 'clat', "county") # Appropriate Header
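Why not just average the boundary points? A short base-R sketch makes the problem concrete. It defines a hypothetical helper, polygon_centroid(), implementing the standard area-weighted (shoelace) centroid, and compares it with the plain vertex mean on a toy square whose right edge carries extra points. I believe the label point that {sp} computes for a single ring is based on this same kind of area centroid, but treat that as an assumption rather than a description of {sp}'s internals.

```r
# Hypothetical helper: area-weighted centroid of a simple polygon (shoelace formula).
# x, y are the vertices in drawing order; the ring is closed by wrapping around.
polygon_centroid <- function(x, y) {
  x_next <- c(x[-1], x[1])            # pair each vertex with the next one
  y_next <- c(y[-1], y[1])
  cross  <- x * y_next - x_next * y   # twice the signed area of each slice from the origin
  area   <- sum(cross) / 2
  c(x = sum((x + x_next) * cross) / (6 * area),
    y = sum((y + y_next) * cross) / (6 * area))
}

# Toy unit square; the right edge is traced with three extra points
x <- c(0, 1, 1,    1,   1,    1, 0)
y <- c(0, 0, 0.25, 0.5, 0.75, 1, 1)

polygon_centroid(x, y)  # x = 0.5, y = 0.5: the true center
c(mean(x), mean(y))     # about 0.714 and 0.5: pulled toward the densely traced edge
```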
Now we are ready to plot …
ggplot() +
geom_polygon(
data = oh,
aes(x = long, y = lat, group = group),
fill = "white",
color = "gray"
) +
coord_fixed(1.3) +
geom_text(
data = centroids2,
aes(x = clong, y = clat, label = county),
color = "darkblue",
size = 2.25
) +
theme_map()
Note that geom_text() is used to add the county names to the map. The color = and size = arguments tweak the font color and size. If I relied on geom_label() instead, I would have a less effective map with needless borders around each county name:
ggplot() +
geom_polygon(
data = oh,
aes(x = long, y = lat, group = group),
fill = "white",
color = "gray"
) +
coord_fixed(1.3) +
geom_label(
data = centroids2,
aes(x = clong, y = clat, label = county),
color = "darkblue",
size = 2.25
) +
theme_map()
I would also like to highlight where Ohio University’s Athens and Lancaster campuses are located. I know their latitudes/longitudes – Athens is 39.324391, -82.101443 and Lancaster is 39.738743, -82.586373. Let me use a red dot for each.
data.frame(
place = c("Athens", "Lancaster"),
lat = c(39.324391, 39.738743),
long = c(-82.101443, -82.586373)
) -> campuses
ggplot() +
geom_polygon(
data = oh,
aes(x = long, y = lat, group = group),
fill = "white",
color = "gray"
) +
coord_fixed(1.3) +
geom_text(
data = centroids2,
aes(x = clong, y = clat, label = county),
color = "darkblue", size = 2.25
) +
geom_point(
data = campuses,
aes(x = long, y = lat), color = "red"
) +
theme_map()
Now, often you will see maps that use a color scheme to represent the spatial distribution of some measure such as median household income, poverty, and so on. We will build one such map using the percent of children in poverty in each county in Ohio. The data were sent to you via Slack so make sure the file is in your data folder. We will read it in and then build the map.
library(readxl)
library(here)
read_excel(
here(
"data",
"acpovertyOH.xlsx"
),
sheet = "counties"
) -> acpovertyOH
c(
"ranking", "county", "child1216", "child0711", "all1216", "all0711"
) -> colnames(acpovertyOH)
Now we merge this file with oh, noting that the merge key will be county since both files have this variable.
merge(
oh,
acpovertyOH[, c(2:3)],
by = "county",
all.x = TRUE,
sort = FALSE
) -> my.df
my.df[order(my.df$order), ] -> my.df
merge(
oh,
acpovertyOH,
by = "county",
all.x = TRUE,
sort = FALSE
) -> my.df2
my.df2[order(my.df2$order), ] -> my.df2
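Two things can go wrong quietly in a merge like this: merge() does not guarantee the original row order (hence the re-sort on order, since polygon points must be drawn in sequence), and any county spelled differently in the two files ends up with NA values. Here is a small base-R sketch on made-up toy data (the county names and values are hypothetical) showing both the re-sort and a quick setdiff() check for unmatched names.

```r
# Made-up toy polygons: three "counties", three boundary points each, in drawing order
poly <- data.frame(
  county = rep(c("Brown", "Adams", "Clark"), each = 3),
  order  = 1:9,
  long   = c(0, 1, 0,  2, 3, 2,  4, 5, 4),
  lat    = rep(c(0, 0, 1), times = 3)
)
# Made-up county values; "clark" is deliberately mis-capitalized
vals <- data.frame(
  county    = c("Adams", "Brown", "clark"),
  child1216 = c(20.5, 31.2, 18.0)
)

merged <- merge(poly, vals, by = "county", all.x = TRUE, sort = FALSE)
merged <- merged[order(merged$order), ]  # restore the drawing order

# Map counties with no match in the data file (their fill will be NA)
unmatched <- setdiff(unique(poly$county), vals$county)
unmatched  # "Clark"
```

Running the same setdiff() on oh$county and acpovertyOH$county is a cheap way to catch name mismatches before mapping.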
Now we start building the map.
ggplot() +
geom_polygon(
data = my.df,
aes(x = long, y = lat, group = group, fill = child1216),
color = "black"
) +
coord_fixed(1.3) +
geom_text(
data = centroids2,
aes(x = clong, y = clat, label = county),
color = "black",
size = 2.25
) +
scale_fill_distiller(palette = "Spectral") +
labs(fill = "Child Poverty %") +
theme_map() +
theme(legend.position = "bottom")
That isn’t a bad map but we could do better by creating quartiles (4 groups) or quintiles (5 groups) so that it is easier to pinpoint which counties fall into the top 25%, bottom 20%, and so on of whatever measure we grouped. The code below shows you how to generate the quintiles and map with them.
my.df %>%
mutate(
grouped_poverty = cut(
child1216,
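# breaks come from acpovertyOH (one row per county); my.df repeats each county's value once per boundary point
breaks = quantile(acpovertyOH$child1216, probs = seq(0, 1, by = 0.2), na.rm = TRUE),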
labels = c("0-20", "20-40", "40-60", "60-80", "80-100"),
include.lowest = TRUE
)
) -> my.df
ggplot() +
geom_polygon(
data = my.df,
aes(x = long, y = lat, group = group, fill = grouped_poverty),
color = "black"
) +
coord_fixed(1.3) +
geom_text(
data = centroids2,
aes(x = clong, y = clat, label = county),
color = "white",
size = 2.25
) +
scale_fill_brewer(palette = "Set1", direction = -1) +
labs(fill = "Poverty Quintiles") +
theme_map() +
theme(legend.position = "bottom")
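To see what cut() and quantile() are doing, here is a sketch on ten made-up county values (hypothetical, one per county): the quintile breaks split them into five groups of two. It also shows why breaks should be computed from one value per county. If one county's value is repeated many times, as happens in a vertex-level data frame where each value appears once per boundary point, the breaks shift toward that value.

```r
# Ten made-up county-level values, one per county
x <- seq(10, 28, by = 2)

breaks5 <- quantile(x, probs = seq(0, 1, by = 0.2))
q5 <- cut(
  x,
  breaks = breaks5,
  labels = c("0-20", "20-40", "40-60", "60-80", "80-100"),
  include.lowest = TRUE  # keep the minimum value in the first group
)
table(q5)  # two counties in each quintile

# Repeat the last county's value 20 times, mimicking a county with many boundary points
x_rep <- rep(x, times = c(rep(1, 9), 20))
quantile(x_rep, probs = seq(0, 1, by = 0.2))  # breaks pulled toward 28
```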
What if we wanted only three groups, say the top one-third, middle one-third, and then the lowest one-third?
my.df %>%
mutate(
grouped_poverty = cut(
child1216,
# breaks come from acpovertyOH (one row per county), not the vertex-level my.df
breaks = quantile(acpovertyOH$child1216, probs = seq(0, 1, by = 1/3), na.rm = TRUE),
labels = c("Bottom-third", "Middle-third", "Top-third"),
include.lowest = TRUE
)
) -> my.df
ggplot() +
geom_polygon(
data = my.df,
aes(x = long, y = lat, group = group, fill = grouped_poverty),
color = "black"
) +
coord_fixed(1.3) +
geom_text(
data = centroids2,
aes(x = clong, y = clat, label = county),
color = "white",
size = 2.25
) +
scale_fill_brewer(palette = "Set1", direction = -1) +
labs(fill = "Poverty Groups") +
theme_map() +
theme(legend.position = "bottom")
Now, all of the maps we built from map_data() have been limited to the 48 contiguous states; Alaska and Hawaii are absent. Can we fold these in? Yes we can, with the {fiftystater} package.
remotes::install_github("wmurphyrd/fiftystater")
library(fiftystater)
data("fifty_states")
names(fifty_states)
[1] "long" "lat" "order" "hole" "piece" "id" "group"
ggplot() +
geom_polygon(
data = fifty_states,
aes(x = long, y = lat, group = group, fill = id),
color = "white"
) +
theme_map() +
theme(legend.position = "none")
{leaflet} maps
{leaflet} is an R interface to Leaflet, an easy-to-learn JavaScript library that generates interactive maps. It is fun and the possibilities are endless. You will need three libraries, so make sure you install {leaflet}, {leaflet.extras}, and {widgetframe}. The basic command structure is as follows: you call leaflet via leaflet() and use setView() to specify the latitude/longitude on which to center the map and the zoom level to apply. The higher the zoom number, the more zoomed-in the view; the lower the number, the more zoomed-out. The addTiles() command adds the default OpenStreetMap tiles but you can tweak this (see the book chapter for examples). The setMapWidgetStyle() command (from {leaflet.extras}) and frameWidget() (from {widgetframe}) that follow let you customize how the map looks in the knitted HTML document.
The resulting map is centered on Athens, Ohio and uses the default map tiles. Note that you can zoom in/out within the map, as well as drag it around so that you end up somewhere else.
library(leaflet)
library(leaflet.extras)
library(widgetframe)
leaflet() %>%
setView(lat = 39.322577, lng = -82.106336, zoom = 14) %>%
addTiles() %>%
setMapWidgetStyle() %>%
frameWidget(width = '1200', height = '500') -> m1
m1
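As one example of tweaking the tiles, {leaflet} also offers addProviderTiles(), which takes the name of a third-party basemap instead of the default OpenStreetMap tiles. The sketch below assumes the "CartoDB.Positron" provider is available in your installed version; you can browse leaflet's providers object to pick others. The name m1_light is just for illustration.

```r
library(leaflet)

# Same view of Athens, Ohio, but with a light CartoDB basemap
# in place of the default tiles that addTiles() would add
leaflet() %>%
  setView(lat = 39.322577, lng = -82.106336, zoom = 14) %>%
  addProviderTiles("CartoDB.Positron") -> m1_light

m1_light
```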
So far so good. Now how about dropping a pin on Building 21 on The Ridges, the main administrative building of the Voinovich School? This is done with addMarkers(), with the popup = argument indicating what text should be displayed when the marker is clicked.
leaflet() %>%
setView(lat = 39.322577, lng = -82.106336, zoom = 15) %>%
addMarkers(
lat = 39.319984, lng = -82.107084,
popup = c("The Ridges, Building 21")
) %>%
addTiles() %>%
setMapWidgetStyle() %>%
frameWidget(width = '1200', height = '500') -> m2
m2
This, in a nutshell, is the basic setup of a leaflet map. There is much more you could do, so if you are interested, check out the many examples on the web, starting with this documentation. For now we look at a few more extensions. Here, for instance, is how one might take a large data set and display specific features. In particular, let us map some bike-share stations in New York City. The actual data frame is rather large so we draw a random sample of 30 rows with the sample_n() command from {dplyr}. Hovering over a marker shows the station ID (label =) and clicking it shows the station name (popup =).
load(
here::here(
"data",
"citibike.RData"
)
)
citibike %>%
sample_n(30) -> citibike2
leaflet(data = citibike2, width = "100%") %>%
setView(lat = 40.74, lng = -73.99, zoom = 11) %>%
addTiles() %>%
addMarkers(
data = citibike2, lat = ~start.station.latitude,
lng = ~start.station.longitude,
label = ~start.station.id,
popup = ~start.station.name
) %>%
setMapWidgetStyle() %>%
frameWidget(width = '1200', height = '500') -> m3
m3
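The ~ in arguments such as lat = ~start.station.latitude tells leaflet to look that column up in the data frame passed to leaflet(), row by row. Here is a small self-contained sketch of the same pattern using the three Ohio University locations mentioned earlier in this module (the data frame name places is just for illustration). Because no setView() is given, leaflet should fit the initial view around the markers.

```r
library(leaflet)

# The three locations used earlier in this module
places <- data.frame(
  place = c("Athens campus", "Lancaster campus", "The Ridges, Building 21"),
  lat   = c(39.324391, 39.738743, 39.319984),
  long  = c(-82.101443, -82.586373, -82.107084)
)

# ~column means "take this value from the places data frame"
leaflet(data = places) %>%
  addTiles() %>%
  addMarkers(lat = ~lat, lng = ~long, label = ~place, popup = ~place) -> m_places

m_places
```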
{patchwork}
When you are building a visualization you often end up needing to squeeze multiple graphics into a single canvas. There are several ways to do it in R but I am showing you what may be the easiest way to do it: {patchwork}. It is available on CRAN via install.packages("patchwork"); if you want the development version, you can install it via {devtools} as shown below:
devtools::install_github("thomasp85/patchwork")
Start by loading {patchwork} and {ggplot2} (and any other libraries you plan to use for the plots). Then name each plot; most of us end up naming them p1, p2, and so on (why? because those were the earliest examples on the web). Then decide how you want the plots to show up. How many plots do you have? Should they be side-by-side? If you have an odd number of plots, should two be side-by-side and the third in a row below these two? Or the other way around? Which plot should come first? Which should be last? Let us create three plots and see how the package works.
library(patchwork)
data(mtcars)
ggplot(
mtcars,
aes(
x = factor(am, labels = c("Automatic", "Manual")),
y = mpg)
) +
geom_boxplot() +
labs(x = "Automatic/Manual", y = "Miles per gallon") -> p1
ggplot(
mtcars,
aes(x = wt, y = mpg)
) +
geom_point() +
labs(x = "Curb Weight", y = "Miles per gallon") -> p2
ggplot(
mtcars,
aes(x = qsec, y = mpg)
) +
geom_point() +
labs(x = "Quarter Mile times", y = "Miles per gallon") +
facet_wrap(~gear) -> p3
Say I want two plots, each in its own column.
p1 + p2
Hmm, maybe one per row?
p1 + p2 + plot_layout(nrow = 2)
p1 + p2 + plot_layout(ncol = 1)
You see plot_layout() used in the last two commands. It has several options, the key ones being:
- ncol, nrow: number of columns/rows
- byrow: how the plots should be placed, filling rows first (TRUE) or columns first (FALSE)
- widths, heights: relative widths/heights of each column and row in the grid; these get repeated to match the dimensions of the grid

Say I want to fill row 1 with p1, p2, then row 2 with p1, p2:
p1 + p2 + p1 + p2 + plot_layout(ncol = 2, byrow = TRUE)
What if I want to fill column 1 with p1, p2, then column 2 with p1, p2?
p1 + p2 + p1 + p2 + plot_layout(ncol = 2, byrow = FALSE)
I can also use () to group sub-plots.
(p1 + (p2 + p3) + plot_layout(ncol = 1))
((p2 + p3) + p1 + plot_layout(ncol = 1)) # Not the same thing!
Note that | places plots side by side (a horizontal layout) and / stacks them on top of each other (a vertical layout):
(p1 | p2 | p1) / p3
You can then specify the heights/widths of each plot via plot_layout():
(p1 + (p2 + p3) + plot_layout(ncol = 1, heights = c(1,2)))
(p1 + (p2 + p3) + plot_layout(ncol = 1, heights = c(2,1)))
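Putting |, /, and plot_layout() together: a self-contained sketch that rebuilds the mtcars plots under new names (p_box, p_wt, p_qsec, so p1 to p3 are not overwritten) and stacks a full-width plot under a side-by-side pair, with the bottom row twice as tall. plot_annotation(tag_levels = "A"), which adds A/B/C panel tags, goes slightly beyond what the text above covers.

```r
library(ggplot2)
library(patchwork)

p_box <- ggplot(mtcars, aes(x = factor(am, labels = c("Automatic", "Manual")), y = mpg)) +
  geom_boxplot() +
  labs(x = "Automatic/Manual", y = "Miles per gallon")
p_wt <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Curb Weight", y = "Miles per gallon")
p_qsec <- ggplot(mtcars, aes(x = qsec, y = mpg)) +
  geom_point() +
  labs(x = "Quarter Mile times", y = "Miles per gallon")

# Top row: two plots side by side (|); p_qsec stacked below (/).
# In R, / binds tighter than +, so plot_layout() applies to the whole stack.
(p_box | p_wt) / p_qsec +
  plot_layout(heights = c(1, 2)) +
  plot_annotation(tag_levels = "A") -> pw

pw
```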
You can explore other settings here, and if you want to see another package that tries to achieve similar results, explore {cowplot} here.
{highcharter}
{highcharter} is one of my favorite packages for dynamic plots because it builds them with ease and yet they are visually stunning (see below). This is advanced material so be warned.
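The first plot is a county-level choropleth of unemployment rates (the value column). Counties are matched to the countries/us/us-all-all map by joining the code column of the data to the map's hc-key field. Note that hcmap() downloads the map geometry, so you need an internet connection.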
library(highcharter)
data(unemployment)
hcmap(
map = "countries/us/us-all-all",
data = unemployment,
name = "Unemployment",
value = "value",
joinBy = c("hc-key", "code"),
borderColor = "transparent"
) %>%
hc_colorAxis(
dataClasses = color_classes(c(seq(0, 10, by = 2), 50))
) %>%
hc_legend(
layout = "vertical",
align = "right",
floating = TRUE,
valueDecimals = 0,
valueSuffix = "%"
)
Here is a scatterplot built with the epa.RData data, keeping only 100 randomly sampled 2019 vehicles (to keep things manageable). Notice the "scatter" chart type and hcaes(), which plays the role that aes() plays in {ggplot2}.
load(
here::here(
"data",
"epa.RData"
)
)
epa %>%
filter(year == 2019) %>%
sample_n(100) -> epa2
hchart(
epa2,
"scatter",
hcaes(x = city08, y = highway08, group = make)
)
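If you do not have epa.RData handy, the same hchart()/hcaes() pattern works on any data frame. Here is a sketch with the built-in mtcars data, where group = cyl should produce one series (and one legend entry) per cylinder count; the axis titles and the name h_cars are my own additions.

```r
library(highcharter)

# One scatter series per cylinder count, via group = in hcaes()
hchart(mtcars, "scatter", hcaes(x = wt, y = mpg, group = cyl)) %>%
  hc_xAxis(title = list(text = "Weight (1000 lbs)")) %>%
  hc_yAxis(title = list(text = "Miles per gallon")) -> h_cars

h_cars
```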
Here is a line chart using the unemployment rate data. I am extracting the year so I can use it for the x-axis, and then calculating the average unemployment rate by year and education group (educ_group).
load(
here::here(
"data",
"urate.RData"
)
)
lubridate::year(urate$yearmonth) -> urate$year # year() comes from {lubridate}
urate %>%
group_by(educ_group, year) %>%
summarise(
avg.urate = mean(rate, na.rm = TRUE),
.groups = "drop" # return an ungrouped data frame
) -> urate2
hchart(
urate2,
"line",
hcaes(x = year,
y = avg.urate,
group = educ_group)
) %>%
hc_title(
text = "<span style=\"color:#e88e88\"> Unemployment Rates by Educational Group</span>",
useHTML = TRUE
) %>%
hc_xAxis(
title = list(text = "Year")
) %>%
hc_yAxis(
title = list(text = "Average Unemployment Rate (%)")
) %>%
hc_tooltip(
table = TRUE,
sort = TRUE,
valueDecimals = 1,
valueSuffix = "%"
)
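The same extract-the-year, average, and plot steps can be tried without urate.RData on the built-in economics data from {ggplot2} (monthly counts of unemployed persons, in thousands), so the y-axis here is an average count rather than a rate. The names econ_yearly and h_econ are just for illustration.

```r
library(dplyr)
library(lubridate)
library(highcharter)
library(ggplot2)  # only for the built-in economics data

# Extract the year from each monthly date, then average within each year
economics %>%
  mutate(year = year(date)) %>%
  group_by(year) %>%
  summarise(avg_unemploy = mean(unemploy, na.rm = TRUE), .groups = "drop") -> econ_yearly

hchart(econ_yearly, "line", hcaes(x = year, y = avg_unemploy)) %>%
  hc_xAxis(title = list(text = "Year")) %>%
  hc_yAxis(title = list(text = "Average number unemployed (thousands)")) %>%
  hc_tooltip(valueDecimals = 0) -> h_econ

h_econ
```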
We can then dress up the same highcharter plot with a theme; here hc_add_theme() applies the built-in hc_theme_flatdark() theme.
hchart(
urate2,
"line",
hcaes(
x = year, y = avg.urate, group = educ_group)
) %>%
hc_title(
text = "<span style=\"color:#e88e88\"> Unemployment Rates by Educational Group</span>",
useHTML = TRUE
) %>%
hc_xAxis(
title = list(text = "Year")
) %>%
hc_yAxis(
title = list(text = "Average Unemployment Rate (%)")
) %>%
hc_tooltip(
table = TRUE,
sort = TRUE,
valueDecimals = 1,
valueSuffix = "%"
) %>%
hc_add_theme(hc_theme_flatdark())
{gganimate}
This is advanced material too, so you have been twice warned! The {gganimate} library can be tricky to run without errors and hiccups because rendering depends on other software being installed and configured (for example, the {gifski} package for GIFs, or ffmpeg for videos such as the mp4 below); see here for details. {gganimate} has also changed substantially across versions, so be sure to check its current documentation, available here.
library(gapminder)
library(gganimate)
ggplot(
gapminder,
aes(gdpPercap, lifeExp, size = pop, colour = country)
) +
geom_point(alpha = 0.7, show.legend = FALSE) +
scale_colour_manual(values = country_colors) +
scale_size(range = c(2, 12)) +
scale_x_log10() +
facet_wrap(~continent, ncol = 5) +
labs(
title = 'Year: {frame_time}',
x = 'GDP per capita',
y = 'life expectancy'
) +
transition_time(year) +
ease_aes('linear') -> p1
animate(
p1,
renderer = ffmpeg_renderer( # requires ffmpeg to be installed on your system
format = "mp4"
)
)
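Because the rendering step is what usually fails, it can help to build an animation first and render it separately. Here is a minimal sketch with mtcars and transition_states(): constructing the animation object needs no extra software, while the commented-out animate() call assumes the {gifski} package is installed. The name p_gear is just for illustration.

```r
library(ggplot2)
library(gganimate)

# Build, but do not render, an animation that steps through the gear groups
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3) +
  labs(title = "Gears: {closest_state}") +
  transition_states(gear, transition_length = 2, state_length = 1) -> p_gear

# Rendering is the step that needs extra software: gifski_renderer() writes a GIF
# and requires the {gifski} package
# animate(p_gear, nframes = 50, renderer = gifski_renderer())
```

If construction works but animate() fails, the problem is almost certainly the renderer setup rather than your plot code.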