In this module you will learn how to build maps with {ggplot2} and {ggmap}, and then with {leaflet}. There are other packages (see {choroplethr}, {tmap}, and {sf}) that we could use, but I will leave it to you to explore these on your own if mapping interests you. {leaflet} is especially fun and useful for creating interactive maps. We will also look at two other packages: {highcharter}, to generate interactive plots/maps, and {gganimate}, to animate plots built with {ggplot2}. We will also see how to use {patchwork} to arrange multiple graphics on a single canvas.
# {ggplot2} and {ggmap}
Having worked with {ggplot2} in the previous module, we might as well stick to it since the coding syntax will be relatively familiar. Start by loading the following packages; remember to install them if you get a “package not found” error.
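library(ggplot2)    # plotting; map_data() lives here
library(ggmap)
library(maps)       # source of the county and state boundaries that map_data() returns
library(mapdata)
# library(maptools) # retired from CRAN in 2023 and not needed for the code below
library(ggthemes)   # provides theme_map()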
It is pretty easy to build a state or county map. I’ll start with a county-level map of the USA, just to show you how easy it is to build one, and then we can focus on Ohio. The code below shows you how to pull the data frame for all counties (map_data() takes it from the {maps} package) and then how to subset it to counties in Ohio.
map_data("county") -> usa # get basic map data for all USA counties
subset(usa, region == "ohio") -> oh # subset to counties in Ohio
names(oh)
[1] "long" "lat" "group" "order" "region"
[6] "subregion"
You see six columns/variables in the usa data frame (and therefore in oh). Pay attention to the contents of each, described below:
Variable | Description |
---|---|
long | longitude, a measure of east-west position. The prime meridian is assigned the value of 0 degrees and runs through Greenwich (England); places west of it have negative longitudes. Athens, Ohio has a longitude of -82.101255 |
lat | latitude, a measure of north-south position. The equator is defined as 0 degrees, the North Pole as 90 degrees north, and the South Pole as 90 degrees south. Athens, Ohio has a latitude of 39.329240 |
group | an identifier for each polygon to be drawn; most counties are a single polygon, but a county made up of several pieces gets one group per piece |
order | an identifier that indicates the order in which the boundary points should be connected |
region | string indicator for regions (here the states) |
subregion | string indicator for sub-regions (here the county names) |
At this point, go ahead and open Google Maps in your browser and search for two places that have some meaning or value to you: your hometown, where you actually live, a favorite vacation spot, and so on. When you find each place on the map, right-click on it and copy the latitude/longitude that is reported. Save these latitudes/longitudes since you will need them for some mapping exercises at the end of this module.
The actual map can be built as shown below.
ggplot() +
  geom_polygon(
    data = usa,
    aes(x = long, y = lat),
    fill = "white",
    color = "black"
  ) +
  coord_fixed(1.3) + # one unit of latitude is drawn 1.3 times as long as one unit of longitude
  labs(
    title = "A Disastrous Map",
    subtitle = "(County borders are all messed up)"
  )
But this is not very good because the county borders are drawn incorrectly. Why is that? Because we forgot to specify the grouping structure for drawing these lines: should all the latitudes/longitudes be connected in order within each county, or should they simply be connected in order across the whole data frame? Of course they should be connected in order within each county, and this is specified via group = group inside aes(), which results in the county borders being drawn correctly.
ggplot() +
  geom_polygon(
    data = usa,
    aes(x = long, y = lat, group = group),
    fill = "white",
    color = "black"
  ) +
  coord_fixed(1.3) +
  labs(
    title = "A Better Map",
    subtitle = "(County borders seem okay)"
  )
We could improve upon this map by filling in each county; there are 3000+ counties so the colors will not be unique, but that is okay for now. I have also switched the county borders to white.
ggplot() +
  geom_polygon(
    data = usa,
    aes(x = long, y = lat, group = group),
    fill = usa$group,
    color = "white"
  ) +
  coord_fixed(1.3) +
  labs(
    title = "An Even Better Map",
    subtitle = "(Counties are filled in with a color)"
  )
What if we also want to draw the state borders?
map_data("state") -> usast
ggplot() +
  geom_polygon(
    data = usa,
    aes(x = long, y = lat, group = group, fill = region),
    color = "black"
  ) +
  geom_polygon(
    data = usast,
    aes(x = long, y = lat, group = group),
    fill = NA,
    color = "blue"
  ) +
  coord_fixed(1.3) +
  labs(
    title = "An Even Better Map",
    subtitle = "(Counties are filled in by a common color for each state)"
  ) +
  theme(legend.position = "none")
Now, you could also do this with another package, {urbnmapr}, so let us see it very briefly. Here is a map with state borders and then one with county borders. Note that Alaska and Hawaii are visible in these maps but were conspicuously absent from the earlier maps drawn via {ggplot2} and {ggmap}.
Note that you may have to install it with the command shown below:
devtools::install_github("UrbanInstitute/urbnmapr")
library(tidyverse)
library(urbnmapr)
# coord_map() needs the {mapproj} package to be installed
states %>%
  ggplot(aes(long, lat, group = group)) +
  geom_polygon(
    fill = "grey", color = "black", linewidth = 0.25 # linewidth was called size before ggplot2 3.4.0
  ) +
  coord_map(
    projection = "albers", lat0 = 39, lat1 = 45
  ) +
  labs(title = "Map of the 50 States")
counties %>%
  ggplot(aes(long, lat, group = group)) +
  geom_polygon(
    fill = "grey", color = "darkred", linewidth = 0.05
  ) +
  coord_map(
    projection = "albers", lat0 = 39, lat1 = 45
  ) +
  labs(title = "Map of States and Counties")
If you want to learn more about the {urbnmapr} package, read its documentation on GitHub (UrbanInstitute/urbnmapr).
Okay, so much for basic maps. Now we drill down to Ohio counties alone and see how we can label them, add points to them, and then also use a color scheme to color them on the basis of some measure such as percent of children in poverty in the county, median household income, and so on.
In order to label counties we will need to find the center of each county and then place the county name at that latitude/longitude. Taking the mean/median of a county’s boundary latitudes/longitudes will not work well, because the result gets pulled toward stretches of border that happen to have many vertices, so we use specific code to find the centroids. In addition, the county names have to be formatted into title case, since they show up in subregion in all lowercase characters.
We start by leaning on the {stringr} package to correct how the names will be displayed. We then calculate the mid-point of each county (the label point that {sp} computes for a polygon) and call the result centroids2.
library(stringr)
str_to_title(oh$subregion) -> oh$county # Create a new variable called county with the correct format
library(sp)
getLabelPoint <- function(county) { # Returns the label point (long, lat) of one county
  Polygon(county[c('long', 'lat')])@labpt
}
centroids <- by(oh, oh$county, getLabelPoint) # Returns a county-named list
centroids2 <- do.call("rbind.data.frame", centroids) # Convert to data frame
centroids2$county <- rownames(centroids)
names(centroids2) <- c('clong', 'clat', "county") # Appropriate header
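To see why the plain mean of the boundary points is a poor stand-in for a centroid, here is a minimal base-R sketch of the area-weighted (shoelace) centroid of a single polygon. The helper polygon_centroid() and the toy square are hypothetical illustrations, not what {sp} runs internally; the sketch assumes one closed, non-self-intersecting ring per call.

```r
# Hypothetical helper: area-weighted centroid of one simple polygon via the
# shoelace formula; x = longitudes, y = latitudes, listed in drawing order
polygon_centroid <- function(x, y) {
  # Close the ring if the last vertex does not repeat the first
  if (x[1] != x[length(x)] || y[1] != y[length(y)]) {
    x <- c(x, x[1])
    y <- c(y, y[1])
  }
  n <- length(x)
  cross <- x[-n] * y[-1] - x[-1] * y[-n]  # one term per edge
  area <- sum(cross) / 2                  # signed area (sign cancels below)
  c(
    clong = sum((x[-n] + x[-1]) * cross) / (6 * area),
    clat  = sum((y[-n] + y[-1]) * cross) / (6 * area)
  )
}

# A 2 x 2 square whose right-hand edge was digitised with extra vertices
sq_long <- c(0, 2, 2, 2, 2, 2, 0)
sq_lat  <- c(0, 0, 0.5, 1, 1.5, 2, 2)

polygon_centroid(sq_long, sq_lat)  # clong = 1, clat = 1, the true centre
c(mean(sq_long), mean(sq_lat))     # the mean longitude is dragged east, to about 1.43
```

A county made up of several pieces would need each piece handled separately (or only its largest piece used), since the helper treats its input as one ring.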
Now we are ready to plot …
ggplot() +
  geom_polygon(
    data = oh,
    aes(x = long, y = lat, group = group),
    fill = "white",
    color = "gray"
  ) +
  coord_fixed(1.3) +
  geom_text(
    data = centroids2,
    aes(x = clong, y = clat, label = county),
    color = "darkblue",
    size = 2.25
  ) +
  theme_map()
Note that geom_text() is used to add the county names to the map, and the color = and size = arguments tweak the font color and size. If I relied on geom_label() instead, I would have a less effective map, with needless borders around each county name.
ggplot() +
  geom_polygon(
    data = oh,
    aes(x = long, y = lat, group = group),
    fill = "white",
    color = "gray"
  ) +
  coord_fixed(1.3) +
  geom_label(
    data = centroids2,
    aes(x = clong, y = clat, label = county),
    color = "darkblue",
    size = 2.25
  ) +
  theme_map()
I would also like to highlight where Ohio University’s Athens and Lancaster campuses are located. I know their latitudes/longitudes – Athens is 39.324391, -82.101443 and Lancaster is 39.738743, -82.586373. Let me use a red dot for each.
data.frame(
  place = c("Athens", "Lancaster"),
  lat = c(39.324391, 39.738743),
  long = c(-82.101443, -82.586373)
) -> campuses
ggplot() +
  geom_polygon(
    data = oh,
    aes(x = long, y = lat, group = group),
    fill = "white",
    color = "gray"
  ) +
  coord_fixed(1.3) +
  geom_text(
    data = centroids2,
    aes(x = clong, y = clat, label = county),
    color = "darkblue", size = 2.25
  ) +
  geom_point(
    data = campuses,
    aes(x = long, y = lat), color = "red"
  ) +
  theme_map()
Often you will see maps that use a color scheme to represent the spatial distribution of some measure, such as median household income or poverty. We will build one such map using the percent of children in poverty in each Ohio county. The data were sent to you via Slack, so make sure the file is in your data folder. We will read it in and then build the map.
library(readxl)
read_excel(
  "~/Documents/Teaching/mpa5830/data/acpovertyOH.xlsx",
  sheet = "counties"
) -> acpovertyOH
c(
  "ranking", "county", "child1216", "child0711", "all1216", "all0711"
) -> colnames(acpovertyOH)
Now we merge this file with oh, noting that the merge key will be county since both files have this variable.
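merge(
  oh, acpovertyOH[, c(2:3)],  # keep only county and child1216
  by = "county",
  all.x = TRUE,               # keep every boundary point, even without a match
  sort = FALSE
) -> my.df
my.df[order(my.df$order), ] -> my.df  # merge() does not keep oh's row order; restore the drawing order
merge(
  oh, acpovertyOH,
  by = "county",
  all.x = TRUE,
  sort = FALSE
) -> my.df2
my.df2[order(my.df2$order), ] -> my.df2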
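The re-sorting step matters because merge() does not promise to keep the rows of its first data frame in their original order, while geom_polygon() connects the points in row order. Here is a minimal sketch on a made-up boundary table; the county names are real Ohio counties, but the rows are not real boundaries and the child1216 values are placeholders.

```r
# Toy boundary table: points listed in drawing order (made-up, not real borders)
toy_oh <- data.frame(
  county = c("Athens", "Athens", "Athens", "Hocking", "Hocking", "Vinton", "Vinton"),
  order  = 1:7
)
# Toy county-level table; values are placeholders and Vinton is deliberately missing
toy_rates <- data.frame(county = c("Hocking", "Athens"), child1216 = c(20, 10))

toy_df <- merge(toy_oh, toy_rates, by = "county", all.x = TRUE, sort = FALSE)
toy_df <- toy_df[order(toy_df$order), ]  # restore drawing order
toy_df
```

With all.x = TRUE, a county that has no match (Vinton here) keeps its boundary rows with NA for child1216, so it is still drawn, just in the fill scale's NA color.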
Now we start building the map.
ggplot() +
  geom_polygon(
    data = my.df,
    aes(x = long, y = lat, group = group, fill = child1216),
    color = "black"
  ) +
  coord_fixed(1.3) +
  geom_text(
    data = centroids2,
    aes(x = clong, y = clat, label = county),
    color = "black",
    size = 2.25
  ) +
  scale_fill_distiller(palette = "Spectral") +
  labs(fill = "Child Poverty %") +
  theme_map() +
  theme(legend.position = "bottom")
That isn’t a bad map, but we could do better by creating quartiles (4 groups) or quintiles (5 groups), so that it is easier to pinpoint which county falls into the top 25%, the bottom 20%, and so on of whatever measure it is that we grouped. The code below shows you how to generate the quintiles and map with them.
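library(dplyr)
my.df %>%
  mutate(
    grouped_poverty = cut(
      child1216,
      # Cut points come from one value per county; my.df has one row per boundary
      # point, so quantiles over its rows would weight counties by their point count
      breaks = quantile(
        distinct(my.df, county, child1216)$child1216,
        probs = seq(0, 1, by = 0.2), na.rm = TRUE
      ),
      labels = c("0-20", "20-40", "40-60", "60-80", "80-100"),
      include.lowest = TRUE
    )
  ) -> my.df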
ggplot() +
  geom_polygon(
    data = my.df,
    aes(x = long, y = lat, group = group, fill = grouped_poverty),
    color = "black"
  ) +
  coord_fixed(1.3) +
  geom_text(
    data = centroids2,
    aes(x = clong, y = clat, label = county),
    color = "white",
    size = 2.25
  ) +
  scale_fill_brewer(palette = "Set1", direction = -1) +
  labs(fill = "Poverty Quintiles") +
  theme_map() +
  theme(legend.position = "bottom")
What if we wanted only three groups, say the top one-third, middle one-third, and then the lowest one-third?
my.df %>%
  mutate(
    grouped_poverty = cut(
      child1216,
      breaks = quantile(
        distinct(my.df, county, child1216)$child1216, # one value per county
        probs = seq(0, 1, by = 1/3), na.rm = TRUE
      ),
      labels = c("Bottom-third", "Middle-third", "Top-third"),
      include.lowest = TRUE
    )
  ) -> my.df
ggplot() +
  geom_polygon(
    data = my.df,
    aes(x = long, y = lat, group = group, fill = grouped_poverty),
    color = "black"
  ) +
  coord_fixed(1.3) +
  geom_text(
    data = centroids2,
    aes(x = clong, y = clat, label = county),
    color = "white",
    size = 2.25
  ) +
  scale_fill_brewer(palette = "Set1", direction = -1) +
  labs(fill = "Poverty Groups") +
  theme_map() +
  theme(legend.position = "bottom")
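Both grouping steps follow the same recipe: compute quantile cut points from one value per county, then let cut() assign each value to the interval it falls in. Since my.df holds one row per boundary point rather than one per county, here is a minimal sketch that works on a plain vector of county-level values instead. The helper cut_quantile_groups() is hypothetical and the 20 values are made up.

```r
# Hypothetical helper: split county-level values into k equal-count groups
cut_quantile_groups <- function(x, k, labels = NULL) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1), na.rm = TRUE)
  cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
}

rates <- seq(5, 43, by = 2)  # 20 made-up county values, all distinct

quintiles <- cut_quantile_groups(
  rates, 5, labels = c("0-20", "20-40", "40-60", "60-80", "80-100")
)
table(quintiles)  # 4 counties in each quintile

terciles <- cut_quantile_groups(
  rates, 3, labels = c("Bottom-third", "Middle-third", "Top-third")
)
table(terciles)   # 7, 6 and 7 counties
```

If several counties share a value at a cut point, quantile() can return duplicate breaks and cut() stops with an error saying the breaks are not unique; fewer groups, or a rank-based grouping such as dplyr::ntile(), is a reasonable fallback in that case.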
Now, the maps drawn from map_data() have been limited to the 48 contiguous states; Alaska and Hawaii are absent. Can we fold these in? Yes we can, with the {fiftystater} package.
devtools::install_github("wmurphyrd/fiftystater")
library(fiftystater)
data("fifty_states")
names(fifty_states)
[1] "long" "lat" "order" "hole" "piece" "id" "group"
ggplot() +
  geom_polygon(
    data = fifty_states,
    aes(x = long, y = lat, group = group, fill = id),
    color = "white"
  ) +
  theme_map() +
  theme(legend.position = "none")
# {leaflet} maps
{leaflet} is an R interface to Leaflet, an easy-to-learn JavaScript library that generates interactive maps. It is fun and the possibilities are endless. You will need three libraries, so make sure you install {leaflet}, {leaflet.extras}, and {widgetframe}. The basic command structure is as follows: you call leaflet via leaflet(), then use setView() to specify the latitude/longitude the map should be centered on and the zoom level to apply. The higher the zoom number, the more zoomed-in the view will be; the smaller the zoom number, the more zoomed-out the view will be. The addTiles() command adds the default (OpenStreetMap) tiles, but you can tweak this (see the book chapter for examples). setMapWidgetStyle() and the commands that follow it allow you to customize how the map will look in the knitted HTML document.
The map below is centered on Athens, Ohio, and uses the default map tiles. Note that you can zoom in and out of the map, as well as drag it around so that you end up somewhere else.
library(leaflet)
library(leaflet.extras)
library(widgetframe)
leaflet() %>%
setView(lat = 39.322577, lng = -82.106336, zoom = 14) %>%
addTiles() %>%
setMapWidgetStyle() %>%
frameWidget(width = '1200', height = '500') -> m1
m1
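The zoom = 14 used above can be put in rough physical terms. Leaflet’s default OpenStreetMap tiles follow the Web Mercator tiling scheme, in which the whole world spans 256 * 2^zoom pixels, so each step up in zoom halves the ground distance a single pixel covers. The sketch below is a back-of-the-envelope calculation assuming 256-pixel tiles and a spherical Earth; meters_per_pixel() is a hypothetical helper, not part of {leaflet}.

```r
# Approximate ground distance covered by one pixel on a 256-pixel
# Web Mercator tile (the scheme used by OpenStreetMap tiles)
meters_per_pixel <- function(lat, zoom) {
  equator_m <- 40075016.686  # Earth's equatorial circumference, in metres
  equator_m * cos(lat * pi / 180) / (256 * 2^zoom)
}

meters_per_pixel(39.322577, 14)  # roughly 7.4 m per pixel around Athens
meters_per_pixel(39.322577, 15)  # half that, one zoom level closer
```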
So far so good. Now how about dropping a pin on Building 21 on The Ridges, the main administrative building of the Voinovich School? This is done with addMarkers(), with the popup = argument indicating what text should be displayed when the marker is clicked.
leaflet() %>%
  setView(lat = 39.322577, lng = -82.106336, zoom = 15) %>%
  addMarkers(
    lat = 39.319984, lng = -82.107084,
    popup = c("The Ridges, Building 21")
  ) %>%
  addTiles() %>%
  setMapWidgetStyle() %>%
  frameWidget(width = '1200', height = '500') -> m2
m2
This, in a nutshell, is the basic setup of a leaflet map. There is a lot more you could do, so if you are interested, check out the many examples on the web, starting with the {leaflet} package documentation. For now we look at a few more extensions. Here, for instance, is how one might take a large data set and display specific features. In particular, let us map some bike-share stations in New York City. The actual data frame is rather large, so we draw a random sample of 30 rows with the sample_n() command from {dplyr}. Hovering over a marker shows the station’s ID, and clicking it opens a pop-up with the station’s name.
load(
  here::here("data", "citibike.RData")
)
citibike %>%
  sample_n(30) -> citibike2
leaflet(data = citibike2, width = "100%") %>%
  setView(lat = 40.74, lng = -73.99, zoom = 11) %>%
  addTiles() %>%
  addMarkers(
    data = citibike2, lat = ~start.station.latitude,
    lng = ~start.station.longitude,
    label = ~start.station.id,
    popup = ~start.station.name
  ) %>%
  setMapWidgetStyle() %>%
  frameWidget(width = '1200', height = '500') -> m3
m3
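Because sample_n() draws a different 30 stations every time the document is knitted, you may want to fix the random seed so the map stays the same between runs. Here is a minimal base-R sketch on a made-up stations table (the column names are hypothetical). In current versions of {dplyr}, sample_n() has been superseded by slice_sample(n = 30), and set.seed() works the same way with either.

```r
# Made-up stand-in for the citibike data (hypothetical columns)
stations <- data.frame(id = 1:100, name = paste("Station", 1:100))

set.seed(5830)                                   # any fixed number will do
pick1 <- stations[sample(nrow(stations), 30), ]  # 30 rows, without replacement

set.seed(5830)                                   # same seed again ...
pick2 <- stations[sample(nrow(stations), 30), ]  # ... same 30 rows

identical(pick1, pick2)  # TRUE
```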
# {patchwork}
When you are building a visualization you often end up needing to squeeze multiple graphics into a single canvas. There are several ways to do it in R, but I am showing you what may be the easiest: {patchwork}. It is available from CRAN via install.packages("patchwork"); if you want the development version, install it via {devtools} as shown below:
devtools::install_github("thomasp85/patchwork")
Start by loading {patchwork} and {ggplot2} (and any other libraries you plan to use for the plots). Then name each plot; most of us end up naming them p1, p2, and so on (why? because those were the earliest examples on the web). Then decide on how you want the plots to show up. That is, how many plots do you have? Should they be side-by-side? If you have an odd number of plots, should two be side-by-side and the third in a row below these two? Or the other way around? Which plot should come first? Which plot should be last? Let us create three plots and see how the package works.
library(ggplot2)
library(patchwork)
data(mtcars)
ggplot(
  mtcars, aes(
    x = factor(am, labels = c("Automatic", "Manual")),
    y = mpg)
) +
  geom_boxplot() +
  labs(x = "Automatic/Manual", y = "Miles per gallon") -> p1
ggplot(
  mtcars, aes(x = wt, y = mpg)
) +
  geom_point() +
  labs(x = "Curb Weight", y = "Miles per gallon") -> p2
ggplot(
  mtcars, aes(x = qsec, y = mpg)
) +
  geom_point() +
  labs(x = "Quarter Mile times", y = "Miles per gallon") +
  facet_wrap(~gear) -> p3
Say I want two plots, each in its own column.
p1 + p2
Hmm, maybe one per row?
p1 + p2 + plot_layout(nrow = 2)
p1 + p2 + plot_layout(ncol = 1)
You see plot_layout() used in the last two commands. It has several options, the key ones being:
- ncol, nrow: the number of columns/rows in the grid
- byrow: how should the plots be embedded, by filling rows first or by filling columns first?
- widths, heights: the relative widths/heights of each column and row in the grid; these get recycled to match the dimensions of the grid
Say I want to fill row 1 with p1, p2, then row 2 with p1, p2:
p1 + p2 + p1 + p2 + plot_layout(ncol = 2, byrow = TRUE)
What if I want to fill column 1 with p1, p2, then column 2 with p1, p2?
p1 + p2 + p1 + p2 + plot_layout(ncol = 2, byrow = FALSE)
I can also use parentheses, (), to group sub-plots.
(p1 + (p2 + p3) + plot_layout(ncol = 1))
((p2 + p3) + p1 + plot_layout(ncol = 1)) # Not the same thing!
Note that | places plots side by side (as columns) and / stacks them on top of each other (as rows):
(p1 | p2 | p1) / p3
You can then specify the relative heights/widths of each row/column.
(p1 + (p2 + p3) + plot_layout(ncol = 1, heights = c(1, 2)))
(p1 + (p2 + p3) + plot_layout(ncol = 1, heights = c(2, 1)))
You can explore other settings in the {patchwork} documentation, and if you want to see another package that tries to achieve similar results, explore {cowplot}.
# {highcharter}
{highcharter} is one of my favorite packages for dynamic plots because it builds them with ease and yet they are visually stunning (see below). This is advanced material, so be warned.
The first plot is a choropleth map of county unemployment rates (the value column), with each county matched to the countries/us/us-all-all map through its code column.
library(highcharter)
data(unemployment)
hcmap(
  map = "countries/us/us-all-all",
  data = unemployment,
  name = "Unemployment",
  value = "value",
  joinBy = c("hc-key", "code"),
  borderColor = "transparent"
) %>%
  hc_colorAxis(
    dataClasses = color_classes(c(seq(0, 10, by = 2), 50))
  ) %>%
  hc_legend(
    layout = "vertical",
    align = "right",
    floating = TRUE,
    valueDecimals = 0,
    valueSuffix = "%"
  )
Here is a scatterplot built with the data in epa.RData, keeping only 100 randomly sampled observations from 2019 (to keep things manageable). Notice "scatter" as the chart type, and hcaes(), which maps variables to the chart much as aes() does in {ggplot2}.
load(
  here::here("data", "epa.RData")
)
library(dplyr)
epa %>%
  filter(year == 2019) %>%
  sample_n(100) -> epa2
hchart(
  epa2, "scatter",
  hcaes(x = city08, y = highway08, group = make)
)
Here is a line chart using the unemployment rate data. I am extracting year so I can use it for the x-axis, and then calculating the average unemployment rate by year and education group (educ_group).
load(
  here::here("data", "urate.RData")
)
library(lubridate)
year(urate$yearmonth) -> urate$year
urate %>%
  group_by(educ_group, year) %>%
  summarise(
    avg.urate = mean(rate, na.rm = TRUE)
  ) -> urate2
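The group_by()/summarise() step collapses the monthly rates into one average per educational group and year. The same calculation in base R, on a made-up table shaped like urate (assuming yearmonth is stored as a Date), looks like this; format() stands in for lubridate::year().

```r
# Made-up monthly rates in the same shape as urate (values are placeholders)
toy_urate <- data.frame(
  yearmonth  = as.Date(c("2019-01-01", "2019-02-01", "2020-01-01", "2020-02-01")),
  educ_group = "College",
  rate       = c(2, 4, NA, 6)
)
toy_urate$year <- as.integer(format(toy_urate$yearmonth, "%Y"))

# Average rate by group and year; the formula interface drops NA rows,
# which matches mean(rate, na.rm = TRUE)
toy_urate2 <- aggregate(rate ~ educ_group + year, data = toy_urate, FUN = mean)
toy_urate2
```

One difference: a group whose rates are all NA disappears from the aggregate() result, whereas summarise() keeps it with NaN.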
hchart(
  urate2, "line",
  hcaes(x = year,
        y = avg.urate,
        group = educ_group)
) %>%
  hc_title(
    text = "<span style=\"color:#e88e88\"> Unemployment Rates by Educational Group</span>",
    useHTML = TRUE
  ) %>%
  hc_xAxis(
    title = list(text = "Year")
  ) %>%
  hc_yAxis(
    title = list(text = "Average Unemployment Rate (%)")
  ) %>%
  hc_tooltip(
    table = TRUE,
    sort = TRUE,
    valueDecimals = 1,
    valueSuffix = "%"
  )
Finally, we can dress up the {highcharter} plot with a theme and some further customization.
hchart(
  urate2, "line",
  hcaes(
    x = year, y = avg.urate, group = educ_group)
) %>%
  hc_title(
    text = "<span style=\"color:#e88e88\"> Unemployment Rates by Educational Group</span>",
    useHTML = TRUE
  ) %>%
  hc_xAxis(
    title = list(text = "Year")
  ) %>%
  hc_yAxis(
    title = list(text = "Average Unemployment Rate (%)")
  ) %>%
  hc_tooltip(
    table = TRUE,
    sort = TRUE,
    valueDecimals = 1,
    valueSuffix = "%"
  ) %>%
  hc_add_theme(hc_theme_flatdark())
# {gganimate}
This is advanced material too, so you have been twice warned! The {gganimate} library can be tricky to run without errors and hiccups because it needs other packages to be installed and configured (for example, {gifski} to render the animation as a GIF); see its documentation for details. Note also that {gganimate} was completely overhauled for its 1.0 release, so older examples you find on the web may not run with the current version.
library(gapminder)
library(gganimate)
ggplot(
  gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)
) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(
    title = 'Year: {frame_time}',
    x = 'GDP per capita',
    y = 'life expectancy'
  ) +
  transition_time(year) +
  ease_aes('linear') -> p1
animate(p1)
This code rebuilds a famous visualization à la Hans Rosling, coming close to at least capturing the spirit of Hans. Life expectancy is plotted against gross domestic product per capita (on a log scale), with one panel per continent, and animated across years. Each color represents a country within the continent, and the size of each bubble is proportional to the country’s population.
# Practice Tasks
Create a map of the 48 contiguous states in the United States. Be sure to title the map and to fill in each state with colors while drawing state borders in white. Make sure you add state names by first calculating the centroids of each state and then merging these latitudes and longitudes with the map data. Use theme_map() and make sure the legend is not visible.
Run the following code chunk to load the USArrests data: Murder, Assault, and Rape are arrests per 100,000 residents for each crime, and UrbanPop is the percent of the state population that lives in an urban area.
data(USArrests)
names(USArrests)
rownames(USArrests) -> USArrests$statename
library(ggplot2)
library(ggmap)
library(maps)
library(mapdata)
# library(maptools) # retired from CRAN in 2023; not needed here
library(ggthemes)
map_data("state") -> usast
library(stringr)
str_to_title(usast$region) -> usast$statename
library(dplyr)
usast %>%
  filter(statename != "District Of Columbia") -> usast
Use the original USArrests data to draw scatterplots of (a) Murder versus UrbanPop, (b) Assault versus UrbanPop, and (c) Rape versus UrbanPop. Save each of these scatterplots by name and then use {patchwork} to create a single canvas that includes all three plots. Make sure you label the x-axis and y-axis, and title each plot.
Now create {highcharter} versions of each of the three scatterplots you created in the previous task. You should end up with three scatterplots, each on its own canvas.
Use {leaflet} to create a map that includes a popup for your place of birth. You will need to use Google Maps to find the latitude/longitude for this place. The popup should display the name of this place.
For attribution, please cite this work as
Ruhil (2022, Feb. 16). Maps, Interactive, and Animated Graphics. Retrieved from https://aniruhil.org/courses/mpa6020/handouts/module07.html
BibTeX citation
@misc{ruhil2022maps, author = {Ruhil, Ani}, title = {Maps, Interactive, and Animated Graphics}, url = {https://aniruhil.org/courses/mpa6020/handouts/module07.html}, year = {2022} }