class: title-slide, center, middle
background-image: url(images/ouaerial.jpeg)
background-size: cover

# .crimson[.fancy[Visualizing Data for Analytics (continued)]]

## .crimson[.fancy[Ani Ruhil]]

---

name: agenda

# .fancy[ Agenda ]

This week we learn how to visualize data in more advanced ways ...

+ Combining multiple graphics into a single canvas with `{patchwork}`
+ Creating some interactive graphics with `{plotly}` and `{highcharter}`
+ Generating maps with `{ggplot2}` and `{urbnmapr}`, `{leaflet}`, and `{highcharter}`

---

#### Combining Plots with `{patchwork}`

Often you have to combine and place multiple graphics into a single canvas. There are a few ways to do this, but the easiest is the one offered by the `{patchwork}` package. Let us use the `diamonds` data for this section, a data frame with 53,940 rows and 10 variables:

| Variable | Description |
| :-- | :-- |
| price | price in US dollars ($326--$18,823) |
| carat | weight of the diamond (0.2--5.01) |
| cut | quality of the cut (Fair, Good, Very Good, Premium, Ideal) |
| color | diamond colour, from D (best) to J (worst) |
| clarity | a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)) |
| x | length in mm (0--10.74) |
| y | width in mm (0--58.9) |
| z | depth in mm (0--31.8) |
| depth | total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79) |
| table | width of top of diamond relative to widest point (43--95) |

---

#### The Basics

To combine multiple plots, we need to save each plot under a unique name. I am calling them `p1`, `p2`, etc.
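The code on these slides saves each plot with R's right-hand assignment operator, `->`, placed at the end of the plotting call. As a minimal sketch in base R (the names `total_left` and `total_right` are made up purely for illustration), here is how it relates to the more familiar `<-`:

```r
# Left-hand assignment: name first, expression second
total_left <- sum(1:10)

# Right-hand assignment: expression first, name last
sum(1:10) -> total_right

# Both names now hold the same value
identical(total_left, total_right)
```

Either form works for saving a plot; the right-hand form simply lets the name appear after the last layer has been added.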
.pull-left[

```r
library(patchwork)
library(tidyverse)

# Bar chart of cut
ggplot() +
  geom_bar(data = diamonds, aes(x = cut, fill = cut)) +
  labs(x = "Cut of the Diamond", y = "Frequency") +
  theme(legend.position = "none") -> p1

# Bar chart of color
ggplot() +
  geom_bar(data = diamonds, aes(x = color, fill = color)) +
  labs(x = "Color of the Diamond", y = "Frequency") +
  theme(legend.position = "none") -> p2

# Scatterplot of price against carat
ggplot() +
  geom_point(data = diamonds, aes(x = carat, y = price, color = cut)) +
  labs(x = "Weight of the Diamond", y = "Price of the Diamond", color = "") +
  theme(legend.position = "bottom") -> p3

# Boxplots of price by clarity and cut (used a little later)
ggplot() +
  geom_boxplot(data = diamonds, aes(x = price, y = clarity, fill = cut)) +
  labs(y = "Clarity of the Diamond", x = "Price of the Diamond", fill = "") +
  theme(legend.position = "bottom") -> p4

p1 + p2 + p3
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/patch01-1.png" width="100%" style="display: block; margin: auto;" />

]

---

Notice the default layout here: `p1 + p2 + p3` gives us the plots all in a row. But you may have other plans, for example, to put the scatterplot in a row all its own.

.left-column[

```r
(p1 + p2) / p3
```

]

.right-column[

<img src="Module06sp20_files/figure-html/patch02-1.png" width="80%" style="display: block; margin: auto;" />

]

---

Now we have `p3` in the second row, all by itself. This was achieved via the `/` operator. What if we used `|` instead?

.left-column[

```r
p1 | (p2 + p3)
```

]

.right-column[

<img src="Module06sp20_files/figure-html/patch03-1.png" width="80%" style="display: block; margin: auto;" />

]

---

You ended up with two columns, the first containing only `p1` and the second containing `p2` and `p3`. So make a note of the difference between `|` and `/`: `|` places plots side by side, whereas `/` stacks them vertically. For example, note the following setup:

.left-column[

```r
p1 | (p2 / p3)
```

]

.right-column[

<img src="Module06sp20_files/figure-html/patch04-1.png" width="80%" style="display: block; margin: auto;" />

]

---

What if we wanted to squeeze in the fourth plot?
.left-column[

```r
(p1 + p2) / (p3 + p4)
```

]

.right-column[

<img src="Module06sp20_files/figure-html/patch05-1.png" width="80%" style="display: block; margin: auto;" />

]

---

#### Annotations

Annotations are helpful because they let you add an overall title, subtitle, and caption, plus tags for the individual plots. For example,

.pull-left[

```r
(p1 + p2) / (p3 + p4) +
  plot_annotation(
    title = 'The surprising truth about diamonds',
    subtitle = 'These plots will reveal untold secrets about one of our beloved data-sets',
    caption = 'Disclaimer: None of these plots are insightful',
    tag_levels = c('a', '1'),
    tag_prefix = 'Fig. ',
    tag_sep = '.',
    tag_suffix = ':'
  ) &
  theme(
    plot.tag.position = c(0, 1),
    plot.tag = element_text(
      size = 9, hjust = 0, vjust = 0, color = "steelblue")
  )
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/patch06-1.png" width="90%" style="display: block; margin: auto;" />

]

---

#### Spacing and Sizing

We can also tweak the sizes of individual rows and columns, control the space between plots, and so on. First up, spacing the plots with `plot_spacer()`:

.pull-left[

```r
(p1 + plot_spacer() + p2 + plot_spacer() + p3)
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/patch07-1.png" width="90%" style="display: block; margin: auto;" />

]

---

What about sizing the plots with relative widths?

.left-column[

```r
p1 + p2 + p3 + p4 +
  plot_layout(widths = c(2, 1))
```

]

.right-column[

<img src="Module06sp20_files/figure-html/patch08-1.png" width="90%" style="display: block; margin: auto;" />

]

---

Alternatively, we could specify sizes with `unit()` vectors, which lets us mix absolute and relative sizes, as shown below.

.left-column[

```r
p1 + p2 + p3 + p4 +
  plot_layout(
    widths = c(2, 1),
    heights = unit(
      c(5, 1), c('cm', 'null')
    )
  )
```

]

.right-column[

<img src="Module06sp20_files/figure-html/patch09-1.png" width="80%" style="display: block; margin: auto;" />

]

---

#### Moving Beyond the Grid

We can use a `layout` design to get a little more flexibility but still retain full control over the result.
Layout designs can be done in two ways, so let us see the easiest route first -- as a text setup. "When using the textual representation it is your responsibility to make sure that each area is rectangular. The only exception is `#` which denotes empty areas and can thus be of any shape."

.pull-left[

```r
layout <- "
##BBBB
AACCDD
##CCDD
"
p2 + p3 + p4 + p1 +
  plot_layout(design = layout)
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/patch10-1.png" width="100%" style="display: block; margin: auto;" />

]

---

The other path is using `area()` inside `layout`, as shown below.

.pull-left[

```r
layout <- c(
  area(t = 2, l = 1, b = 5, r = 4),
  area(t = 1, l = 3, b = 3, r = 5)
)
p3 + p4 +
  plot_layout(design = layout)
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/patch11-1.png" width="80%" style="display: block; margin: auto;" />

]

---

Watch the specification here with `wrap_plots()`:

.pull-left[

```r
layout <- '
A##
#B#
##C
'
wrap_plots(A = p1, B = p2, C = p3, design = layout)
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/patch12-1.png" width="80%" style="display: block; margin: auto;" />

]

---

## Fixed-aspect plots

Some plots use fixed coordinates, and these should not be disturbed. Here is an example:

.pull-left[

```r
library(urbnmapr)

ggplot() +
  geom_polygon(
    data = states,
    aes(x = long, y = lat, group = group, fill = state_abbv)
  ) +
  coord_fixed(1.3) +
  ggthemes::theme_map() +
  theme(legend.position = "none") +
  labs(title = "Fixed!!") -> mymap

mymap + p1 + p2 + p3
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/urbnmapr-1.png" width="100%" style="display: block; margin: auto;" />

]

---

class: inverse, middle, center

# .fancy[.heat[ Mapping ]]

---

Maps are very powerful visualizations because they allow you to highlight patterns and clusters with relative ease. For example, is poverty really higher in Appalachian counties? What about the percent of the population without health insurance? Literacy?
Opioid deaths; do they follow transportation routes? What about COVID-19 cases? Maps to the rescue!

Building a map requires a few elements. First and foremost, you need some data to show on a map. Second, you need to have the geographic coordinates needed to build a map, basically the latitude and longitude of the geographies (states, cities, school districts, etc.) that you want to map. Third, you want a column that contains the names of the geographies you want to map, and these should be properly formatted (i.e., in titlecase) for displaying on the map.

Let us start by building a simple state map with the `{urbnmapr}` package. It comes with the necessary data for states and counties, respectively, and works well with `{ggplot2}`. Note the reliance on `geom_polygon()` now.

---

.pull-left[

```r
library(urbnmapr)
glimpse(states)
glimpse(counties)
```

]

.pull-right[

```
## Rows: 83,933
## Columns: 9
## $ long       <dbl> -88.47, -88.47, -88.47, -88.46, -88.45, -88.45, -88.44, -88…
## $ lat        <dbl> 31.89, 31.93, 31.93, 32.04, 32.04, 32.05, 32.17, 32.17, 32.…
## $ order      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ hole       <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ piece      <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ group      <fct> 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1,…
## $ state_fips <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",…
## $ state_abbv <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL",…
## $ state_name <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Ala…
```

```
## Rows: 208,874
## Columns: 12
## $ long        <dbl> -86.92, -86.82, -86.71, -86.71, -86.41, -86.41, -86.44, -8…
## $ lat         <dbl> 32.66, 32.66, 32.66, 32.71, 32.71, 32.41, 32.40, 32.41, 32…
## $ order       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ hole        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ piece       <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ group       <fct> 01001.1, 01001.1, 01001.1, 01001.1, 01001.1, 01001.1, 0100…
## $ county_fips <chr> "01001", "01001", "01001", "01001", "01001", "01001", "010…
## $ state_abbv  <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
## $ state_fips  <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", "01"…
## $ county_name <chr> "Autauga County", "Autauga County", "Autauga County", "Aut…
## $ fips_class  <chr> "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1"…
## $ state_name  <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Al…
```

]

---

.pull-left[

```r
ggplot() +
  geom_polygon(data = states,
               aes(x = long, y = lat, group = group),
               fill = "white", color = "steelblue") +
  coord_fixed(1.3)
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/mapp-1.png" width="100%" style="display: block; margin: auto;" />

]

---

Note that this is just an empty map with the shapes of the states, and also that Alaska and Hawaii have been moved so that they can be displayed on the map. We could build a much better map by removing the x and y axis labels and tick marks, and setting a white background using `theme_map()` from the `{ggthemes}` package. We could also fill with some colors, say on the basis of the `state_name`.

.pull-left[

```r
ggplot() +
  geom_polygon(data = states,
               aes(x = long, y = lat, group = group, fill = state_name),
               color = "white") +
  coord_fixed(1.3) +
  ggthemes::theme_map() +
  theme(legend.position = "none")
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/map02-1.png" width="100%" style="display: block; margin: auto;" />

]

---

Okay, this is not very useful because it would be much better to color the map on the basis of some substantive variable. Let us see what lurks in the `statedata` file.
.pull-left[

```r
glimpse(statedata)
```

]

.pull-right[

```
## Rows: 51
## Columns: 6
## $ year        <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015…
## $ state_fips  <chr> "01", "02", "04", "05", "06", "08", "09", "10", "11", "12"…
## $ state_name  <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California", …
## $ hhpop       <int> 1846380, 250183, 2463012, 1144657, 12895471, 2074517, 1343…
## $ horate      <dbl> 0.6814, 0.6312, 0.6206, 0.6546, 0.5372, 0.6389, 0.6623, 0.…
## $ medhhincome <int> 44700, 70600, 51000, 42000, 64600, 63500, 71700, 61200, 75…
```

]

---

Okay, two things stand out -- `horate` (the homeownership rate), and `medhhincome` (the median household income). Let us fill with median household income, but to do so we will need to join `statedata` to our `states` file. Why? Because we need coordinates to map anything and `statedata` does not contain coordinates. Then we can specify `fill = medhhincome` inside the `aes(...)` call.

.pull-left[

```r
states %>%
  left_join(statedata,
            by = c("state_fips", "state_name")
  ) -> state.df

glimpse(state.df)
```

]

.pull-right[

```
## Rows: 83,933
## Columns: 13
## $ long        <dbl> -88.47, -88.47, -88.47, -88.46, -88.45, -88.45, -88.44, -8…
## $ lat         <dbl> 31.89, 31.93, 31.93, 32.04, 32.04, 32.05, 32.17, 32.17, 32…
## $ order       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ hole        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ piece       <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ group       <fct> 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1, 01.1…
## $ state_fips  <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", "01"…
## $ state_abbv  <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
## $ state_name  <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Al…
## $ year        <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015…
## $ hhpop       <int> 1846380, 1846380, 1846380, 1846380, 1846380, 1846380, 1846…
## $ horate      <dbl> 0.6814, 0.6814, 0.6814, 0.6814, 0.6814, 0.6814, 0.6814, 0.…
## $ medhhincome <int> 44700, 44700, 44700, 44700, 44700, 44700, 44700, 44700, 44…
```

]

---

.pull-left[

```r
ggplot() +
  geom_polygon(data = state.df,
               aes(x = long, y = lat, group = group, fill = medhhincome),
               color = "white") +
  coord_fixed(1.3) +
  ggthemes::theme_map() +
  theme(legend.position = "bottom") +
  labs(
    title = "Median Household Income in the States (2015)",
    fill = ""
  ) +
  scale_fill_viridis_c(option = "magma")
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/map05-1.png" width="100%" style="display: block; margin: auto;" />

]

---

What about working with counties? Sure, let us merge `countydata` with the `counties` file and map.

.pull-left[

```r
counties %>%
  left_join(countydata,
            by = c("county_fips")
  ) -> county.df

ggplot() +
  geom_polygon(data = county.df,
               aes(x = long, y = lat, group = group,
                   fill = medhhincome
               ),
               color = "white", size = 0.05) +
  coord_fixed(1.3) +
  ggthemes::theme_map() +
  theme(legend.position = "bottom") +
  labs(
    title = "Median Household Income in the Counties (2015)",
    fill = ""
  ) +
  scale_fill_viridis_c(option = "magma")
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/map06-1.png" width="100%" style="display: block; margin: auto;" />

]

---

Maybe you are only interested in Florida?

.pull-left[

```r
county.df %>%
  filter(state_abbv == "FL") %>%
  ggplot() +
  geom_polygon(
    aes(x = long, y = lat, group = group, fill = medhhincome),
    color = "white", size = 0.05
  ) +
  coord_fixed(1.3) +
  ggthemes::theme_map() +
  theme(legend.position = "bottom") +
  labs(
    title = "Median Household Income in Florida Counties (2015)",
    fill = ""
  ) +
  scale_fill_viridis_c(option = "plasma")
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/map07-1.png" width="100%" style="display: block; margin: auto;" />

]

---

Hmm, so far so good, but what if the data were for some geography not bundled with `{urbnmapr}`, such as school districts or places?
Not a problem, we just have to go the extra mile. First we would have to find, download, and load the shapefile. Say I am looking for places (loosely described as municipalities) in New Hampshire. Well, the `{tigris}` package comes in handy because it allows you to get shapefiles for whatever geography you want. Below I am getting the places shapefile for New Hampshire.

```r
library(tigris)
places(
  state = "NH",
  year = 2018,
  progress_bar = FALSE
) -> places.nh

head(places.nh)
```

```
## An object of class "SpatialPolygonsDataFrame"
## Slot "data":
##   STATEFP PLACEFP  PLACENS   GEOID      NAME       NAMELSAD LSAD CLASSFP PCICBSA
## 0      33   27380 00873600 3327380  Franklin  Franklin city   25      C5       N
## 1      33   05140 00873545 3305140    Berlin    Berlin city   25      C5       Y
## 2      33   12900 00873567 3312900 Claremont Claremont city   25      C5       Y
## 3      33   14200 00873569 3314200   Concord   Concord city   25      C5       Y
## 4      33   50260 00873672 3350260    Nashua    Nashua city   25      C5       Y
## 5      33   41300 00873643 3341300   Lebanon   Lebanon city   25      C5       Y
##   PCINECTA MTFCC FUNCSTAT     ALAND  AWATER    INTPTLAT     INTPTLON
## 0        N G4110        A  71050024 4496574 +43.4539146 -071.6698930
## 1        Y G4110        A 158888109 2213481 +44.4843279 -071.2771227
## 2        Y G4110        A 111766137 2323426 +43.3864700 -072.3344957
## 3        Y G4110        A 165629848 8392384 +43.2324926 -071.5612523
## 4        Y G4110        A  79857952 2326408 +42.7490744 -071.4905435
## 5        Y G4110        A 104389361 2507986 +43.6354277 -072.2538366
##
## Slot "polygons":
## [ coordinate listings for the six polygons omitted here ]
##
## Slot "plotOrder":
## [1] 4 2 3 6 5 1
##
## Slot "bbox":
##      min    max
## x -72.42 -71.12
## y  42.70  44.53
##
## Slot "proj4string":
## CRS arguments: +proj=longlat +ellps=GRS80 +no_defs
```

---

Okay, so now that I have the shapefile, how can I use it? I need to `fortify` it so that it looks like a regular data frame rather than the native SpatialPolygonsDataFrame format it comes in. When I go to make the map I am going to add the state shapefile too since otherwise the state's boundary will not show up.

.pull-left[

```r
places.nh %>%
  fortify(region = "GEOID") -> nh.df

ggplot() +
  geom_polygon(
    data = subset(state.df, state_name == "New Hampshire"),
    aes(x = long, y = lat, group = group),
    fill = "white", color = "black"
  ) +
  geom_polygon(
    data = nh.df,
    aes(x = long, y = lat, group = group, fill = id, color = id)
  ) +
  coord_fixed(1.3) +
  ggthemes::theme_map() +
  theme(legend.position = "none")
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/nh01-1.png" width="100%" style="display: block; margin: auto;" />

]

---

Of course, the `fill` is purely cosmetic here.
But say we had some data for places in New Hampshire, maybe the size of the population, as in `nh.data.RData`. Now we could join `nh.data` with `nh.df` to create `nh` and then map.

```r
load(here::here("data", "nh.data.RData"))

nh.df %>%
  left_join(nh.data, by = c("id" = "GEOID")) -> nh

glimpse(nh)
```

```
## Rows: 32,312
## Columns: 9
## $ long       <dbl> -71.23, -71.23, -71.23, -71.23, -71.23, -71.23, -71.23, -71…
## $ lat        <dbl> 43.47, 43.47, 43.46, 43.46, 43.46, 43.46, 43.46, 43.46, 43.…
## $ order      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ hole       <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ piece      <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ id         <chr> "3300980", "3300980", "3300980", "3300980", "3300980", "330…
## $ group      <fct> 3300980.1, 3300980.1, 3300980.1, 3300980.1, 3300980.1, 3300…
## $ NAME       <chr> "Alton CDP, New Hampshire", "Alton CDP, New Hampshire", "Al…
## $ population <dbl> 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168,…
```

---

Now we plot:

.pull-left[

```r
ggplot() +
  geom_polygon(
    data = subset(state.df, state_name == "New Hampshire"),
    aes(x = long, y = lat, group = group),
    fill = "white", color = "black"
  ) +
  geom_polygon(
    data = nh,
    aes(x = long, y = lat, group = group, fill = population)
  ) +
  coord_fixed(1.3) +
  scale_fill_viridis_c(option = "viridis") +
  ggthemes::theme_map() +
  theme(legend.position = "bottom") +
  labs(
    fill = "",
    title = "Population Distribution in New Hampshire's Places",
    subtitle = "(American Community Survey, 2014-2018)"
  )
```

]

.pull-right[

<img src="Module06sp20_files/figure-html/nh03-1.png" width="100%" style="display: block; margin: auto;" />

]

---

You could have also filled by creating quartiles, etc., using `{santoku}`, so do not forget that option.

Before we move on, one more map, to show you the possibilities. Here, I am plotting the locations of the parking tickets issued in Philadelphia, this time with the `{leaflet}` package.
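Going back for a moment to the quartile option mentioned above, here is a minimal sketch of what the binning step might look like. It uses base R's `quantile()` and `cut()` rather than `{santoku}`, and the `population` values are made up for illustration; they are not the New Hampshire data.

```r
# Hypothetical population counts standing in for `nh$population`
population <- c(168, 520, 1300, 2500, 4200, 8700, 21000, 42000)

# Quartile cut points: minimum, 25th, 50th, 75th percentiles, maximum
breaks <- quantile(population, probs = seq(0, 1, by = 0.25))

# Assign each value to a quartile; include.lowest keeps the minimum in Q1
pop_quartile <- cut(
  population,
  breaks = breaks,
  include.lowest = TRUE,
  labels = c("Q1", "Q2", "Q3", "Q4")
)

table(pop_quartile)
```

With real data, a column built this way could be passed to `fill =` together with a discrete scale such as `scale_fill_viridis_d()`.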
First the tickets data-set, reduced to a random sample of 20% of the tickets issued in the month of December in 2017 (to keep the data size manageable). I have also created a `popup` that will display specific information if someone clicks on a point in the map.

.pull-left[

```r
readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-03/tickets.csv") %>%
  mutate(
    year = lubridate::year(issue_datetime),
    month = lubridate::month(issue_datetime)
  ) %>%
  filter(month == 12, lon > -75.5) %>%
  sample_frac(0.2) -> tickets

# Text shown when a point is clicked
tickets %>%
  unite(display, c(issuing_agency, issue_datetime, fine),
        sep = "; ", remove = FALSE) -> tickets

library(leaflet)
library(htmltools)
library(widgetframe)

leaflet(tickets) %>%
  addTiles() %>%
  addCircles(lng = ~ lon, lat = ~ lat,
             popup = ~htmlEscape(display),
             color = "steelblue", opacity = 0.10) %>%
  frameWidget(width = "100%", height = "400")
```

]

.pull-right[
]

---

Voila! A few lines of code and we have an interactive map that can be used to display whatever evidence we want to display. Note that you need geographic coordinates since without them the data cannot be attached to a physical location.

Let us see another twist on this. Say I am trying to map the total number of COVID-19 cases in Ohio. I know the county and the number of cases that occurred, and I can look up the latitude and longitude of each county. Well, I can build a similar plot except making the size of the circle conditional upon the number of cases. The more cases, the larger the circle. The code further below sets the radius to the square root of the case count, so that a circle's area grows in proportion to the number of cases.

```r
readr::read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv") -> covid

covid %>%
  filter(state == "Ohio", date == "2020-04-17") -> cov19
```

---

But there are no coordinates in `cov19`. Well, I can get those from the `{housingData}` package, and then merge that with `cov19`.

```r
library(housingData)

# Keep Ohio and strip the " County" suffix so the names match `cov19`
geoCounty %>%
  filter(state == "OH") %>%
  separate(county, into = c("countyname", "extra"),
           sep = " County", remove = TRUE) %>%
  mutate(countyname = stringr::str_to_sentence(countyname)) -> oh

oh %>%
  left_join(cov19, by = c("countyname" = "county")) -> ohcov19

# Text shown when a circle is clicked
ohcov19 %>%
  unite(display, c(countyname, cases),
        sep = ": ", remove = FALSE) -> ohcov19
```

---

Okay, now we have everything we need to build the map.

.pull-left[

```r
leaflet(ohcov19) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~ lon, lat = ~ lat,
    popup = ~htmlEscape(display),
    color = "salmon", opacity = 0.10,
    radius = ~sqrt(cases)
  ) %>%
  frameWidget(width = "100%", height = "450")
```

]

.pull-right[
]

---

class: inverse, middle, center

# .fancy[.heat[ Interactive Graphics with Plotly and Highcharter ]]

---

Interactive graphics are useful in situations where you would like the user/viewer to see the data values or other details. Say, for example, I have a plot and want to make it interactive. How can I do that? By saving my regular plot and then using `{plotly}` to add a `ggplotly()` wrapper around the plot.

.pull-left[

```r
library(plotly)

ggplot() +
  geom_point(
    data = mpg,
    mapping = aes(x = cty, y = hwy, color = trans)
  ) +
  labs(x = "City Mileage",
       y = "Highway Mileage",
       color = "Transmission") -> pl01

lst <- list()
ggplotly(pl01) -> lst
htmltools::tagList(lst)
```

]

.pull-right[
]

---

These plots are useful when presenting data to a live audience (in a talk, or on the web). I prefer `{highcharter}`, though, since it does a lot of things well and its charts are aesthetically pleasing. Let us stay with the COVID-19 example. Say I want a bar-chart of the total number of cases by state.

.pull-left[

```r
library(highcharter)

# covid has one row per county and date, so sum the counties within each state
covid %>%
  filter(date == "2020-04-17") %>%
  group_by(state) %>%
  summarise(cases = sum(cases)) %>%
  rename(State = state, `Total Cases` = cases) -> tab1

lst <- list()
hchart(tab1, "bar", hcaes(x = State, y = `Total Cases`)) -> lst
htmltools::tagList(lst)
```

]

.pull-right[
]

---

Notice the key elements here: the basic function call is `hchart()`, we specify that we want a bar-chart, and we provide the quantities that should go on the x and y axes, respectively. Note that a "bar" chart is drawn horizontally, so the x variable (`State`) ends up along the vertical axis.

What if I wanted a line-chart, maybe of the number of cases over time? And I wanted this just for a few states? We could do that too, as shown below. Note that I am creating `tab2`, a table of the total number of cases by state and date, and then converting `total_cases` into logarithmic form (saved as `log_cases`). On a log scale, equal vertical distances correspond to equal proportional changes, so we can compare the rate of change from one date to the next across states on a common scale.

---

.pull-left[

```r
# Sum county rows into one total per state and date, then take logs
covid %>%
  filter(state %in% c("Ohio", "Florida", "California", "New Jersey", "New York"),
         date >= "2020-03-15") %>%
  group_by(state, date) %>%
  summarise(total_cases = sum(cases)) %>%
  ungroup() %>%
  mutate(log_cases = log(total_cases)) -> tab2

lst <- list()
hchart(tab2, "line", hcaes(x = date, y = log_cases, group = state)) -> lst
htmltools::tagList(lst)
```

]

.pull-right[
]

---

Now here is a county-level map that shows the total number of cases as of April 17, 2020. The data are stored in `tab3`, created as shown below. Pay attention to these steps because we are not just filtering the data but also building a specific key, which we call `code`, that we will need to join these data to the map data.

```r
# County-level cases on a single date, with a "County, State" label
covid %>%
  filter(date == "2020-04-17") %>%
  unite(Location, c(county, state), sep = ", ", remove = TRUE) -> tab3

library(urbnmapr)
data(counties)

# Split the five-digit county FIPS after the two-digit state prefix,
# then build keys of the form "us-<state abbreviation>-<county digits>"
counties %>%
  separate(county_fips, into = c("stfips", "fips"), sep = 2, remove = FALSE) %>%
  mutate(leader = "us",
         stlower = stringr::str_to_lower(state_abbv)) %>%
  unite(code, c(leader, stlower, fips), sep = "-") -> cdf

# One row per county: the map key and the full FIPS code
cdf %>%
  select(code, county_fips) %>%
  distinct() -> cdf2

# Attach the map key to the case counts via the FIPS code
tab3 %>%
  left_join(cdf2, by = c("fips" = "county_fips")) -> tab4
```

---

Here comes the map!

.pull-left[

```r
library(viridis)

lst <- list()
hcmap("countries/us/us-all-all",
      data = tab4,
      name = "COVID-19 Cases",
      value = "cases",
      joinBy = c("hc-key", "code"),
      borderColor = "steelblue") %>%
  hc_colorAxis(
    stops = color_stops(10, rev(magma(10)))
  ) %>%
  hc_legend(layout = "horizontal",
            align = "right",
            floating = TRUE,
            valueDecimals = 0,
            valueSuffix = "") -> lst
htmltools::tagList(lst)
```

]

.pull-right[
]

---

Note that `countries/us/us-all-all` indicates that we want counties. If we wanted the states instead it would have been `countries/us/us-all`. The `value = ` argument flags the column used to populate the color scheme. The `joinBy` argument matches the map's `hc-key` property to the `code` column in our data (`tab4`) so that `cases` gets attached to the correct location in the map. What if we wanted only Ohio? Well, in that case we could subset as shown below:

.pull-left[

```r
# Keep only rows whose key contains the Ohio segment "-oh-"
tab4 %>%
  filter(grepl("-oh-", code)) -> tab5

lst <- list()
hcmap("countries/us/us-oh-all",
      data = tab5,
      name = "COVID-19 Cases",
      value = "cases",
      joinBy = c("hc-key", "code"),
      borderColor = "steelblue") %>%
  hc_colorAxis(stops = color_stops(10, rev(magma(10)))) %>%
  hc_legend(layout = "horizontal",
            align = "right",
            floating = TRUE,
            valueDecimals = 0,
            valueSuffix = "") -> lst
htmltools::tagList(lst)
```

]

.pull-right[
] --- class: right, middle <img class="circle" src="https://github.com/aniruhil.png" width="175px"/> # Find me at... [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> @aruhil](http://twitter.com/aruhil) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 
11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"/></svg> aniruhil.org](https://aniruhil.org) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M476 3.2L12.5 270.6c-18.1 10.4-15.8 35.6 2.2 43.2L121 358.4l287.3-253.2c5.5-4.9 13.3 2.6 8.6 8.3L176 407v80.5c0 23.6 28.5 32.9 42.5 15.8L282 426l124.6 52.2c14.2 6 30.4-2.9 33-18.2l72-432C515 7.8 493.3-6.8 476 3.2z"/></svg> ruhil@ohio.edu](mailto:ruhil@ohio.edu)
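
---

#### Appendix: How the `code` Join Key Is Built

The county maps above join the case counts to the map through a key of the form `us-<state>-<county>`. The sketch below rebuilds that kind of key in base R so the string logic is easy to inspect on its own. The vectors `county_fips` and `state_abbv` are hypothetical stand-ins typed in by hand for illustration; they are not read from `{urbnmapr}`, and this is only a minimal sketch of the idea, not the pipeline used on the slides.

```r
# Hypothetical inputs for illustration only (not read from urbnmapr)
county_fips <- c("39009", "39049", "06037")
state_abbv  <- c("OH", "OH", "CA")

# Characters 3 to 5 of a five-digit FIPS code identify the county within its state
county_part <- substr(county_fips, 3, 5)

# Assemble keys of the form "us-<state>-<county>"
code <- paste("us", tolower(state_abbv), county_part, sep = "-")
print(code)
```

Keeping the FIPS codes as character strings matters here: stored as numbers, a code such as `06037` would lose its leading zero and the `substr()` positions would shift.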