class: title-slide, center, middle
background-image: url(images/oupoppies.jpeg)
background-size: cover

# .large[.fancy[Visualizing Data with R]]

## .fancy[Ani Ruhil]

---

## Agenda

- Graphics with `ggplot2`
- Interactive graphics with `highcharter`
- Interactive graphics with `plotly`
- Maps
  - with `ggplot2`
  - with `leaflet`

---

class: inverse, middle, center

<center><img src = "images/hex-ggplot2.png", width = 200px></center>

---

# [the grammar of graphics](http://vita.had.co.nz/papers/layered-grammar.html)

.pull-left[
<center><img src = "images/grammarofgraphics.png", width = 400px></center>
]

.pull-right[
`data =` cleaned and mutated or summarized to give us what we'd like to visualize

`geom =` what kind of a visual do you want? A map, bar-chart, line-chart, scatter-plot, something else?

`coordinate system =` what should go on the x-axis? y-axis?

What other `aesthetics` should be used, border colors, fill colors, text or other annotations, plotting symbols, facet the plot to show breakouts by some attribute, something else?
]

---

.pull-left[

```r
load("data/multkey.merge.RData")
my.df <- multkey.merge
library(tidyverse)
ggplot(data = my.df)
```

<img src="Module03_files/figure-html/unnamed-chunk-1-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
The canvas is blank because we have not specified what goes on the x-axis, y-axis

That can be specified via `aes` ... the aesthetics
]

---

.pull-left[

```r
ggplot(data = my.df, aes(x = college_desc))
```

<img src="Module03_files/figure-html/unnamed-chunk-2-1.png" width="100%" style="display: block; margin: auto;" />

Aha! Now we see the labels on the x-axis but nothing more. Why? Because we have not specified what type of a graphic we want ... a bar-chart perhaps?
]

.pull-right[

```r
ggplot(data = my.df, aes(x = college_desc)) +
  geom_bar()
```

<img src="Module03_files/figure-html/unnamed-chunk-3-1.png" width="100%" style="display: block; margin: auto;" />

Notice that `geom_bar()` generates a bar-chart

The y-axis shows the frequency, that is, the number of rows for each college
]

---

We could do better, of course, by customizing the labels for the x-axis and y-axis, adding a title and/or subtitle, a caption, maybe even coloring the bars

```r
ggplot(data = my.df, aes(x = college_desc, fill = college_desc)) +
  geom_bar() +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research") +
  theme(legend.position = "bottom")
```

.plot-callout[
<img src="Module03_files/figure-html/fig01-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />
]

---

<img src="Module03_files/figure-html/fig01-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />

---

The x-axis labels are hard to read, so we could flip the x- and y-axes

```r
ggplot(data = my.df, aes(x = college_desc, fill = college_desc)) +
  geom_bar() +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research") +
  theme(legend.position = "bottom") +
  coord_flip()
```

.plot-callout[
<img src="Module03_files/figure-html/fig02-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />
]

---

<img src="Module03_files/figure-html/fig02-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />

---

How about ordering the colleges in terms of increasing/decreasing frequency?

```r
library(forcats)
my.df %>%
  group_by(college_desc) %>%
  summarise(frequency = n()) %>%
  ggplot(aes(x = fct_reorder(college_desc, frequency), y = frequency, fill = college_desc)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research") +
  theme(legend.position = "bottom") +
  coord_flip()
```

Note how we are using the pipe operator `%>%` to do some calculations before seamlessly rolling into the plotting commands

`fct_reorder(college_desc, frequency)` orders the bars by frequency for us

.plot-callout[
<img src="Module03_files/figure-html/fig03-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />
]

---

<img src="Module03_files/figure-html/fig03-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />

---

```r
library(forcats)
my.df %>%
  group_by(college_desc) %>%
  summarise(frequency = n()) %>%
  ggplot(aes(x = fct_reorder(college_desc, -frequency), y = frequency, fill = college_desc)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research") +
  theme(legend.position = "bottom") +
  coord_flip()
```

.center[Note: `fct_reorder(college_desc, -frequency)` reverses the order]

.plot-callout[
<img src="Module03_files/figure-html/fig03b-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />
]

---

<img src="Module03_files/figure-html/fig03b-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />

---

We could still do better ... do we need a legend? No. How about better axis labels?

```r
my.df %>%
  group_by(college_desc) %>%
  summarise(frequency = n()) %>%
  ggplot(aes(x = fct_reorder(college_desc, frequency), y = frequency, fill = college_desc)) +
  geom_bar(stat = "identity") +
  # labs() names the aesthetics, so x is still the college after coord_flip()
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research",
       x = "College",
       y = "Number of Students") +
  theme(legend.position = "none") +
  coord_flip()
```

.plot-callout[
<img src="Module03_files/figure-html/fig04-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />
]

---

<img src="Module03_files/figure-html/fig04-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />

---

### What if I want to look at enrollment numbers by sex?

```r
my.df %>%
  filter(sex.f %in% c("Male", "Female")) %>%
  group_by(college_desc, sex.f) %>%
  summarise(frequency = n()) %>%
  ggplot(aes(x = fct_reorder(college_desc, frequency), y = frequency, fill = college_desc)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research",
       y = "Number of Students",
       x = "College") +
  theme(legend.position = "none") +
  coord_flip() +
  facet_wrap(~ sex.f)
```

.plot-callout[
<img src="Module03_files/figure-html/fig05-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />
]

---

<img src="Module03_files/figure-html/fig05-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />

---

What if I want to show percentages instead of frequencies?

```r
my.df %>%
  filter(sex.f %in% c("Male", "Female")) %>%
  group_by(college_desc, sex.f) %>%
  summarise(frequency = n()) %>%
  # summarise() drops the sex.f grouping, so percent is computed within each college
  mutate(percent = (frequency / sum(frequency)) * 100) %>%
  ggplot(aes(x = fct_reorder(college_desc, percent), y = percent, fill = sex.f)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by Sex and College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research",
       y = "Percent",
       x = "College",
       fill = "") +
  theme(legend.position = "bottom") +
  coord_flip()
```

.plot-callout[
<img src="Module03_files/figure-html/fig06-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />
]

---

<img src="Module03_files/figure-html/fig06-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />

---

What if I want to store the summarized data as a data frame and then plot?

```r
tab1 <- my.df %>%
  filter(sex.f %in% c("Male", "Female")) %>%
  group_by(college_desc, sex.f) %>%
  summarise(frequency = n()) %>%
  mutate(percent = (frequency / sum(frequency)) * 100)

ggplot(data = tab1, aes(x = fct_reorder(college_desc, percent), y = percent, fill = sex.f)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by Sex and College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research",
       y = "Percent",
       x = "College",
       fill = "") +
  theme(legend.position = "bottom") +
  coord_flip()
```

.center[This approach is handy if you need to print the table or make it available]

.plot-callout[
<img src="Module03_files/figure-html/fig07-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />
]

---

<img src="Module03_files/figure-html/fig07-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />

---

```r
# Round the percentages to two decimals before displaying the table
tab1p <- tab1 %>%
  mutate(percent = round(percent, 2))

DT::datatable(tab1p,
              caption = "Distribution of Students by Sex and College",
              rownames = FALSE,
              colnames = c("College", "Sex", "Number", "Percent"))
```

```{=html}
<div id="htmlwidget-5b5b5f9eadd376fe290a" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-5b5b5f9eadd376fe290a">{"x":{"filter":"none","caption":"<caption>Distribution of Students by Sex and College<\/caption>","data":[["Arts & Sciences","Arts & Sciences","Business","Business","Communication","Communication","Education","Education","Engineering & Technology","Engineering & Technology","Fine Arts","Fine Arts","George Voinovich School","George Voinovich School","Health Sciences & Professions","Health Sciences & Professions","Honors Tutorial","Honors Tutorial","International Studies","International Studies","Miscellaneous","Miscellaneous","Osteopathic Medicine","Osteopathic Medicine","Regional Higher Ed","Regional Higher Ed","University College","University College"],["Female","Male","Female","Male","Female","Male","Female","Male","Female","Male","Female","Male","Female","Male","Female","Male","Female","Male","Female","Male","Female","Male","Female","Male","Female","Male","Female","Male"],[98267,75574,19875,33731,18844,14468,25939,11605,4542,22552,20396,13900,877,440,62345,14740,315,166,506,415,198,110,5608,6562,10774,8269,9164,8378],[56.53,43.47,37.08,62.92,56.57,43.43,69.09,30.91,16.76,83.24,59.47,40.53,66.59,33.41,80.88,19.12,65.49,34.51,54.94,45.06,64.29,35.71,46.08,53.92,56.58,43.42,52.24,47.76]],"container":"<table class=\"display\">\n <thead>\n <tr>\n <th>College<\/th>\n <th>Sex<\/th>\n <th>Number<\/th>\n <th>Percent<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":[2,3]}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>
```

---

What if we'd like to break out the preceding plot by students' rank?

```r
my.df %>%
  filter(sex.f %in% c("Male", "Female")) %>%
  group_by(college_desc, rank_desc, sex.f) %>%
  summarise(frequency = n()) %>%
  # percent is now computed within each college and rank combination
  mutate(percent = (frequency / sum(frequency)) * 100) %>%
  ggplot(aes(x = fct_reorder(college_desc, percent), y = percent, fill = sex.f)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research",
       y = "Percent",
       x = "College",
       fill = "Student's Sex at Birth") +
  theme(legend.position = "bottom") +
  coord_flip() +
  facet_wrap(~ rank_desc)
```

.plot-callout[
<img src="Module03_files/figure-html/fig08-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />
]

---

<img src="Module03_files/figure-html/fig08-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />

---

## `geom_line()` and `geom_point()`

Let us calculate enrollments by college and term

```r
# Count distinct students per term and college
tab2 <- my.df %>%
  group_by(term_code, college_desc) %>%
  summarise(frequency = n_distinct(anon_id))

ggplot() +
  geom_point(data = tab2,
             aes(x = term_code, y = frequency, group = college_desc,
                 color = college_desc, size = frequency, shape = college_desc)) +
  geom_line(data = tab2,
            aes(x = term_code, y = frequency, group = college_desc,
                color = college_desc, linetype = college_desc)) +
  labs(x = "Term", y = "Number Enrolled", color = "")
```

.plot-callout[
<img src="Module03_files/figure-html/fig09-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />
]

---

<img src="Module03_files/figure-html/fig09-inset-1.png" width="100%" height="99%" style="display: block; margin: auto;" />

---

class: right, middle

<img class="circle" src="https://github.com/aniruhil.png" width="175px"/>

# Find me at...

[<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> @aruhil](http://twitter.com/aruhil) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 
0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"/></svg> aniruhil.org](https://aniruhil.org) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M476 3.2L12.5 270.6c-18.1 10.4-15.8 35.6 2.2 43.2L121 358.4l287.3-253.2c5.5-4.9 13.3 2.6 8.6 8.3L176 407v80.5c0 23.6 28.5 32.9 42.5 15.8L282 426l124.6 52.2c14.2 6 30.4-2.9 33-18.2l72-432C515 7.8 493.3-6.8 476 3.2z"/></svg> ruhil@ohio.edu](mailto:ruhil@ohio.edu)