Graphics with ggplot2
Interactive graphics with highcharter
Interactive graphics with plotly
Maps
ggplot2
leaflet
data = cleaned and mutated or summarized to give us what we'd like to visualize
geom = what kind of a visual do you want? A map, bar-chart, line-chart, scatter-plot, something else?
coordinate system = what should go on the x-axis? y-axis?
What other aesthetics should be used? Border colors, fill colors, text or other annotations, plotting symbols, facets to show breakouts by some attribute, something else?
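As a minimal sketch of how these pieces fit together, the code below builds a plot from a small made-up data frame. The names toy.df, college and students are hypothetical and stand in for the enrollment data used in these slides.

```r
library(ggplot2)

# Hypothetical toy data, standing in for a cleaned/summarized data frame
toy.df <- data.frame(
  college = c("Business", "Education", "Fine Arts"),
  students = c(120, 80, 45)
)

# data + aesthetics (what goes on x and y) + geom (the kind of visual)
p <- ggplot(data = toy.df, aes(x = college, y = students)) +
  geom_col() +   # equivalent to geom_bar(stat = "identity")
  labs(x = "College", y = "Number of Students")

p
```

Each layer is added with +, so the geom or the labels can be swapped without touching the data or the aesthetic mappings.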
load("data/multkey.merge.RData")
my.df <- multkey.merge
library(tidyverse)
ggplot(data = my.df)
The canvas is blank because we have not specified what goes on the x-axis and y-axis
That can be specified via aes(), the aesthetics
ggplot(data = my.df, aes(x = college_desc))
Aha! Now we see the labels on the x-axis but nothing more. Why?
Because we have not specified what type of a graphic we want ... a bar-chart perhaps?
ggplot(data = my.df, aes(x = college_desc)) + geom_bar()
Notice that geom_bar() generates a bar-chart
The y-axis is mapping the frequency: by default geom_bar() counts the rows in each category of the x variable
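To see that the bar heights really are frequencies, here is a small sketch that compares what geom_bar() computes with a direct count from table(). The data frame toy.df is made up for illustration; layer_data() is the ggplot2 helper that returns the values computed for a layer.

```r
library(ggplot2)

# Hypothetical toy data: one row per student
toy.df <- data.frame(
  college = c("Business", "Business", "Education",
              "Fine Arts", "Business", "Education")
)

p <- ggplot(data = toy.df, aes(x = college)) + geom_bar()

# Values ggplot2 computed for the first (and only) layer
bar.heights <- layer_data(p, 1)$count

# The same frequencies computed directly
direct.counts <- as.vector(table(toy.df$college))

bar.heights
direct.counts
```

Both vectors list the counts in the alphabetical order of the college names, which is also the order of the bars on the x-axis.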
We could do better of course, by customizing the labels for the x-axis and y-axis, adding a title and/or subtitle, a caption, maybe even coloring the bars
ggplot(data = my.df, aes(x = college_desc, fill = college_desc)) +
  geom_bar() +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research") +
  theme(legend.position = "bottom")
The x-axis labels are hard to read, so we could flip the x- and y-axes
ggplot(data = my.df, aes(x = college_desc, fill = college_desc)) +
  geom_bar() +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research") +
  theme(legend.position = "bottom") +
  coord_flip()
How about ordering the colleges in terms of increasing/decreasing frequency?
library(forcats)
my.df %>%
  group_by(college_desc) %>%
  summarise(frequency = n()) %>%
  ggplot(aes(x = fct_reorder(college_desc, frequency), y = frequency, fill = college_desc)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research") +
  theme(legend.position = "bottom") +
  coord_flip()
Note how we are using the pipe operator %>% to do some calculations before seamlessly rolling into the plotting commands
fct_reorder(college_desc, frequency) is ordering the bars for us
library(forcats)
my.df %>%
  group_by(college_desc) %>%
  summarise(frequency = n()) %>%
  ggplot(aes(x = fct_reorder(college_desc, -frequency), y = frequency, fill = college_desc)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research") +
  theme(legend.position = "bottom") +
  coord_flip()
Note: fct_reorder(college_desc, -frequency) negates the frequency, which reverses the order of the bars
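The effect of fct_reorder() on the factor levels can be checked without drawing anything. The sketch below uses a made-up factor and made-up frequencies (college and frequency here are hypothetical vectors, not columns of my.df).

```r
library(forcats)

# Hypothetical summarized data
college <- factor(c("Business", "Education", "Fine Arts"))
frequency <- c(30, 10, 20)

# Levels ordered by increasing frequency
asc.levels <- levels(fct_reorder(college, frequency))

# Negating the values orders the levels by decreasing frequency
desc.levels <- levels(fct_reorder(college, -frequency))

asc.levels   # "Education" "Fine Arts" "Business"
desc.levels  # "Business"  "Fine Arts" "Education"
```

ggplot2 draws the bars in level order, so changing the level order is all that is needed to reorder the bars.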
We could still do better ... do we need a legend? No. How about better axis labels?
my.df %>%
  group_by(college_desc) %>%
  summarise(frequency = n()) %>%
  ggplot(aes(x = fct_reorder(college_desc, frequency), y = frequency, fill = college_desc)) +
  geom_bar(stat = "identity") +
  # labels refer to the aesthetics: x is the college, y is the count, even after coord_flip()
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research",
       x = "College", y = "Number of Students") +
  theme(legend.position = "none") +
  coord_flip()
my.df %>%
  filter(sex.f %in% c("Male", "Female")) %>%
  group_by(college_desc, sex.f) %>%
  summarise(frequency = n()) %>%
  ggplot(aes(x = fct_reorder(college_desc, frequency), y = frequency, fill = college_desc)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research",
       y = "Number of Students", x = "College") +
  theme(legend.position = "none") +
  coord_flip() +
  facet_wrap(~ sex.f)
What if I want to show percentages instead of frequencies?
my.df %>%
  filter(sex.f %in% c("Male", "Female")) %>%
  group_by(college_desc, sex.f) %>%
  summarise(frequency = n()) %>%
  mutate(percent = (frequency / sum(frequency)) * 100) %>%
  ggplot(aes(x = fct_reorder(college_desc, percent), y = percent, fill = sex.f)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by Sex and College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research",
       y = "Percent", x = "College", fill = "") +
  theme(legend.position = "bottom") +
  coord_flip()
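Why do the percentages add up to 100 within each college rather than across the whole table? After summarise() the result is still grouped by the first grouping variable, so sum(frequency) is computed per college. A small sketch with made-up data (toy.df, college and sex are hypothetical names) makes this visible; .groups = "drop_last" only spells out what summarise() does by default here.

```r
library(dplyr)

# Hypothetical toy data: one row per student
toy.df <- data.frame(
  college = c("Business", "Business", "Business", "Education", "Education"),
  sex = c("Female", "Male", "Male", "Female", "Female")
)

toy.tab <- toy.df %>%
  group_by(college, sex) %>%
  summarise(frequency = n(), .groups = "drop_last") %>%  # still grouped by college
  mutate(percent = (frequency / sum(frequency)) * 100)   # share within each college

# Percentages should total 100 inside every college
check <- toy.tab %>% summarise(total = sum(percent))

toy.tab
check
```

If the percentages should instead be shares of all students, adding ungroup() before mutate() would be one way to get that.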
What if I want to store the summarized data as a data frame and then plot?
tab1 <- my.df %>%
  filter(sex.f %in% c("Male", "Female")) %>%
  group_by(college_desc, sex.f) %>%
  summarise(frequency = n()) %>%
  mutate(percent = (frequency / sum(frequency)) * 100)

ggplot(data = tab1, aes(x = fct_reorder(college_desc, percent), y = percent, fill = sex.f)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by Sex and College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research",
       y = "Percent", x = "College", fill = "") +
  theme(legend.position = "bottom") +
  coord_flip()
This approach is handy if you also need to print or share the table
# round the percentages to two decimals (replaces the deprecated mutate_at()/funs() idiom)
tab1p <- tab1 %>%
  mutate(percent = round(percent, 2))

DT::datatable(tab1p,
              caption = "Distribution of Students by Sex and College",
              rownames = FALSE,
              colnames = c("College", "Sex", "Number", "Percent"))
Distribution of Students by Sex and College
College | Sex | Number | Percent
Arts & Sciences | Female | 98267 | 56.53
Arts & Sciences | Male | 75574 | 43.47
Business | Female | 19875 | 37.08
Business | Male | 33731 | 62.92
Communication | Female | 18844 | 56.57
Communication | Male | 14468 | 43.43
Education | Female | 25939 | 69.09
Education | Male | 11605 | 30.91
Engineering & Technology | Female | 4542 | 16.76
Engineering & Technology | Male | 22552 | 83.24
Fine Arts | Female | 20396 | 59.47
Fine Arts | Male | 13900 | 40.53
George Voinovich School | Female | 877 | 66.59
George Voinovich School | Male | 440 | 33.41
Health Sciences & Professions | Female | 62345 | 80.88
Health Sciences & Professions | Male | 14740 | 19.12
Honors Tutorial | Female | 315 | 65.49
Honors Tutorial | Male | 166 | 34.51
International Studies | Female | 506 | 54.94
International Studies | Male | 415 | 45.06
Miscellaneous | Female | 198 | 64.29
Miscellaneous | Male | 110 | 35.71
Osteopathic Medicine | Female | 5608 | 46.08
Osteopathic Medicine | Male | 6562 | 53.92
Regional Higher Ed | Female | 10774 | 56.58
Regional Higher Ed | Male | 8269 | 43.42
University College | Female | 9164 | 52.24
University College | Male | 8378 | 47.76
What if we'd like to break out the preceding plot by students' rank?
my.df %>%
  filter(sex.f %in% c("Male", "Female")) %>%
  group_by(college_desc, rank_desc, sex.f) %>%
  summarise(frequency = n()) %>%
  mutate(percent = (frequency / sum(frequency)) * 100) %>%
  ggplot(aes(x = fct_reorder(college_desc, percent), y = percent, fill = sex.f)) +
  geom_bar(stat = "identity") +
  labs(title = "Distribution of Students by College",
       subtitle = "(Multiple Terms)",
       caption = "Source: Ohio University's Office of Institutional Research",
       y = "Percent", x = "College", fill = "Student's Sex at Birth") +
  theme(legend.position = "bottom") +
  coord_flip() +
  facet_wrap(~ rank_desc)
geom_line() and geom_point()
Let us calculate enrollments by college and term
tab2 <- my.df %>%
  group_by(term_code, college_desc) %>%
  summarise(frequency = n_distinct(anon_id))

ggplot() +
  geom_point(data = tab2,
             aes(x = term_code, y = frequency, group = college_desc,
                 color = college_desc, size = frequency, shape = college_desc)) +
  geom_line(data = tab2,
            aes(x = term_code, y = frequency, group = college_desc,
                color = college_desc, linetype = college_desc)) +
  labs(x = "Term", y = "Number Enrolled", color = "")
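The enrollment count above uses n_distinct(anon_id) rather than n(). The difference matters whenever a student appears in more than one row within a term, which the sketch below assumes for illustration. The data frame toy.df and its columns term and student_id are made up.

```r
library(dplyr)

# Hypothetical toy data: student 1 has two rows in the Fall term
toy.df <- data.frame(
  term = c("Fall", "Fall", "Fall", "Spring"),
  student_id = c(1, 1, 2, 1)
)

toy.tab <- toy.df %>%
  group_by(term) %>%
  summarise(rows = n(),                          # counts every row
            students = n_distinct(student_id))   # counts each student once

toy.tab
```

Here Fall has three rows but only two distinct students, so n() would overstate enrollment for that term.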