Visualizing Data

Ani Ruhil
2020-07-07

ggplot2 is one of the more popular R packages for data visualization, so it is the package I will walk you through. It can do a lot, but we will focus only on the very basics: the kinds of visualizations you may need for program evaluation. In particular, we will look at bar-charts, histograms, box-plots, scatter-plots, line-charts, and, if we get that far, maybe a simple map or two. I will stick with our myhsb data, so let us load the data file and the ggplot2 library.

library(here)

load(here("workshops/ropeg/handouts/data", "myhsb.RData"))

library(ggplot2)

Bar-Charts

Let us build a bar-chart of read.quartiles.

ggplot(
  data = myhsb, # the data to be used
  aes(x = read.quartiles) # what should go on the x-axis
  ) +
  geom_bar() # the type of geom we want
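It may help to see what geom_bar() does with the data before it draws anything. By default it counts the rows in each category of the x variable, so the bar heights are just a frequency table. Below is a minimal sketch of that idea; the toy data frame is made up for illustration and only stands in for myhsb.

```r
library(ggplot2)

# Hypothetical stand-in for myhsb; these values are invented for illustration
toy <- data.frame(
  read.quartiles = factor(c("Q1", "Q1", "Q2", "Q3", "Q3", "Q3", "Q4"))
)

# geom_bar() defaults to stat = "count": one bar per level, height = number of rows
p_count <- ggplot(data = toy, aes(x = read.quartiles)) +
  geom_bar()

# Bar heights as computed by ggplot2, next to a plain frequency table
bar_heights <- layer_data(p_count)$count
freq_table <- as.vector(table(toy$read.quartiles))
```

Here layer_data() returns the data frame ggplot2 computed for the layer, and its count column should match what table() reports for the same variable.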

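This is the same as moving the aes(...) mapping inside geom_bar(...), as shown below. Aesthetics set in ggplot() are inherited by every layer, whereas those set inside a geom apply only to that layer; with a single geom the two forms give the same plot.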

ggplot(
  data = myhsb # the data to be used
  ) +
  geom_bar( # the type of geom we want
    aes(x = read.quartiles) # what should go on the x-axis
  )
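If you want to convince yourself that the two placements of aes(...) are equivalent for a single-layer plot, you can compare what ggplot2 computes for each. The sketch below does that on a small made-up data frame (a stand-in for myhsb, not the real data).

```r
library(ggplot2)

# Hypothetical stand-in for myhsb; these values are invented for illustration
toy <- data.frame(
  read.quartiles = factor(c("Q1", "Q2", "Q2", "Q3", "Q4", "Q4"))
)

# Mapping declared in ggplot(): inherited by every layer added afterwards
p_global <- ggplot(data = toy, aes(x = read.quartiles)) +
  geom_bar()

# Mapping declared inside the geom: applies to this layer only
p_layer <- ggplot(data = toy) +
  geom_bar(aes(x = read.quartiles))

# With one layer, both plots are built from the same computed counts
same_counts <- identical(
  layer_data(p_global)$count,
  layer_data(p_layer)$count
)
```

The placement starts to matter once a plot has several geoms, since only mappings given in ggplot() are shared across all of them.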

We can spruce things up by improving the labels on the x-axis and y-axis, and by adding a title to the plot.

ggplot(
  data = myhsb 
  ) +
  geom_bar(
      aes(x = read.quartiles) 
  ) + 
  labs(
    x = "Quartiles of the Standardized Reading Score",
    y = "Frequency",
    title = "Bar-chart of Reading Quartiles"
  )

Now you might want to explore differences between male and female students. By default every bar is the same gray, so we add a fill aesthetic: ggplot2 will assign a distinct color to each level of the variable we specify.

ggplot(
  data = myhsb 
  ) +
  geom_bar(
      aes(x = read.quartiles, fill = female.f)
  ) + 
  labs(
    x = "Quartiles of the Standardized Reading Score",
    y = "Frequency",
    title = "Bar-chart of Reading Quartiles",
    subtitle = "(by Sex)"
  )

Notice this just stacks one group on top of another, making comparison difficult. One way to avoid this would be to tell R to dodge the bars, i.e., put them side-by-side.

ggplot(
  data = myhsb 
  ) +
  geom_bar(
      aes(x = read.quartiles, fill = female.f),
      position = "dodge"
  ) + 
  labs(
    x = "Quartiles of the Standardized Reading Score",
    y = "Frequency",
    title = "Bar-chart of Reading Quartiles",
    subtitle = "(by Sex)"
  )
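Stacking and dodging use exactly the same counts; only the placement of the bars changes. The sketch below illustrates this on a small made-up data frame (the column names mirror myhsb but the values are invented), comparing the default stacked layout with position = "dodge".

```r
library(ggplot2)

# Hypothetical stand-in for myhsb; these values are invented for illustration
toy <- data.frame(
  read.quartiles = factor(c("Q1", "Q1", "Q1", "Q2", "Q2", "Q2", "Q2")),
  female.f = factor(c("Male", "Female", "Female", "Male", "Male", "Female", "Male"))
)

base <- ggplot(data = toy, aes(x = read.quartiles, fill = female.f))

p_stack <- base + geom_bar()                   # default: position = "stack"
p_dodge <- base + geom_bar(position = "dodge") # bars placed side-by-side

stack_data <- layer_data(p_stack)
dodge_data <- layer_data(p_dodge)

# Width of each drawn bar: dodged bars share the slot a stacked bar has to itself
stack_width <- stack_data$xmax - stack_data$xmin
dodge_width <- dodge_data$xmax - dodge_data$xmin
```

The count column is the same in both cases, while the dodged bars come out narrower because each category's slot on the x-axis is split between the groups.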

Alternatively, you could give each sex its own panel, placed side-by-side, via facet_grid().

ggplot(
  data = myhsb 
  ) +
  geom_bar(
      aes(x = read.quartiles)
  ) + 
  facet_grid(~ female.f) + 
  labs(
    x = "Quartiles of the Standardized Reading Score",
    y = "Frequency",
    title = "Bar-chart of Reading Quartiles",
    subtitle = "(by Sex)"
  )

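The beautiful thing is how easily this extends. Say you also want to break this out by race/ethnicity: put race.f on the left of the ~ so that it defines the rows of the grid, while female.f continues to define the columns.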

ggplot(
  data = myhsb 
  ) +
  geom_bar(
      aes(x = read.quartiles)
  ) + 
  facet_grid(race.f ~ female.f) + 
  labs(
    x = "Quartiles of the Standardized Reading Score",
    y = "Frequency",
    title = "Bar-chart of Reading Quartiles",
    subtitle = "(by Race/Ethnicity and Sex)"
  )
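To make the rows ~ columns logic of facet_grid() concrete, here is a minimal sketch on made-up data: three hypothetical race/ethnicity groups and two sexes should yield a 3 x 2 grid of panels. The group labels are placeholders, not the actual levels in myhsb. The sketch also shows the rows = vars(...) and cols = vars(...) form, which I believe is equivalent in recent versions of ggplot2.

```r
library(ggplot2)

# Hypothetical stand-in for myhsb: one row per combination of the three factors
toy <- expand.grid(
  read.quartiles = factor(c("Q1", "Q2", "Q3", "Q4")),
  female.f = factor(c("Male", "Female")),
  race.f = factor(c("Group A", "Group B", "Group C"))
)

# Formula interface: rows ~ columns
p_formula <- ggplot(data = toy) +
  geom_bar(aes(x = read.quartiles)) +
  facet_grid(race.f ~ female.f)

# The same grid written with explicit rows and cols arguments
p_vars <- ggplot(data = toy) +
  geom_bar(aes(x = read.quartiles)) +
  facet_grid(rows = vars(race.f), cols = vars(female.f))

# Every bar is assigned to a panel; count the distinct panels that hold data
n_panels_formula <- length(unique(layer_data(p_formula)$PANEL))
n_panels_vars <- length(unique(layer_data(p_vars)$PANEL))
```

With many levels on both sides the panels get small quickly, so it is worth checking how many cells the grid will have before plotting.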