Introduction to Basic Inferential Statistics, R, and the {survey} and {srvyr} packages

These are materials from a workshop I taught in August 2019 to Junior Fulbright Scholars from Egypt. The goal was to give a very quick overview of basic inferential statistics and then introduce R and RStudio, showing participants how to carry out basic data manipulation with the tidyverse, build some data visualizations with ggplot2, and finally run some inferential statistics with the survey and srvyr packages.

Stats 101: A Quick Overview

We started the workshop with a quick run-through of the usual methods of basic descriptive and inferential statistics that the participants were likely to have been trained in.

Introduction to R & RStudio

This was a gentle introduction to the basic logic behind the workings and setup of R & RStudio. Across the two cohorts only one student had been exposed to R (but not to RStudio), so the idea was not to overwhelm them in a workshop delivered over a very short period of time, and yet to leave them with some critical skills they could build on.

Reading the Demographic and Health Survey (DHS) data

All students (save perhaps one across the cohorts) were working with USAID’s Demographic and Health Survey (DHS) data for Egypt or India. These are very interesting and complex surveys run periodically in a number of countries, and the data are available by registering with the source agency. Survey weights have to be used, so the focus was essentially on showing how easy it is to weight the data in R, without needing some SPSS add-on or having to pay for SAS or Stata.
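
As a rough illustration of how little code the weighting step takes, the sketch below declares a survey design and computes weighted proportions with both survey and srvyr. It runs on simulated data, not an actual DHS file, and the column names (psu, stratum, wt_raw, area, outcome) are hypothetical stand-ins. In a real DHS recode file the corresponding variables are typically v021 (cluster), v022 or v023 (strata), and v005 (sample weight, stored multiplied by 1,000,000), but check the recode manual for the survey round you are using.

```r
# Minimal sketch on SIMULATED data; every column name here is a
# hypothetical stand-in for the real DHS variables.
library(survey)
library(dplyr)
library(srvyr)

set.seed(2019)
n <- 400
toy <- data.frame(
  psu     = rep(1:40, each = 10),            # cluster / primary sampling unit
  stratum = rep(1:4, each = 100),            # sampling stratum
  wt_raw  = sample(c(500000, 1000000, 2000000), n, replace = TRUE),
  area    = rep(c("urban", "rural"), each = 50, times = 4),
  outcome = rbinom(n, 1, 0.3)                # 0/1 indicator of some outcome
)

# DHS-style weights are stored scaled up by 1e6, so rescale before use.
toy$wt <- toy$wt_raw / 1e6

# survey package: declare the design once, then pass it to svy* functions.
design <- svydesign(ids = ~psu, strata = ~stratum, weights = ~wt,
                    data = toy, nest = TRUE)
overall <- svymean(~outcome, design)              # weighted proportion + SE
by_area <- svyby(~outcome, ~area, design, svymean)

# srvyr: the same design, used with dplyr-style verbs.
toy_srv <- as_survey_design(toy, ids = psu, strata = stratum,
                            weights = wt, nest = TRUE)
by_area_srvyr <- toy_srv %>%
  group_by(area) %>%
  summarize(prop = survey_mean(outcome))

print(overall)
print(by_area_srvyr)
```

The two packages should agree on the point estimates; srvyr is a wrapper around survey, so the choice between them is mostly a matter of which syntax feels more natural.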

Visualizing the DHS data

The last thing we worked on was learning the basics of ggplot2: how to generate appropriate graphics given the type of data to be visually represented. Mapping was of some interest to most of the participants because they wanted to show health outcomes for women or children in the governorates or by some other geographic variation (urban versus rural, for example).
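
To give a flavor of the kind of graphic involved, here is a minimal sketch that plots weighted proportions by urban versus rural residence, with confidence intervals. It is not taken from the workshop files: the data are simulated, and the column names (psu, stratum, wt, area, outcome) are hypothetical.

```r
# Minimal sketch on SIMULATED data; column names are hypothetical.
library(dplyr)
library(srvyr)
library(ggplot2)

set.seed(2019)
toy <- data.frame(
  psu     = rep(1:40, each = 10),
  stratum = rep(1:4, each = 100),
  wt      = sample(c(0.5, 1, 2), 400, replace = TRUE),
  area    = rep(c("urban", "rural"), each = 50, times = 4),
  outcome = rbinom(400, 1, 0.3)
)

# Weighted proportion per area; vartype = "ci" adds the columns
# prop_low and prop_upp alongside prop.
est <- toy %>%
  as_survey_design(ids = psu, strata = stratum, weights = wt, nest = TRUE) %>%
  group_by(area) %>%
  summarize(prop = survey_mean(outcome, vartype = "ci"))

# A categorical x and a numeric y suggest bars, with error bars for the CI.
p <- ggplot(est, aes(x = area, y = prop)) +
  geom_col(fill = "steelblue") +
  geom_errorbar(aes(ymin = prop_low, ymax = prop_upp), width = 0.2) +
  labs(x = "Residence", y = "Weighted proportion",
       title = "Simulated outcome by residence (illustrative only)")

print(p)
```

Plotting the survey-weighted estimates rather than raw counts keeps the graphic consistent with the weighted analysis.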