
Introduction to R

Ani Ruhil

1 / 33

Agenda

  • Install R and RStudio
    • First install the latest version of R from here
    • Then install the latest version of RStudio from here
  • Test installation
  • Install some packages
  • Understand how R Markdown works
  • Read data in various formats
  • Basic data processing and saving
  • Fun with leaflet
2 / 33
  • Go slow and make sure everyone is able to knit
  • Minimize panic and keep the environment light

Understand your RStudio Environment

3 / 33

(1) The Console ... (2) Knitting and Code Chunk options

... key options ...

(1) Console = This is where commands are issued to R, either by typing and hitting enter or running commands from a script (like your R Markdown file)
(2) Environment = stores and shows you all the objects created
(3) History = shows you a running list of all commands issued to R
(4) Connections = shows you any databases/servers you are connected to and also allows you to initiate a new connection
(5) Files = shows you files and folders in your current working directory, and you can move up/down in the folder hierarchy
(6) Plots = shows you all plots that have been generated
(7) Packages = shows you installed packages
(8) Help = allows you to get the help pages by typing in keywords
(9) Viewer = shows you any "live" documents running on the server
(10) Knit = allows you to generate html/pdf/word documents from a script
(11) Insert = allows you to insert a vanilla R chunk. You can (and should) give a unique name to each code chunk so that you can easily diagnose which chunk is not working
(12) Run = allows you to run lines/chunks

Customize the detachable panes via Tools -> Global Options...

You also have a spellchecker; use it

4 / 33

Installing packages

Now we install some packages via Tools -> Install Packages... and update packages via Tools -> Check for Package Updates... [1]

devtools, reshape2, lubridate, car, Hmisc, gapminder, leaflet,
DT, data.table, htmltools, scales, ggridges, here, knitr,
kableExtra, haven, readr, readxl, ggplot2

Other packages will be installed as needed

Update packages via Tools -> Check for Package Updates...
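
The same steps can be scripted from the Console instead of the menus. A minimal sketch, assuming a CRAN mirror is reachable; the names pkgs and missing_pkgs are illustrative, and the install/update lines are commented out so nothing changes until you choose to run them:

```r
# Packages listed on this slide
pkgs <- c("devtools", "reshape2", "lubridate", "car", "Hmisc", "gapminder",
          "leaflet", "DT", "data.table", "htmltools", "scales", "ggridges",
          "here", "knitr", "kableExtra", "haven", "readr", "readxl", "ggplot2")

# Which of these are not yet installed on this machine?
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
print(missing_pkgs)

# Uncomment to install whatever is missing, and to update what is installed
# install.packages(missing_pkgs)
# update.packages(ask = FALSE)
```

Checking first avoids reinstalling packages that are already present.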

[1] It is a good idea to update packages on a regular basis, but note that every now and then something might break with an update. When this happens, check the package's source, usually on GitHub, for solutions.

5 / 33
  • Make sure they install devtools

Rprojects

(1) Create a folder called mpa6020

(2) Inside the mpa6020 folder create a subfolder called data. The folder structure will now be as shown below

mpa6020/
├── my-rmarkdown-file-01.Rmd
├── my-rmarkdown-file-02.Rmd
└── data/
    ├── some data file
    └── another data file

All data you download or create go into the data folder. All R code files reside in the mpa6020 folder.
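
If you prefer, the folders can also be created from R. A minimal sketch that builds the layout in a temporary location purely for illustration; project_dir is a made-up name, and you would substitute the path where you actually want mpa6020 to live:

```r
# Build the folder layout in a temporary location (swap in your own path)
project_dir <- file.path(tempdir(), "mpa6020")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# List what was created inside the project folder
list.files(project_dir, include.dirs = TRUE)
```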

Open the Rmd file I sent you: Module01_forClass.Rmd and save it in the mpa6020 folder. Save the data I sent you to the data folder.

(3) Now create a project via File -> New Project and choose Existing Directory. Browse to the mpa6020 folder and click Create Project. RStudio will restart and when it does you will be in the project folder and will see a file called mpa6020.Rproj

6 / 33
  • Point out that every time they start working they can click on mpa6020.Rproj and everything should work seamlessly unless something breaks

R Markdown files

  • Go to New File -> R Markdown ... and enter My First Rmd File as the title and your name as the author.

  • Click OK.
  • Now File -> Save As... and save it as testing_rmd in the mpa6020 folder
  • Click this button:

    You may see a message that says some packages need to be installed/updated. Allow these to be installed/updated.

7 / 33
  • Emphasize the importance of the YAML (YAML Ain't Markup Language) header
  • Urge patience again since some packages may have to be installed more than once, perhaps via devtools, and some may not have admin rights (the horror, the horror!!)
  • Show them how to knit to Word and to PDF
  • Tell them you will show them how to generate a slide-deck later, if anyone is interested

... if all goes well ...

As the document knits, watch for error messages

8 / 33

Specific R Markdown code block commands

Golden Rule: Unique name for each chunk (no whitespace in name). Forgot? Use name_chunks() from the namer package

library(namer)
name_chunks("myfilename.Rmd")
  • eval = If FALSE, knitr will not run the code in the code chunk.
  • include = If FALSE, knitr will run the chunk but not include the chunk in the final document.
  • echo = If FALSE, knitr will not display the code in the code chunk above its results in the final document.
  • error = If FALSE, knitr will stop knitting at the first error. If TRUE, the error message is displayed in the document and knitting continues.
  • message = If FALSE, knitr will not display any messages generated by the code.
  • warning = If FALSE, knitr will not display any warning messages generated by the code.
  • cache = If TRUE, knitr will cache the results to reuse in future knits. Knitr will reuse the results until the code chunk is altered.
  • dev = The R function name that will be used as a graphical device to record plots, e.g. dev='CairoPDF'.
  • dpi = A number for knitr to use as the dots per inch (dpi) in graphics (when applicable).
  • fig.align = 'center', 'left', 'right' alignment in the knit document
  • fig.height = height of the figure (in inches, for example)
  • fig.width = width of the figure (in inches, for example)
  • out.height, out.width = The width and height to scale plots to in the final output.

Other options can be found in the cheatsheet available here
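
Options can be set per chunk in the chunk header, or once for the whole document. A minimal sketch of the second approach, assuming you want the same defaults everywhere; the specific values and the chunk label my-plot are only illustrative:

```r
library(knitr)

# Document-wide defaults, typically placed in the first (setup) chunk.
# Any single chunk can still override them in its own header, e.g.
#   {r my-plot, echo = TRUE, fig.height = 4}
opts_chunk$set(
  echo = FALSE,        # hide code, show results
  message = FALSE,     # hide messages
  warning = FALSE,     # hide warnings
  fig.align = "center",
  fig.width = 6,       # inches
  fig.height = 4       # inches
)

# Confirm what is currently set
opts_chunk$get("fig.align")
```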

9 / 33

Reading in data files

10 / 33

Reading data

Make sure you have the following data-sets in the data folder. If you don't, the commands that follow will not work. We start by reading a simple comma-separated values (CSV) file and then a tab-delimited file.

library(here) # loaded once per session
read.csv(here("data", "ImportDataCSV.csv"), sep = ",", header = TRUE) -> df.csv # note sep = ","
read.csv(here("data", "ImportDataTAB.txt"), sep = "\t", header = TRUE) -> df.tab # note sep = "\t"

If the files were read then Environment should show objects called df.csv and df.tab. If you don't see these then check the following:

  • Make sure you have the csv/txt files in your data folder
  • Make sure the folder has been correctly named (no blank spaces before or after, all lowercase, etc)
  • Make sure the data folder is directly inside the mpa6020 folder
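
These checks can also be run from the Console. A minimal sketch, assuming the working directory is the project folder; it uses base R's file.path() rather than here(), and the names files and found are illustrative:

```r
# Files the read.csv() calls expect, relative to the project folder
files <- c("ImportDataCSV.csv", "ImportDataTAB.txt")
found <- file.exists(file.path("data", files))

# TRUE means the file is where R expects it
setNames(found, files)

# Where R is currently looking (should be the mpa6020 project folder)
getwd()
```

A FALSE next to a file name usually means a misnamed folder or a file saved in the wrong place.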
11 / 33
  • Point out the importance of setting the data path to data/filename.ext

Excel files can be read via the readxl package

library(readxl)
read_excel(here("data", "ImportDataXLS.xls")) -> df.xls
read_excel(here("data", "ImportDataXLSX.xlsx")) -> df.xlsx

SPSS, Stata, SAS files can be read via the haven package

library(haven)
read_stata(here("data", "ImportDataStata.dta")) -> df.stata
read_sas(here("data", "ImportDataSAS.sas7bdat")) -> df.sas
read_sav(here("data", "ImportDataSPSS.sav")) -> df.spss

Fixed-width files: It is also common to encounter fixed-width files where the raw data are stored without any gaps between successive variables. However, these files will come with documentation that will tell you where each variable starts and ends, along with other details about each variable.

read.fwf(here("data", "fwfdata.txt"), widths = c(4, 9, 2, 4), header = FALSE,
col.names = c("Name", "Month", "Day", "Year")) -> df.fwf

Notice we need widths = c() and col.names = c(). We will wrestle with some fixed-width files in the coming weeks.
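
To see how widths carves up each line, here is a minimal self-contained sketch. The two records are made up for illustration and written to a temporary file; strip.white = TRUE is an extra argument (passed through to read.table) that trims the padding spaces from text fields:

```r
# Two made-up records: Name (4), Month (9), Day (2), Year (4), no separators
raw_lines <- c("AnnaJanuary  051990",
               "OmarSeptember121985")
fwf_file <- tempfile(fileext = ".txt")
writeLines(raw_lines, fwf_file)

# widths tells R how many characters belong to each variable
toy.fwf <- read.fwf(fwf_file, widths = c(4, 9, 2, 4), header = FALSE,
                    col.names = c("Name", "Month", "Day", "Year"),
                    strip.white = TRUE)
toy.fwf
```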

12 / 33

Reading Files from the Web

It is possible to specify the full web-path for a file and read it in, rather than storing a local copy. This is often useful when the file is regularly updated by the source (Census Bureau, Bureau of Labor Statistics, Bureau of Economic Analysis, etc.)

read.table("http://data.princeton.edu/wws509/datasets/effort.dat") -> fpe
read.table("https://stats.idre.ucla.edu/stat/data/test.txt",
header = TRUE) -> test.txt
read.csv("https://stats.idre.ucla.edu/stat/data/test.csv",
header = TRUE) -> test.csv
library(foreign)
read.spss("https://stats.idre.ucla.edu/stat/data/hsb2.sav") -> hsb2.spss
df.hsb2.spss = as.data.frame(hsb2.spss)

hsb2.spss was read with the foreign package [2], an alternative to haven

  • foreign calls read.spss while haven calls read_spss

[2] The foreign package will also read Stata, SAS, and other formats. I end up defaulting to haven now. There are other packages for reading SPSS, SAS, etc. files ... sas7bdat, rio, data.table, xlsx, XLConnect, gdata and others.

13 / 33
  • Point out that they must have an internet connection or else the file won't be read
  • Remind them that if the source file's URL changes the file may not be read, but it is easy to check if a broken URL is the source of the error by using a browser
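
A failed download can also be handled inside R so that knitting does not stop abruptly. A minimal sketch, assuming you are happy to get NULL back when a file cannot be read; read_remote_csv is a hypothetical helper written for this illustration, not part of any package:

```r
# Hypothetical helper: return the data frame, or NULL with a message on failure
read_remote_csv <- function(url) {
  tryCatch(
    read.csv(url, header = TRUE),
    error = function(e) {
      message("Could not read ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Intended use (needs an internet connection):
# test.csv <- read_remote_csv("https://stats.idre.ucla.edu/stat/data/test.csv")
```

If the result is NULL, paste the URL into a browser to see whether the link itself is broken.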

Reading compressed files

temp = tempfile()
download.file("ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nvss/bridged_race/pcen_v2018_y1018.sas7bdat.zip",
temp,
mode = "wb")
haven::read_sas(unz(temp, "pcen_v2018_y1018.sas7bdat")) -> oursasdata
unlink(temp)

You can save your data in a format that R will recognize, giving it the RData or rdata extension

save(oursasdata, file = "data/oursasdata.RData")
save(oursasdata, file = "data/oursasdata.rdata")

Check your data directory to confirm both files are present

14 / 33

Labeling data - a small example

15 / 33

Minimal example of data processing

Working with the hsb2 data: 200 students from the High School and Beyond study

read.table('https://stats.idre.ucla.edu/stat/data/hsb2.csv',
header = TRUE, sep = ",") -> hsb2
  • female = (0/1)
  • race = (1=hispanic 2=asian 3=african-amer 4=white)
  • ses = socioeconomic status (1=low 2=middle 3=high)
  • schtyp = type of school (1=public 2=private)
  • prog = type of program (1=general 2=academic 3=vocational)
  • read = standardized reading score
  • write = standardized writing score
  • math = standardized math score
  • science = standardized science score
  • socst = standardized social studies score
16 / 33
17 / 33

There are no label values for the various qualitative variables (female, race, ses, schtyp, and prog) so we create these. [3]

factor(hsb2$female, levels = c(0, 1), labels = c("Male", "Female")) -> hsb2$female
factor(hsb2$race, levels = c(1:4), labels = c("Hispanic", "Asian", "African American", "White")) -> hsb2$race
factor(hsb2$ses, levels = c(1:3), labels = c("Low", "Middle", "High")) -> hsb2$ses
factor(hsb2$schtyp, levels = c(1:2), labels = c("Public", "Private")) -> hsb2$schtyp
factor(hsb2$prog, levels = c(1:3), labels = c("General", "Academic", "Vocational")) -> hsb2$prog

[3] This is just a quick run through with creating value labels; we will cover this in greater detail in a later module.
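
To see what factor() does without downloading anything, here is a minimal sketch on a made-up vector coded the same way as ses; the names ses_codes and ses are illustrative:

```r
# A made-up vector coded like hsb2$ses (1 = low, 2 = middle, 3 = high)
ses_codes <- c(1, 3, 2, 2, 1)
ses <- factor(ses_codes, levels = 1:3, labels = c("Low", "Middle", "High"))

levels(ses)   # the labels, in level order
table(ses)    # counts by label rather than by number
```

levels lists the codes as stored in the data and labels gives the text to show for each, in the same order.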

18 / 33

save your work!!

Having added labels to the factors in hsb2 we can now save the data for later use.

save(hsb2, file = "data/hsb2.RData")

Let us test if this R Markdown file will knit to html

If all is good then we can Close Project

  • RStudio will close your project and reopen in a vanilla session
19 / 33
  • Help with any knitting problems
  • Remind them to save the Rmd before they Close Project

More with data

20 / 33

Data in packages

Almost all R packages come bundled with data-sets, too many to walk you through here.

To load data from a package, if you know the data-set's name, run

library(HistData)
data("Galton")
names(Galton)
## [1] "parent" "child"

or you can run

data("GaltonFamilies", package = "HistData")
names(GaltonFamilies)
## [1] "family" "father" "mother" "midparentHeight"
## [5] "children" "childNum" "gender" "childHeight"
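
If you do not know a data-set's name, you can list what a package ships with. A minimal sketch using the built-in datasets package so that nothing extra needs to be installed; the name available is illustrative:

```r
# List the data-sets bundled with a package (here the built-in datasets package)
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# Check for a specific data-set by name before loading it
"mtcars" %in% available[, "Item"]
```

Swapping "datasets" for another installed package name, such as "HistData", lists that package's data-sets instead.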
21 / 33

Saving data and workspaces

You can certainly save your data via

  • save(dataname, file = "filepath/filename.RData") or
  • save(dataname, file = "filepath/filename.rdata")
data(mtcars)
save(mtcars, file = "data/mtcars.RData")
rm(list = ls()) # To clear the Environment
load("data/mtcars.RData")

You can also save multiple data files as follows:

data(mtcars)
library(ggplot2)
data(diamonds)
save(mtcars, diamonds, file = "data/mydata.RData")
rm(list = ls()) # To clear the Environment
load("data/mydata.RData")
22 / 33

If you want to save just a single object from the environment and then load it in a later session, maybe with a different name, then you should use saveRDS() and readRDS()

data(mtcars)
saveRDS(mtcars, file = "data/mydata.RDS")
rm(list = ls()) # To clear the Environment
readRDS("data/mydata.RDS") -> ourdata

If instead you use save() and load() as follows, note that the data will be restored under the name they had when saved

data(mtcars)
save(mtcars, file = "data/mtcars.RData")
rm(list = ls()) # To clear the Environment
load("data/mtcars.RData") -> ourdata # Note: the data are restored as mtcars; ourdata only holds the name "mtcars"
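
The difference between the two approaches is easiest to see side by side. A minimal sketch that writes to temporary files so your data folder is left untouched; cars_copy, restored, and the file names are illustrative:

```r
rdata_file <- tempfile(fileext = ".RData")
rds_file   <- tempfile(fileext = ".RDS")
cars_copy  <- mtcars

save(cars_copy, file = rdata_file)     # stores the object together with its name
saveRDS(cars_copy, file = rds_file)    # stores only the object
rm(cars_copy)

restored <- load(rdata_file)           # recreates cars_copy; returns the name(s) loaded
ourdata  <- readRDS(rds_file)          # we choose the name

restored
identical(ourdata, cars_copy)
```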

If you want to save everything you have done in the work session you can via save.image()

save.image(file = "mywork_jan182018.RData")
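  • The next time you start RStudio, load this image with load("mywork_jan182018.RData"); only a workspace saved under the default name .RData in the startup working directory is loaded automatically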
  • Useful if you have a lot of R code you have written and various objects generated and do not want to start from scratch the next time around.
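
A minimal sketch of saving a workspace image and restoring it explicitly. It writes to a temporary file for illustration, and loading into a separate environment (restored_env, an illustrative name) is an optional precaution that lets you inspect the contents before overwriting anything:

```r
# Save every object in the current workspace to a named file
demo_object <- 1:3
image_file <- tempfile(fileext = ".RData")
save.image(file = image_file)

# Later (or in a new session): restore it explicitly.
# A separate environment keeps the restored objects apart from current ones.
restored_env <- new.env()
load(image_file, envir = restored_env)
ls(restored_env)
```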
23 / 33

Let them know that if not in a project and they try to close RStudio after some code has been run, they will be prompted to save (or not) the workspace and they should say "no"

Mapping in R with leaflet

24 / 33

leaflet is an easy to learn JavaScript library that generates interactive maps

library(leaflet)
library(leaflet.extras)
library(widgetframe)
leaflet() %>% setView(lat = 39.322577, lng = -82.106336, zoom = 14) %>%
addTiles() %>% setMapWidgetStyle() %>%
frameWidget(height = '275')
  • setView() centers the map with given lat/lng
  • zoom = applies zoom factor
25 / 33

... drop a pin on Building 21

leaflet() %>% setView(lat = 39.322577, lng = -82.106336, zoom = 15) %>%
addMarkers(lat = 39.319984, lng = -82.107084, popup = c("The Ridges, Building 21")) %>%
addTiles() %>% setMapWidgetStyle() %>%
frameWidget(height = '325')
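
Markers can also come from a data frame, which scales better than typing each pin by hand. A minimal sketch that reuses the two coordinate pairs from the examples above; the data frame places and the label "Map center" are illustrative, and the slide-embedding helpers (setMapWidgetStyle(), frameWidget()) are left out:

```r
library(leaflet)

# Both coordinate pairs come from the examples above; the labels are illustrative
places <- data.frame(
  name = c("Map center", "The Ridges, Building 21"),
  lat  = c(39.322577, 39.319984),
  lng  = c(-82.106336, -82.107084)
)

# The ~ notation tells leaflet to look the columns up in places
m <- leaflet(data = places) %>%
  addTiles() %>%
  addMarkers(lng = ~lng, lat = ~lat, popup = ~name)
m
```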
26 / 33

Exercises for practice

27 / 33

Ex. 1: Creating and knitting a new RMarkdown file

Open a fresh session by launching RStudio and then running File -> Open Project...

Create a new R Markdown file, give it a title and your name as the author, and then save it in the mpa6020 folder with the following name: m1ex1.Rmd

Delete all content after the initial setup code chunk

Add this level 1 heading The Starwars Data and then insert your first code chunk exactly as shown below

library(dplyr)
data(starwars)
str(starwars)

Add this level 2 heading Character Heights and Weights and then your second code chunk

plot(starwars$height, starwars$mass)

Now knit this file to html

28 / 33

Ex. 2: Lorem Ipsum paragraphs and graphs

Go to this website and generate five Lorem Ipsum placeholder text paragraphs

  • para 1: must have level 1 heading
  • para 2: must have level 2 heading
  • para 3: must have level 3 heading
  • para 4: must have level 4 heading
  • para 5: must have level 5 heading

Using the starwars data, create five code chunks, one after each paragraph

  • Each code chunk will have the same R code (see below)
plot(starwars$height, starwars$mass)

Now knit this file to html

29 / 33

Ex. 3: Reading in three data files

Create a new RMarkdown file that is blank after the initial setup code chunk

Insert a code chunk that reads in both these files found on the web

  • http://www.stata.com/data/jwooldridge/eacsap/mroz.dta
  • http://calcnet.mth.cmich.edu/org/spss/V16_materials/DataSets_v16/airline_passengers.sav

In a follow-up code chunk, run the summary() command on each data-set

In a separate code chunk, read in this dataset after you download it and save the unzipped file in your data folder.

  • The variable gender has the following codes: Zero = unknown; 1 = male; 2 = female
  • Use this coding scheme to convert gender into a factor with these value labels

In a follow-up chunk run all three of the following commands on this data-set

  • names()
  • str()
  • summary()

In a final chunk, run the commands necessary to save each of the three data-sets as separate RData files. Make sure you save them in your data folder. Now knit the complete Rmd file to html

30 / 33

Ex. 4: Knitting with prettydoc

I'd like you to use a specific Rmd format, prettydoc, because the documents it produces are very readable

You had installed the prettydoc package so now create a prettydoc Rmd file as shown below:

Now take all the text and code chunks you created in Ex. 3 and insert them in this file. Make sure you add a title, etc. in the YAML and then knit the file to html

You can play with the theme: and highlight: fields, choosing from the options displayed here

To see native R Markdown formatting options read the documentation

31 / 33
  • Point out that the native Rmd formats allow for a lot more options than does the prettydoc format
  • Initial files I provide will be prettydoc because of their simplicity but later formats will rely on more flexible Rmd formats

RStudio webinars for more details

RStudio runs and archives free webinars. Sign up with your email and watch them if you want more details of specific functionalities

32 / 33
