Introduction to R

Ani Ruhil

Agenda

Install R and RStudio
- First install the latest version of R
- Then install the latest version of RStudio
Test installation
Install some packages
Understand how R Markdown works
Read data in various formats
Basic data processing and saving
Fun with leaflet


Go slow and make sure everyone is able to knit
Minimize panic and keep the environment light

Understand your RStudio Environment


(1) The Console ... (2) Knitting and Code Chunk options

... key options ...

(1) Console = this is where commands are issued to R, either by typing and hitting enter or by running commands from a script (like your R Markdown file)
(2) Environment = stores and shows you all the objects created
(3) History = shows you a running list of all commands issued to R
(4) Connections = shows you any databases/servers you are connected to and also allows you to initiate a new connection
(5) Files = shows you files and folders in your current working directory, and you can move up/down in the folder hierarchy
(6) Plots = shows you all plots that have been generated
(7) Packages = shows you installed packages
(8) Help = allows you to get the help pages by typing in keywords
(9) Viewer = shows you any "live" documents running on the server
(10) Knit = allows you to generate html/pdf/word documents from a script
(11) Insert = allows you to insert a vanilla R chunk. You can (and should) give a unique name to each code chunk so that you can easily diagnose which chunk is not working
(12) Run = allows you to run lines/chunks

Customize the detachable panes via Tools -> Global Options...

You also have a spellchecker; use it


Installing packages

Now we install some packages via Tools -> Install Packages... and update packages via Tools -> Check for Package Updates...¹

devtools, reshape2, lubridate, car, Hmisc, gapminder, leaflet, 
DT, data.table, htmltools, scales, ggridges, here, knitr, 
kableExtra, haven, readr, readxl, ggplot2

Other packages will be installed as needed
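The same installation can be scripted from the Console. This is a minimal sketch, assuming the package list above and a reachable CRAN mirror; the names pkgs and to_install are illustrative, and the interactive() guard is an assumption added so that nothing is installed while a document knits.

```r
# Packages named on this slide
pkgs <- c("devtools", "reshape2", "lubridate", "car", "Hmisc", "gapminder",
          "leaflet", "DT", "data.table", "htmltools", "scales", "ggridges",
          "here", "knitr", "kableExtra", "haven", "readr", "readxl", "ggplot2")

# Keep only the ones that are not installed yet
to_install <- setdiff(pkgs, rownames(installed.packages()))

# Install the missing ones; guarded so nothing is installed during a knit
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running it a second time should do nothing, because to_install will then be empty.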

Update packages via Tools -> Check for Package Updates...

[1] It is a good idea to update packages on a regular basis, but note that every now and then something might break with an update. When this happens, check the package's source, usually on GitHub, for solutions.


Make sure they install devtools

Rprojects

(1) Create a folder called mpa6020

(2) Inside the mpa6020 folder create a subfolder called data. The folder structure will now be as shown below

mpa6020/
    └── my-rmarkdown-file-01.Rmd
    └── my-rmarkdown-file-02.Rmd
    └── data/
        └── some data file
        └── another data file

All data you download or create go into the data folder. All R code files reside in the mpa6020 folder.

Open the Rmd file I sent you: Module01_forClass.Rmd and save it in the mpa6020 folder. Save the data I sent you to the data folder.

(3) Now create a project via File -> New Project and choose Existing Directory. Browse to the mpa6020 folder and click Create Project. RStudio will restart and when it does you will be in the project folder and will see a file called mpa6020.Rproj
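Steps (1) and (2) can also be done from R. The sketch below builds the layout under a temporary directory purely for illustration; root is a hypothetical name, and in practice you would point it at wherever you keep course files.

```r
# Illustration only: build the layout under a temporary directory
root <- file.path(tempdir(), "mpa6020")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)

# Confirm both folders exist
dir.exists(root)
dir.exists(file.path(root, "data"))

# List what the project folder contains so far
list.files(root)
```

Once the project is open, running here::here() in the Console should print the path to the mpa6020 folder.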


Point out that every time they start working they can click on mpa6020.Rproj and everything should work seamlessly unless something breaks

R Markdown files

Go to New File -> R Markdown ... and enter My First Rmd File as the title and your name as the author.

Click OK.
Now File -> Save As... and save it as testing_rmd in the code sub-folder
Click the Knit button:

You may see a message that says some packages need to be installed/updated. Allow these to be installed/updated.


Emphasize the importance of the YAML (YAML Ain't Markup Language) header
Urge patience again since some packages may have to be installed more than once, perhaps via devtools, and some may not have admin rights (the horror, the horror!!)
Show them how to knit to Word and to PDF
Tell them you will show them how to generate a slide-deck later, if anyone is interested

... if all goes well ...

As the document knits, watch for error messages
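Knitting can also be triggered from code, which makes error messages easier to capture. The sketch below assumes the rmarkdown package and pandoc are available (both ship with a standard RStudio setup); the file contents and the name rmd_path are made up for illustration.

```r
library(rmarkdown)

# Write a tiny R Markdown file to a temporary location (illustration only)
fence <- strrep("`", 3)  # three backticks, built here to keep this example tidy
rmd_path <- file.path(tempdir(), "testing_rmd.Rmd")
writeLines(c(
  "---",
  "title: \"My First Rmd File\"",
  "output: html_document",
  "---",
  "",
  "Some text.",
  "",
  paste0(fence, "{r first-chunk}"),
  "summary(cars)",
  fence
), rmd_path)

# Knit it; render() returns the path of the file it produced
if (pandoc_available()) {
  out <- render(rmd_path, output_format = "html_document", quiet = TRUE)
  file.exists(out)
}
```

Swapping output_format to "word_document" or "pdf_document" should produce Word or PDF output instead; PDF additionally needs a LaTeX installation.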


Specific R Markdown code block commands

Golden Rule: give each chunk a unique name (no whitespace in the name). Forgot? Use name_chunks() from the namer package

library(namer)
name_chunks("myfilename.Rmd")

eval = If FALSE, knitr will not run the code in the code chunk.
include = If FALSE, knitr will run the chunk but not include the chunk in the final document.
echo = If FALSE, knitr will not display the code in the code chunk above its results in the final document.
error = If FALSE, knitr will stop knitting when the code throws an error. If TRUE, the error message is shown in the document and knitting continues.
message = If FALSE, knitr will not display any messages generated by the code.
warning = If FALSE, knitr will not display any warning messages generated by the code.
cache = If TRUE, knitr will cache the results to reuse in future knits. Knitr will reuse the results until the code chunk is altered.
dev = The R function name that will be used as a graphical device to record plots, e.g. dev='CairoPDF'.
dpi = A number for knitr to use as the dots per inch (dpi) in graphics (when applicable).
fig.align = alignment of the figure in the knit document: 'center', 'left', or 'right'
fig.height = height of the figure (in inches, for example)
fig.width = width of the figure (in inches, for example)
out.height, out.width = The height and width to scale plots to in the final output.

Other options can be found in the cheatsheet available here
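Options you want on every chunk can be set once, typically in the setup chunk at the top of the file. A minimal sketch using knitr::opts_chunk; the particular values are illustrative choices, not course requirements.

```r
library(knitr)

# Defaults applied to every chunk that follows in the document
opts_chunk$set(
  echo = TRUE,          # show the code
  warning = FALSE,      # hide warnings
  message = FALSE,      # hide messages
  fig.align = "center", # center figures
  fig.width = 6,        # inches
  fig.height = 4        # inches
)

# Inspect a current default
opts_chunk$get("fig.align")
```

Any option set in an individual chunk header overrides these defaults for that chunk only.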


Reading in data files

Reading data

Make sure you have the following data-sets in the data folder. If you don't then the commands that follow will not work. We start by reading a simple comma-separated values (CSV) file and then a tab-delimited file.

library(here) # loaded once per session 
read.csv(here("data", "ImportDataCSV.csv"), sep = ",", header = TRUE) -> df.csv # note sep = ","
read.csv(here("data", "ImportDataTAB.txt"), sep = "\t", header = TRUE) -> df.tab # note sep = "\t"

If the files were read then Environment should show objects called df.csv and df.tab. If you don't see these then check the following:

Make sure you have the csv/txt files in your data folder
Make sure the folder has been correctly named (no blank spaces before or after, all lowercase, etc)
Make sure the data folder is inside the mpa6020 project folder, next to mpa6020.Rproj
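The checklist above can be run from the Console. This sketch uses only base R and the two file names from this slide; files and found are illustrative names.

```r
# Where is R looking? Inside the project this should end in mpa6020
getwd()

# Does the data folder exist, and what is in it?
dir.exists("data")
list.files("data")

# Are the two files from this slide present?
files <- file.path("data", c("ImportDataCSV.csv", "ImportDataTAB.txt"))
found <- file.exists(files)
setNames(found, files)
```

A FALSE next to a file name usually means a typo in the name or a file saved in the wrong folder.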


Point out the importance of setting the data path to data/filename.ext

Excel files can be read via the readxl package

library(readxl)
read_excel(here("data", "ImportDataXLS.xls")) -> df.xls 
read_excel(here("data", "ImportDataXLSX.xlsx")) -> df.xlsx

SPSS, Stata, SAS files can be read via the haven package

library(haven)
read_stata(here("data", "ImportDataStata.dta")) -> df.stata 
read_sas(here("data", "ImportDataSAS.sas7bdat")) -> df.sas
read_sav(here("data", "ImportDataSPSS.sav")) -> df.spss

Fixed-width files: It is also common to encounter fixed-width files where the raw data are stored without any gaps between successive variables. However, these files will come with documentation that will tell you where each variable starts and ends, along with other details about each variable.

read.fwf(here("data", "fwfdata.txt"), widths = c(4, 9, 2, 4), header = FALSE, 
                 col.names = c("Name", "Month", "Day", "Year")) -> df.fwf

Notice we need widths = c() and col.names = c(). We will wrestle with some fixed-width files in the coming weeks.
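To see how widths carves up each line, here is a self-contained sketch with two made-up records written to a temporary file. The data and the name fwf_path are invented for illustration; strip.white = TRUE is added to trim the padding spaces.

```r
# Two made-up records: Name (4), Month (9), Day (2), Year (4)
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("AnneSeptember121990",
             "Bob January  032001"), fwf_path)

# Same call pattern as above, plus strip.white to drop padding
read.fwf(fwf_path, widths = c(4, 9, 2, 4), header = FALSE,
         col.names = c("Name", "Month", "Day", "Year"),
         strip.white = TRUE) -> df.fwf.demo

df.fwf.demo
```

If a width is off by even one character, every later column shifts, so compare a few rows against the documentation before trusting the result.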


Reading Files from the Web

It is possible to specify the full web-path for a file and read it in, rather than storing a local copy. This is often useful when the file is updated regularly by the source (Census Bureau, Bureau of Labor Statistics, Bureau of Economic Analysis, etc.)

read.table("http://data.princeton.edu/wws509/datasets/effort.dat") -> fpe
read.table("https://stats.idre.ucla.edu/stat/data/test.txt", 
                  header = TRUE) -> test.txt 
read.csv("https://stats.idre.ucla.edu/stat/data/test.csv", 
                    header = TRUE) -> test.csv
library(foreign)
read.spss("https://stats.idre.ucla.edu/stat/data/hsb2.sav") -> hsb2.spss
df.hsb2.spss = as.data.frame(hsb2.spss)

hsb2.spss was read with the foreign package², an alternative to haven

foreign calls read.spss while haven calls read_spss

[2] The foreign package will also read Stata, SAS, and other formats. I end up defaulting to haven now. There are other packages for reading SPSS, SAS, etc. files ... sas7bdat, rio, data.table, xlsx, XLConnect, gdata and others.


Point out that they must have an internet connection or else the file won't be read
Remind them that if the source file's URL changes the file may not be read, but it is easy to check if a broken URL is the source of the error by using a browser
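A failed read stops a knit, so it can help to wrap web reads in tryCatch(). The helper below is hypothetical (read_csv_or_null is not part of any package), and a non-existent local path stands in for a broken URL so the sketch runs without an internet connection.

```r
# Hypothetical helper: return the data, or NULL with a message on failure
read_csv_or_null <- function(path) {
  tryCatch(
    suppressWarnings(read.csv(path, header = TRUE)),
    error = function(e) {
      message("Could not read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# A path that does not exist stands in for a broken URL here
bad <- read_csv_or_null(file.path(tempdir(), "no-such-file.csv"))
is.null(bad)
```

The same helper accepts a URL in place of the path, since read.csv() reads from both.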

Reading compressed files

temp = tempfile()
download.file("ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nvss/bridged_race/pcen_v2018_y1018.sas7bdat.zip",
              temp,
              mode = "wb")
haven::read_sas(unz(temp, "pcen_v2018_y1018.sas7bdat")) -> oursasdata
unlink(temp)

You can save your data in a format that R will recognize, giving it the RData or rdata extension

save(oursasdata, file = "data/oursasdata.RData")
save(oursasdata, file = "data/oursasdata.rdata")

Check your data directory to confirm both files are present


Labeling data - a small example

Minimal example of data processing

Working with the hsb2 data: 200 students from the High School and Beyond study

read.table('https://stats.idre.ucla.edu/stat/data/hsb2.csv',
                  header = TRUE, sep = ",") -> hsb2

female = (0/1)
race = (1=hispanic 2=asian 3=african-amer 4=white)
ses = socioeconomic status (1=low 2=middle 3=high)
schtyp = type of school (1=public 2=private)
prog = type of program (1=general 2=academic 3=vocational)
read = standardized reading score
write = standardized writing score
math = standardized math score
science = standardized science score
socst = standardized social studies score


	id	female	race	ses	schtyp	prog	read	write	math	science	socst
1	70	0	4	1	1	1	57	52	41	47	57
2	121	1	4	2	1	3	68	59	53	63	61
3	86	0	4	3	1	1	44	33	54	58	31
4	141	0	4	3	1	3	63	44	47	53	56
5	172	0	4	2	1	2	47	52	57	53	61


There are no label values for the various qualitative variables (female, race, ses, schtyp, and prog) so we create these.³

factor(hsb2$female, levels = c(0, 1), labels = c("Male", "Female")) -> hsb2$female
factor(hsb2$race, levels = c(1:4), labels = c("Hispanic", "Asian", "African American", "White")) -> hsb2$race
factor(hsb2$ses, levels = c(1:3), labels = c("Low", "Middle", "High")) -> hsb2$ses 
factor(hsb2$schtyp, levels = c(1:2), labels = c("Public", "Private")) -> hsb2$schtyp 
factor(hsb2$prog, levels = c(1:3), labels = c("General", "Academic", "Vocational")) -> hsb2$prog

[3] This is just a quick run through with creating value labels; we will cover this in greater detail in a later module.
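It is worth tabulating a variable before and after labeling to confirm that no values were lost. The sketch below uses a made-up five-row data frame (toy is not the real hsb2 data) with the ses coding shown earlier.

```r
# Made-up codes in the style of hsb2 (not the real data)
toy <- data.frame(ses = c(1, 3, 2, 2, 1))

# Check the raw codes before labeling
table(toy$ses)

# Attach labels, then confirm the counts carried over
factor(toy$ses, levels = 1:3, labels = c("Low", "Middle", "High")) -> toy$ses
table(toy$ses)
levels(toy$ses)
```

Any code that is not listed in levels becomes NA, so a drop in the total count after labeling signals a code you did not anticipate.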


	id	female	race	ses	schtyp	prog	read
1	70	Male	White	Low	Public	General	57
2	121	Female	White	Middle	Public	Vocational	68
3	86	Male	White	High	Public	General	44
4	141	Male	White	High	Public	Vocational	63
5	172	Male	White	Middle	Public	Academic	47


Save your work!!

Having added labels to the factors in hsb2 we can now save the data for later use.

save(hsb2, file = "data/hsb2.RData")

Let us test if this R Markdown file will knit to html

If all is good then we can Close Project

RStudio will close your project and reopen in a vanilla session


Help with any knitting problems
Remind them to save the Rmd before they Close Project

More with data

Data in packages

Almost all R packages come bundled with data-sets, far too many to walk you through.

To load data from a package, if you know the data-set's name, run

library(HistData)
data("Galton")
names(Galton)

## [1] "parent" "child"

or you can run

data("GaltonFamilies", package = "HistData")
names(GaltonFamilies)

## [1] "family"          "father"          "mother"          "midparentHeight"
## [5] "children"        "childNum"        "gender"          "childHeight"
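If you do not know the data-set's name, data() can list what a package offers. The sketch uses the datasets package, which ships with R, so it runs without installing anything; info and items are illustrative names.

```r
# data() with a package argument describes that package's data-sets
info <- data(package = "datasets")

# The Item column holds the data-set names
items <- info$results[, "Item"]
head(items)
```

Replacing "datasets" with "HistData" should list the data-sets used above, provided that package is installed.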


Saving data and workspaces

You can certainly save your data via

save(dataname, file = "filepath/filename.RData") or
save(dataname, file = "filepath/filename.rdata")

data(mtcars)
save(mtcars, file = "data/mtcars.RData")
rm(list = ls()) # To clear the Environment
load("data/mtcars.RData")

You can also save multiple data files as follows:

data(mtcars)
library(ggplot2)
data(diamonds)
save(mtcars, diamonds, file = "data/mydata.RData")
rm(list = ls()) # To clear the Environment
load("data/mydata.RData")
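When you are unsure what an RData file contains, loading it into a separate environment lets you look without overwriting anything in your workspace. A small sketch using a temporary file; rdata_file, e, and loaded are illustrative names.

```r
# Save two built-in data-sets to a temporary file (illustration only)
data(mtcars)
data(women)
rdata_file <- tempfile(fileext = ".RData")
save(mtcars, women, file = rdata_file)

# Load into a fresh environment so the workspace is left untouched
e <- new.env()
loaded <- load(rdata_file, envir = e)

# load() returns the names of the objects it restored
loaded
ls(e)
```

Individual objects can then be pulled out with e$mtcars or get("mtcars", envir = e).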


If you want to save just a single object from the environment and then load it in a later session, maybe with a different name, then you should use saveRDS() and readRDS()

data(mtcars)
saveRDS(mtcars, file = "data/mydata.RDS")
rm(list = ls()) # To clear the Environment
readRDS("data/mydata.RDS") -> ourdata

If instead you did the following, note that the object will be loaded with the name it had when it was saved

data(mtcars)
save(mtcars, file = "data/mtcars.RData")
rm(list = ls())  # To clear the Environment
load("data/mtcars.RData") -> ourdata # Note the data come back as mtcars; ourdata holds only the name "mtcars"
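The difference between the two approaches is easy to verify side by side. This sketch writes to temporary files so it does not touch your data folder; rds_path, rdata_path, and loaded_names are illustrative names.

```r
data(mtcars)
rds_path   <- tempfile(fileext = ".RDS")
rdata_path <- tempfile(fileext = ".RData")

saveRDS(mtcars, file = rds_path)
save(mtcars, file = rdata_path)

# readRDS() returns the object itself, so any name can be assigned
readRDS(rds_path) -> ourdata
identical(ourdata, mtcars)

# load() restores the object under its saved name and returns that name
load(rdata_path) -> loaded_names
loaded_names
```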

If you want to save everything you have done in the work session you can via save.image()

save.image(file = "mywork_jan182018.RData")

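The next time you start RStudio you can restore everything with load("mywork_jan182018.RData"). Only an image saved under the default name .RData in the startup working directory is loaded automatically.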
Useful if you have a lot of R code you have written and various objects generated and do not want to start from scratch the next time around.


Let them know that if not in a project and they try to close RStudio after some code has been run, they will be prompted to save (or not) the workspace and they should say "no"

Mapping in R with leaflet


leaflet is an R package that wraps Leaflet, an easy-to-learn JavaScript library that generates interactive maps

library(leaflet)
library(leaflet.extras)
library(widgetframe)
leaflet() %>% setView(lat = 39.322577, lng = -82.106336, zoom = 14) %>% 
  addTiles() %>% setMapWidgetStyle() %>%
  frameWidget(height = '275')

setView() centers the map with given lat/lng
zoom = applies zoom factor


... drop a pin on Building 21

leaflet() %>% setView(lat = 39.322577, lng = -82.106336, zoom = 15) %>% 
  addMarkers(lat = 39.319984, lng = -82.107084, popup = c("The Ridges, Building 21")) %>% 
  addTiles() %>% setMapWidgetStyle() %>%
  frameWidget(height = '325')
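For more than one pin it is usually easier to keep the points in a data frame and let addMarkers() read the columns. A sketch assuming only the leaflet package; places is a hypothetical data frame, its second row reuses the map-center coordinates from the slides, and the "Map center" label is a placeholder.

```r
library(leaflet)

# Points to mark; the second row reuses the map-center coordinates
# and its label is a placeholder
places <- data.frame(
  lat   = c(39.319984, 39.322577),
  lng   = c(-82.107084, -82.106336),
  label = c("The Ridges, Building 21", "Map center")
)

# The ~ notation tells leaflet to look the columns up in places
m <- leaflet(data = places) %>%
  addTiles() %>%
  addMarkers(lng = ~lng, lat = ~lat, popup = ~label)
m
```

With no setView() call, leaflet picks a view that fits all the markers.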


Exercises for practice


Ex. 1: Creating and knitting a new RMarkdown file

Open a fresh session by launching RStudio and then running File -> Open Project...

Create a new R Markdown file, give it a title and your name as the author, and then save it in the code folder with the following name: m1ex1.Rmd

Delete all content after the following code chunk

Add this level 1 heading The Starwars Data and then insert your first code chunk exactly as shown below

library(dplyr)
data(starwars)
str(starwars)

Add this level 2 heading Character Heights and Weights and then your second code chunk

plot(starwars$height, starwars$mass)

Now knit this file to html


Ex. 2: Lorem Ipsum paragraphs and graphs

Go to this website and generate five Lorem Ipsum placeholder text paragraphs

para 1: must have level 1 heading
para 2: must have level 2 heading
para 3: must have level 3 heading
para 4: must have level 4 heading
para 5: must have level 5 heading

Using the starwars data, create five code chunks, one after each paragraph

Each code chunk will have the same R code (see below)

plot(starwars$height, starwars$mass)

Now knit this file to html


Ex. 3: Reading in three data files

Create a new RMarkdown file that is blank after the initial setup code chunk

Insert a code chunk that reads in both these files found on the web

http://www.stata.com/data/jwooldridge/eacsap/mroz.dta
http://calcnet.mth.cmich.edu/org/spss/V16_materials/DataSets_v16/airline_passengers.sav

In a follow-up code chunk, run the summary() command on each data-set

In a separate code chunk, read in this dataset after you download it and save the unzipped file in your data folder.

The variable gender has the following codes: 0 = unknown; 1 = male; 2 = female
Use this coding scheme to convert gender into a factor with these value labels

In a follow-up chunk run all three of the following commands on this data-set

names()
str()
summary()

In a final chunk, run the commands necessary to save each of the three data-sets as separate RData files. Make sure you save them in your data folder. Now knit the complete Rmd file to html


Ex. 4: Knitting with prettydoc

I'd like you to use a specific Rmd format because the resulting documents are very readable

You had installed the prettydoc package so now create a prettydoc Rmd file as shown below:

Now take all the text and code chunks you created in Ex. 3 and insert them in this file. Make sure you add a title, etc. in the YAML and then knit the file to html

You can play with the theme: and highlight: fields, choosing from the options displayed here

To see native R Markdown formatting options read the documentation


Point out that the native Rmd formats allow for a lot more options than does the prettydoc format
Initial files I provide will be prettydoc because of their simplicity but later formats will rely on more flexible Rmd formats

RStudio webinars for more details

RStudio runs and archives free webinars. Sign up with your email and watch them if you want more details of specific functionalities


Find me at...

@aruhil
aniruhil.org
ruhil@ohio.edu
