
Introduction to R & RStudio

Professor Ruhil

1 / 22

Agenda

  • Install R and RStudio, and the RStudio Cloud alternative
  • Understand the RStudio panes and functionalities
  • Installing packages
  • Understand how R Markdown works
  • Read and write data in various formats
  • Brief overview of variable types and labeling values
  • Saving data in R format
2 / 22

Installing R & RStudio vs. RStudio Cloud

(1) First install the latest version of R from here

(2) Then install the latest version of RStudio from here

(3) Launch RStudio and check that it shows something like the image below:

(1) Use the link you received in your email to create a free account on RStudio Cloud, and then to access the course project. Once you are logged into RStudio Cloud you should see a tab called "Project." Clicking on this should give you the following browser screen:

3 / 22
  • Go slow and make sure everyone is able to knit
  • Minimize panic and keep the environment light

Understand your RStudio Environment

4 / 22

(1) The Console ... (2) Knitting and Code Chunk options

Installing packages

Open the Rmd file I sent you, Module01.Rmd, and save it in the code folder. Save the data I sent you to the data folder.

Now we install some packages via Tools -> Install Packages...

and update packages via Tools -> Check for Package Updates... [1]

devtools, ggplot2, dplyr, reshape2, lubridate, car, Hmisc,
gapminder, leaflet, prettydoc, DT, data.table, htmltools,
scales, ggridges

Other packages will be installed as needed.

Note: If you are running some code I have provided or you have found online and you get a message saying

Error in library(xyz) : there is no package called ‘xyz’

go ahead and install that package.
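
The same check can be scripted. The sketch below is one possible approach, not part of the course materials: `missing_packages()` is a hypothetical helper name, and the two-package vector is only an illustrative subset of the list above. It reports which packages are not yet installed and installs them only in an interactive session.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

wanted <- c("ggplot2", "dplyr")   # illustrative subset of the list above
to_install <- missing_packages(wanted)

# Install only when run interactively, so knitting never triggers a download.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running `missing_packages()` before a session is a quick way to find out which `library()` calls would fail.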

[1] It is a good idea to update packages regularly. Every now and then an update might break something but it is usually fixed sooner rather than later by the developer.

5 / 22
  • Make sure they install devtools and prettydoc

Rprojects (ONLY if installing on your own computer)

  • Create a folder called mpa5830. Inside mpa5830 create ONE sub-folder called data. The folder structure will now be
mpa5830/
└── data/
    ├── datafile-1
    ├── datafile-2
    └── ...
  • Create a project via File -> New Project, choose Existing Directory, and browse to the mpa5830 folder

  • RStudio will restart and you will be in the project folder, seeing a file called mpa5830.Rproj

  • From now on, start every session by double-clicking mpa5830.Rproj
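
The folder layout can also be created from the R console. This is a minimal sketch under one assumption: a temporary directory stands in for wherever you actually keep your course files, so replace `tempdir()` with your own location.

```r
# Assumption: tempdir() is a stand-in for your own parent folder.
root <- file.path(tempdir(), "mpa5830")

# Create mpa5830/ and mpa5830/data/ in one step.
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)

# Confirm the structure.
dir.exists(file.path(root, "data"))
list.dirs(root, full.names = FALSE)
```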

6 / 22
  • Point out that every time they start working they can click on mpa5830.Rproj and everything should work seamlessly unless something breaks

R Markdown files

  • Go to New File -> R Markdown ... and enter My First Rmd File as the title and your name as the author.

  • Click OK.
  • Now File -> Save As... and save it as testing_rmd in the code sub-folder
  • Click the Knit button

    You may see a message that says some packages need to be installed/updated. Allow these to be installed/updated.
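
For those who prefer the console, the sketch below writes a minimal R Markdown file and renders it, which is roughly what the menu steps above do. It is an illustration only: the file goes to a temporary directory (an assumption made so nothing in your project is overwritten), the author name is a placeholder, and rendering is skipped unless the rmarkdown package and pandoc are available.

```r
fence <- strrep("`", 3)  # three backticks, used to open and close the code chunk

rmd_lines <- c(
  "---",                          # the YAML header starts here
  'title: "My First Rmd File"',
  'author: "Your Name"',          # placeholder
  "output: html_document",
  "---",                          # ... and ends here
  "",
  "Some text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Assumption: a temporary directory stands in for your code sub-folder.
rmd_path <- file.path(tempdir(), "testing_rmd.Rmd")
writeLines(rmd_lines, rmd_path)

# Render to HTML only if the tools are installed.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```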

7 / 22
  • Emphasize the importance of the YAML header (YAML stands for YAML Ain't Markup Language)
  • Urge patience again since some packages may have to be installed more than once, perhaps via devtools, and some may not have admin rights (the horror, the horror!!)
  • Show them how to knit to Word and to PDF
  • Tell them you will show them how to generate a slide-deck later, if anyone is interested

... if all goes well ...

As the document knits, watch for error messages

8 / 22

Reading data

Make sure you have the following data-sets in the data folder. If you don't, the commands that follow will not work. We start by reading a simple comma-separated values (CSV) file and then a tab-delimited file.

library(here)
read.csv(
  here("data", "ImportDataCSV.csv"),
  sep = ",",
  header = TRUE
) -> df.csv # note sep = ","
read.csv(
  here("data", "ImportDataTAB.txt"),
  sep = "\t",
  header = TRUE
) -> df.tab # note sep = "\t"

If the files were read then the Global Environment pane should show objects called df.csv and df.tab. If you don't see these then check the following:

  • Make sure you have the files in your data folder
  • Make sure the folder has been correctly named (no blank spaces before or after, all lowercase, etc)
  • Make sure the data folder is inside mpa5830
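
To see what the sep argument does without relying on the course files, here is a small self-contained sketch. The two demo files and their contents are made up for illustration and are written to a temporary directory.

```r
# Write one comma-separated and one tab-delimited copy of the same tiny table.
csv_path <- file.path(tempdir(), "demo.csv")
tab_path <- file.path(tempdir(), "demo.txt")
writeLines(c("id,score", "1,90", "2,85"), csv_path)
writeLines(c("id\tscore", "1\t90", "2\t85"), tab_path)

# The first item on the checklist, done in code.
file.exists(csv_path)

# With the right separator both files give the same data frame.
read.csv(csv_path, sep = ",", header = TRUE) -> demo.csv
read.csv(tab_path, sep = "\t", header = TRUE) -> demo.tab
identical(demo.csv, demo.tab)

# With the wrong separator each line is read as a single column.
read.csv(tab_path, sep = ",", header = TRUE) -> demo.wrong
ncol(demo.wrong)
```

A data frame with one unexpectedly wide column is usually a sign that sep does not match the file.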
9 / 22
  • Point out the importance of setting the data path to ../data/filename.ext

(a) Excel files can be read via the readxl package

library(readxl)
read_excel(
  here("data", "ImportDataXLS.xls")
) -> df.xls
read_excel(
  here("data", "ImportDataXLSX.xlsx")
) -> df.xlsx

(b) SPSS, Stata, SAS files can be read via the haven package

library(haven)
read_stata(
  here("data", "ImportDataStata.dta")
) -> df.stata # Stata data file
read_sas(
  here("data", "ImportDataSAS.sas7bdat")
) -> df.sas # SAS data file
read_sav(
  here("data", "ImportDataSPSS.sav")
) -> df.spss # SPSS data file
10 / 22

(c) It is also common to encounter fixed-width files where the raw data are stored without any gaps between successive variables. However, these files will come with documentation that will tell you where each variable starts and ends, along with other details about each variable.

read.fwf(
  here("data", "fwfdata.txt"),
  widths = c(4, 9, 2, 4),
  header = FALSE,
  col.names = c("Name", "Month", "Day", "Year")
) -> df.fw

Notice that we need widths = c() to give the width of each variable, in order, and col.names = c() to name the variables.
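
To make the role of the widths concrete, the sketch below builds a two-line fixed-width file and reads it back with the same widths as above. The names and dates are invented for illustration, and strip.white = TRUE is an addition (passed through to read.table) that trims the padding from short values such as "March".

```r
# Each line is 4 + 9 + 2 + 4 = 19 characters with no gaps between fields.
fwf_path <- file.path(tempdir(), "fwf_demo.txt")
writeLines(
  c("JohnSeptember071990",
    "AnneMarch    151985"),
  fwf_path
)

read.fwf(
  fwf_path,
  widths = c(4, 9, 2, 4),
  header = FALSE,
  col.names = c("Name", "Month", "Day", "Year"),
  strip.white = TRUE   # drop the blanks that pad "March" to 9 characters
) -> df.demo

df.demo
```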

11 / 22

Reading Files from the Web

It is possible to specify the full web-path for a file and read it in, rather than storing a local copy. This is often useful when the file is updated regularly by the source (Census Bureau, Bureau of Labor Statistics, Bureau of Economic Analysis, etc.)

read.table(
  "http://data.princeton.edu/wws509/datasets/effort.dat"
) -> fpe
read.table(
  "https://stats.idre.ucla.edu/stat/data/test.txt",
  header = TRUE
) -> test
read.csv(
  "https://stats.idre.ucla.edu/stat/data/test.csv",
  header = TRUE
) -> test.csv

The foreign package will also read Stata and other formats, but I now default to haven. There are other packages for reading SPSS, SAS, etc. files: sas7bdat, rio, data.table, xlsx, XLConnect, gdata and others.
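
Because a web read can fail (no connection, or a URL that has moved), it can help to wrap the call so a failure produces a readable message instead of stopping the script. This is a sketch, not a required pattern: `safe_read_table()` is a hypothetical helper, and a local path that does not exist stands in for a broken URL so the example runs offline.

```r
# Hypothetical helper: read a table, or return NULL with a message on failure.
safe_read_table <- function(path, ...) {
  tryCatch(
    read.table(path, ...),
    error = function(e) {
      message("Could not read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for a broken URL: a file that is not there.
missing_file <- file.path(tempdir(), "no_such_file.dat")
result <- suppressWarnings(safe_read_table(missing_file, header = TRUE))
is.null(result)
```

The same helper accepts a URL in place of the path, since read.table() takes either.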

12 / 22
  • Point out that they must have an internet connection or else the file won't be read
  • Remind them that if the source file's URL changes the file may not be read, but it is easy to check whether a broken URL is the source of the error by opening it in a browser

Reading compressed files

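temp <- tempfile() # temporary file to hold the download
download.file(
  "ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NVSS/bridgepop/2016/pcen_v2016_y1016.sas7bdat.zip",
  temp
)
# read the SAS file straight out of the zip archive
haven::read_sas(
  unz(
    temp,
    "pcen_v2016_y1016.sas7bdat"
  )
) -> oursasdata
unlink(temp) # delete the temporary file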
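
The same temporary-file pattern can be tried offline. Note the substitution: the sketch below uses gzip rather than zip, because base R can write gzip files without any external tool, and the tiny data frame is invented for illustration. The steps are otherwise parallel: create a temporary file, read through a decompressing connection, then delete the temporary file.

```r
temp <- tempfile(fileext = ".csv.gz")

# Write a small made-up data frame as a gzip-compressed CSV.
con <- gzfile(temp, open = "wt")
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)), con, row.names = FALSE)
close(con)

# Read it back through a gzfile() connection, much as unz() is used for zip archives.
read.csv(gzfile(temp)) -> small
unlink(temp)   # remove the temporary file

small
```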
13 / 22

You can save your data in a format that R will recognize, giving it the RData or rdata extension

save(
  oursasdata,
  file = here("data", "oursasdata.RData")
)
save(
  oursasdata,
  file = here("data", "oursasdata.rdata")
)

Check your data directory to confirm both files are present

14 / 22

Minimal example of data processing

Working with the hsb2 data: 200 students from the High School and Beyond study

read.table(
  'https://stats.idre.ucla.edu/stat/data/hsb2.csv',
  header = TRUE,
  sep = ","
) -> hsb2
  • female = (0=male 1=female)
  • race = (1=hispanic 2=asian 3=african-amer 4=white)
  • ses = socioeconomic status (1=low 2=middle 3=high)
  • schtyp = type of school (1=public 2=private)
  • prog = type of program (1=general 2=academic 3=vocational)
  • read = standardized reading score
  • write = standardized writing score
  • math = standardized math score
  • science = standardized science score
  • socst = standardized social studies score
 
15 / 22

There are no label values for the various qualitative variables (female, race, ses, schtyp, and prog) so we create these. [3]

factor(hsb2$female,
  levels = c(0, 1),
  labels = c("Male", "Female")
) -> hsb2$female.f
factor(hsb2$race,
  levels = c(1:4),
  labels = c("Hispanic", "Asian", "African American", "White")
) -> hsb2$race.f
factor(hsb2$ses,
  levels = c(1:3),
  labels = c("Low", "Middle", "High")
) -> hsb2$ses.f
factor(hsb2$schtyp,
  levels = c(1:2),
  labels = c("Public", "Private")
) -> hsb2$schtyp.f
factor(hsb2$prog,
  levels = c(1:3),
  labels = c("General", "Academic", "Vocational")
) -> hsb2$prog.f

[3] This is just a quick run through with creating value labels; we will cover this in greater detail in a later module.
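
A quick way to confirm that labels were attached to the right codes is to cross-tabulate the original codes against the new factor. The sketch below uses a short made-up vector of ses-style codes rather than hsb2 so that it runs without an internet connection.

```r
# Made-up codes in the same 1/2/3 scheme as ses.
codes <- c(1, 3, 2, 2, 1)

factor(codes,
  levels = 1:3,
  labels = c("Low", "Middle", "High")
) -> ses.demo

# Each code should line up with exactly one label.
table(codes, ses.demo)

# A code that is not listed in levels becomes NA rather than raising an error.
factor(c(1, 4), levels = 1:3, labels = c("Low", "Middle", "High")) -> with.bad.code
is.na(with.bad.code)
```

The silent conversion to NA is worth checking for whenever the codebook and the data might disagree.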

16 / 22
 

Save your work!!

Having added labels to the factors in hsb2 we can now save the data for later use.

save(hsb2, file = here("data", "hsb2.RData"))

Let us test if this R Markdown file will knit to html

If all is good then we can Close Project

  • RStudio will close your project and reopen in a vanilla session
17 / 22
  • Help with any knitting problems
  • Remind them to save the Rmd before they Close Project

Data in packages

Many R packages come bundled with data-sets, too many of them to walk you through here.

To load data from a package, if you know the data-set's name, run

library(HistData)
data("Galton")
names(Galton)
## [1] "parent" "child"

or you can run

data("GaltonFamilies", package = "HistData")
names(GaltonFamilies)
## [1] "family" "father" "mother" "midparentHeight"
## [5] "children" "childNum" "gender" "childHeight"
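
If you do not know the data-set's name, data() can list what a package provides. The sketch below uses the datasets package because it ships with R; the same call should work for HistData or any other installed package.

```r
# data(package = ...) returns an object whose $results matrix has one row per data-set.
available <- data(package = "datasets")$results

# Show the first few names and descriptions.
head(available[, c("Item", "Title")])

# Check whether a particular data-set is included.
"mtcars" %in% available[, "Item"]
```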
18 / 22

Saving data and workspaces

You can certainly save your data via

  • save(dataname, file = "filepath/filename.RData") or
  • save(dataname, file = "filepath/filename.rdata")
data(mtcars)
names(mtcars)
save(mtcars, file = here("data", "mtcars.RData"))
rm(list = ls())
load(here("data", "mtcars.RData"))

You can also save multiple data files as follows:

data(mtcars)
library(ggplot2)
data(diamonds)
save(mtcars, diamonds, file = here("data", "mydata.RData"))
rm(list = ls()) # To clear the Environment
load(here("data", "mydata.RData"))
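
When you are not sure what an .RData file contains, one option is to load it into a separate environment and list the contents, so nothing in your current session is overwritten. This is a sketch with assumed object names (`cars_copy`, `iris_copy`) and a temporary file in place of the data folder.

```r
# Two objects saved together in one file.
cars_copy <- mtcars
iris_copy <- iris
rdata_path <- tempfile(fileext = ".RData")
save(cars_copy, iris_copy, file = rdata_path)

# Load into a fresh environment instead of the Global Environment.
contents <- new.env()
load(rdata_path, envir = contents)

# See which objects the file held.
ls(contents)
```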
19 / 22

If you want to save just a single object from the environment and then load it in a later session, maybe with a different name, then you should use saveRDS() and readRDS()

data(mtcars)
saveRDS(mtcars, file = here("data", "mydata.RDS"))
rm(list = ls()) # To clear the Environment
ourdata <- readRDS(here("data", "mydata.RDS"))

If instead you use save() and load(), as in the following, the object is restored under the name it had when it was saved, and load() returns only that name as a character string

data(mtcars)
save(mtcars, file = here("data", "mtcars.RData"))
rm(list = ls()) # To clear the Environment
ourdata <- load(here("data", "mtcars.RData")) # Note ourdata is listed as "mtcars"
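
The difference between what load() returns and what it restores can be checked directly. The sketch below uses an assumed object name (`my_cars`) and a temporary file; get() is one way to fetch the restored object under a new name.

```r
my_cars <- mtcars
path <- tempfile(fileext = ".RData")
save(my_cars, file = path)
rm(my_cars)

# load() restores my_cars and returns its name, not the data.
returned <- load(path)
returned
class(returned)

# Fetch the restored object by name if you want it under a different name.
ourdata <- get(returned)
is.data.frame(ourdata)
```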

If you want to save everything you have done in the work session you can via save.image()

save.image(file = here("data", "mywork_jan182018.RData"))
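  • By default R automatically loads a saved workspace at startup only when it is named .RData and sits in the startup working directory; an image saved under another name, like the one above, must be loaded with load()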
  • Useful if you have a lot of R code you have written and various objects generated and do not want to start from scratch the next time around.
20 / 22

Let them know that if they are not in a project and try to close RStudio after some code has been run, they will be prompted to save (or not) the workspace, and they should say "no"

RStudio webinars

The fantastic team at RStudio runs free webinars that are often very helpful, so be sure to sign up with your email. Here are some video recordings of webinars that are relevant to what we have covered so far.

21 / 22
