(1) The Console ... (2) Knitting and Code Chunk options
(1) Console
= This is where commands are issued to R, either by typing and hitting enter or running commands from a script (like your R Markdown file)
(2) Environment
= stores and shows you all the objects created
(3) History
= shows you a running list of all commands issued to R
(4) Connections
= shows you any databases/servers you are connected to and also allows you to initiate a new connection
(5) Files
= shows you files and folders in your current working directory, and you can move up/down in the folder hierarchy
(6) Plots
= shows you all plots that have been generated
(7) Packages
= shows you installed packages
(8) Help
= allows you to get the help pages by typing in keywords
(9) Viewer
= shows you any "live" documents running on the server
(10) Knit
= allows you to generate html/pdf/word documents from a script
(11) Insert
= allows you to insert a vanilla R chunk. You can (and should) give a unique name to each code chunk so that you can easily diagnose which chunk is not working
(12) Run
= allows you to run lines/chunks
Customize the detachable
panes via Tools -> Global Options...
You also have a spellchecker; use it
Now we install some packages via Tools -> Install Packages...
and update packages via Tools -> Check for Package Updates... [1]
devtools, reshape2, lubridate, car, Hmisc, gapminder, leaflet, DT, data.table, htmltools, scales, ggridges, here, knitr, kableExtra, haven, readr, readxl, ggplot2
Other packages will be installed as needed
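The menu route can also be scripted. Below is a minimal sketch, assuming you only want to install packages that are not already present. The helper name missing_packages is hypothetical, the package vector is just a subset of the list above, and the install.packages() call is left commented out so that nothing is installed until you choose to run it.

```r
# Hypothetical helper: return the packages in `pkgs` that are not yet installed
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# A few of the course packages (subset chosen for illustration)
course_pkgs <- c("devtools", "lubridate", "here", "knitr",
                 "readxl", "haven", "ggplot2")

to_install <- missing_packages(course_pkgs)
print(to_install)

# Uncomment to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
```

Running this at the start of a session is harmless when everything is already installed, because to_install is then empty.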
[1] It is a good idea to update packages on a regular basis, but note that every now and then something might break with an update. When this happens, check the package's source, usually on GitHub, for solutions.
devtools
(1) Create a folder called mpa6020
(2) Inside the mpa6020 folder create a subfolder called data. The folder structure will now be as shown below
mpa6020/
├── my-rmarkdown-file-01.Rmd
├── my-rmarkdown-file-02.Rmd
└── data/
    ├── some data file
    └── another data file
All data you download or create go into the data
folder. All R code files reside in the mpa6020
folder.
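If you prefer to create the folders from the Console, here is a minimal sketch using base R. It builds the structure inside a temporary directory (an assumption made so the example does not touch your real files); replace root with the location where you actually want mpa6020 to live.

```r
# Build the course folders inside a throwaway location for demonstration;
# point `root` at your own folder (e.g. your Documents folder) for real use
root <- tempfile("demo_")
project_dir <- file.path(root, "mpa6020")
data_dir <- file.path(project_dir, "data")

# recursive = TRUE creates mpa6020/ and mpa6020/data/ in one call
dir.create(data_dir, recursive = TRUE)

# Confirm that both folders now exist
dir.exists(c(project_dir, data_dir))
```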
Open the Rmd file I sent you: Module01_forClass.Rmd and save it in the mpa6020 folder. Save the data I sent you to the data folder.
(3) Now create a project via File -> New Project and choose Existing Directory. Browse to the mpa6020 folder and click Create Project. RStudio will restart and when it does you will be in the project folder and will see a file called mpa6020.Rproj
Open mpa6020.Rproj and everything should work seamlessly unless something breaks
New File -> R Markdown ... and enter My First Rmd File in title and your name. Click OK. File -> Save As.. and save it as testing_rmd in the code sub-folder
You may see a message that says some packages need to be installed/updated. Allow these to be installed/updated.
YAML Ain't Markup Language
devtools
, and some may not have admin rights (the horror, the horror!!)... if all goes well ...
As the document knits, watch for error messages
Golden Rule: Unique name for each chunk (no whitespace in name). Forgot? Use the namer package
library(namer)
name_chunks("myfilename.Rmd")
eval = If FALSE, knitr will not run the code in the code chunk.
include = If FALSE, knitr will run the chunk but not include the chunk in the final document.
echo = If FALSE, knitr will not display the code in the code chunk above its results in the final document.
error = If FALSE, knitr will stop knitting when the code generates an error; if TRUE, the error message is displayed in the document and knitting continues.
message = If FALSE, knitr will not display any messages generated by the code.
warning = If FALSE, knitr will not display any warning messages generated by the code.
cache = If TRUE, knitr will cache the results to reuse in future knits. Knitr will reuse the results until the code chunk is altered.
dev = The R function name that will be used as a graphical device to record plots, e.g. dev='CairoPDF'.
dpi = A number for knitr to use as the dots per inch (dpi) in graphics (when applicable).
fig.align = 'center', 'left', 'right' alignment in the knit document
fig.height = height of the figure (in inches)
fig.width = width of the figure (in inches)
out.height, out.width = The height and width to scale plots to in the final output.
Other options can be found in the cheatsheet available here
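These options can be written in an individual chunk header or set once for the whole document. Below is a minimal sketch of the second approach, assuming the knitr package is installed. The particular values are illustrative choices, not recommendations taken from the cheatsheet.

```r
library(knitr)

# Set document-wide defaults, typically inside the first (setup) chunk
opts_chunk$set(
  echo = FALSE,         # hide the code, keep the results
  message = FALSE,      # do not show messages in the document
  warning = FALSE,      # do not show warnings in the document
  fig.align = "center", # center all figures
  fig.width = 6,        # figure width in inches
  fig.height = 4        # figure height in inches
)

# Inspect one of the defaults that was just set
opts_chunk$get("fig.align")
```

A single chunk can still override any of these defaults in its own header, for example by adding echo = TRUE after the chunk name.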
Make sure you have the following data-sets in the data folder. If you don't then the commands that follow will not work. We start by reading a simple comma-separated values (CSV) format file and then a tab-delimited format file.
library(here) # loaded once per session
read.csv(here("data", "ImportDataCSV.csv"), sep = ",", header = TRUE) -> df.csv # note sep = ","
read.csv(here("data", "ImportDataTAB.txt"), sep = "\t", header = TRUE) -> df.tab # note sep = "\t"
If the files were read then Environment should show objects called df.csv and df.tab. If you don't see these then check the following:
data/filename.ext
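Two checks that often explain a failed import are a wrong file path and a separator that does not match the file. The sketch below illustrates both with base R only; the throwaway file and its contents are made up for the example and stand in for the course data files.

```r
# Write a tiny tab-delimited file to a temporary location (illustrative data)
tab_file <- tempfile(fileext = ".txt")
writeLines(c("x\ty", "1\t2", "3\t4"), tab_file)

# Check 1: does a file exist at the path you are passing to R?
file.exists(tab_file)

# Check 2: does the separator match the file?
wrong <- read.csv(tab_file, sep = ",", header = TRUE)   # all values land in one column
right <- read.csv(tab_file, sep = "\t", header = TRUE)  # two columns, as intended

ncol(wrong)
ncol(right)
```

With the course files, the equivalent path check would be file.exists(here("data", "ImportDataCSV.csv")), assuming the here package is loaded and you are inside the project.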
Excel files can be read via the readxl
package
library(readxl)
read_excel(here("data", "ImportDataXLS.xls")) -> df.xls
read_excel(here("data", "ImportDataXLSX.xlsx")) -> df.xlsx
SPSS, Stata, SAS files can be read via the haven
package
library(haven)
read_stata(here("data", "ImportDataStata.dta")) -> df.stata
read_sas(here("data", "ImportDataSAS.sas7bdat")) -> df.sas
read_sav(here("data", "ImportDataSPSS.sav")) -> df.spss
Fixed-width files: It is also common to encounter fixed-width files where the raw data are stored without any gaps between successive variables. However, these files will come with documentation that will tell you where each variable starts and ends, along with other details about each variable.
read.fwf(here("data", "fwfdata.txt"), widths = c(4, 9, 2, 4), header = FALSE, col.names = c("Name", "Month", "Day", "Year")) -> df.fwf
Notice we need widths = c() and col.names = c(). We will wrestle with some fixed-width files in the coming weeks.
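To see how widths maps onto the raw text, here is a minimal sketch that writes two made-up records to a temporary file and reads them back. The records, the object names, and the use of strip.white = TRUE (passed through to read.table to drop padding) are assumptions for illustration; the widths and column names mirror the call above.

```r
# Build two illustrative fixed-width records:
# Name = 4 characters, Month = 9, Day = 2, Year = 4 (19 characters per line)
records <- sprintf("%-4s%-9s%02d%4d",
                   c("John", "Mary"),
                   c("September", "June"),
                   c(12L, 5L),
                   c(1999L, 2001L))

fwf_file <- tempfile(fileext = ".txt")
writeLines(records, fwf_file)

# Read the file back, telling R how many characters each variable occupies
df_demo <- read.fwf(fwf_file,
                    widths = c(4, 9, 2, 4),
                    header = FALSE,
                    col.names = c("Name", "Month", "Day", "Year"),
                    strip.white = TRUE)  # drop padding around short values

df_demo
```

Each line of the raw file has no gaps between variables, so a wrong number in widths shifts every later column; that is why the documentation that accompanies a fixed-width file matters.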
It is possible to specify the full web-path for a file and read it in, rather than storing a local copy. This is often useful when the file is updated regularly by the source (Census Bureau, Bureau of Labor, Bureau of Economic Analysis, etc.)
read.table("http://data.princeton.edu/wws509/datasets/effort.dat") -> fpe
read.table("https://stats.idre.ucla.edu/stat/data/test.txt", header = TRUE) -> test.txt
read.csv("https://stats.idre.ucla.edu/stat/data/test.csv", header = TRUE) -> test.csv
library(foreign)
read.spss("https://stats.idre.ucla.edu/stat/data/hsb2.sav") -> hsb2.spss
df.hsb2.spss = as.data.frame(hsb2.spss)
hsb2.spss was read with the foreign package [2], an alternative to haven
foreign calls read.spss while haven calls read_spss
[2] The foreign package will also read Stata, SAS, and other formats. I end up defaulting to haven now. There are other packages for reading SPSS, SAS, etc. files ... sas7bdat, rio, data.table, xlsx, XLConnect, gdata and others.
temp = tempfile()
download.file("ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nvss/bridged_race/pcen_v2018_y1018.sas7bdat.zip", temp, mode = "wb")
haven::read_sas(unz(temp, "pcen_v2018_y1018.sas7bdat")) -> oursasdata
unlink(temp)
You can save your data in a format that R will recognize, giving it the RData or rdata extension
save(oursasdata, file = "data/oursasdata.RData")
save(oursasdata, file = "data/oursasdata.rdata")
Check your data directory to confirm both files are present
Working with the hsb2 data: 200 students from the High School and Beyond study
read.table('https://stats.idre.ucla.edu/stat/data/hsb2.csv', header = TRUE, sep = ",") -> hsb2
female = (0/1)
race = (1=hispanic 2=asian 3=african-amer 4=white)
ses = socioeconomic status (1=low 2=middle 3=high)
schtyp = type of school (1=public 2=private)
prog = type of program (1=general 2=academic 3=vocational)
read = standardized reading score
write = standardized writing score
math = standardized math score
science = standardized science score
socst = standardized social studies score
There are no label values for the various qualitative variables (female, race, ses, schtyp, and prog) so we create these. [3]
factor(hsb2$female, levels = c(0, 1), labels = c("Male", "Female")) -> hsb2$female
factor(hsb2$race, levels = c(1:4), labels = c("Hispanic", "Asian", "African American", "White")) -> hsb2$race
factor(hsb2$ses, levels = c(1:3), labels = c("Low", "Middle", "High")) -> hsb2$ses
factor(hsb2$schtyp, levels = c(1:2), labels = c("Public", "Private")) -> hsb2$schtyp
factor(hsb2$prog, levels = c(1:3), labels = c("General", "Academic", "Vocational")) -> hsb2$prog
[3] This is just a quick run through with creating value labels; we will cover this in greater detail in a later module.
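To see what factor() does with levels and labels without downloading anything, here is a minimal sketch on a made-up vector of codes in the style of the ses variable. The vector and object names are illustrative only.

```r
# Illustrative codes in the style of hsb2$ses (1 = low, 2 = middle, 3 = high)
ses_codes <- c(1, 3, 2, 2, 1)

# levels lists the codes as stored; labels gives the text shown for each code
ses_factor <- factor(ses_codes,
                     levels = c(1, 2, 3),
                     labels = c("Low", "Middle", "High"))

levels(ses_factor)  # the labels, in level order
table(ses_factor)   # counts per label
```

The order of labels must match the order of levels; swapping two labels would silently mislabel every observation with those codes.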
Having added labels to the factors in hsb2 we can now save the data for later use.
save(hsb2, file = "data/hsb2.RData")
Let us test if this R Markdown file will knit to html
If all is good then we can Close Project
Almost all R packages come bundled with data-sets, too many of them to walk you through but
To load data from a package, if you know the data-set's name, run
library(HistData)
data("Galton")
names(Galton)
## [1] "parent" "child"
or you can run
data("GaltonFamilies", package = "HistData")
names(GaltonFamilies)
## [1] "family" "father" "mother" "midparentHeight"
## [5] "children" "childNum" "gender" "childHeight"
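If you do not know the data-set's name, you can ask R to list what a package ships with. Below is a minimal sketch that uses the datasets package, which comes with every R installation, as a stand-in for HistData (an assumption made so the example runs without installing anything).

```r
# data(package = ...) returns a listing; its $results matrix has an "Item" column
listing <- data(package = "datasets")
items <- listing$results[, "Item"]

# Peek at the first few data-set names and check for a familiar one
head(items)
"mtcars" %in% items
```

Replacing "datasets" with "HistData" should list the data-sets in that package, provided it is installed.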
You can certainly save your data via
save(dataname, file = "filepath/filename.RData")
or save(dataname, file = "filepath/filename.rdata")
data(mtcars)
save(mtcars, file = "data/mtcars.RData")
rm(list = ls()) # To clear the Environment
load("data/mtcars.RData")
You can also save multiple data objects in a single file as follows:
data(mtcars)
library(ggplot2)
data(diamonds)
save(mtcars, diamonds, file = "data/mydata.RData")
rm(list = ls()) # To clear the Environment
load("data/mydata.RData")
If you want to save just a single object
from the environment and then load it in a later session, maybe with a different name, then you should use saveRDS()
and readRDS()
data(mtcars)
saveRDS(mtcars, file = "data/mydata.RDS")
rm(list = ls()) # To clear the Environment
readRDS("data/mydata.RDS") -> ourdata
If you instead use save() and load() as follows, note that the object is restored under the name it had when it was saved
data(mtcars)
save(mtcars, file = "data/mtcars.RData")
rm(list = ls()) # To clear the Environment
load("data/mtcars.RData") -> ourdata # Note the data come back as mtcars; ourdata only holds the name "mtcars"
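The difference between the two approaches is easiest to see side by side. Here is a minimal sketch using temporary files instead of the data folder; the object name my_cars and the file names are made up for illustration.

```r
# Temporary files stand in for files in your data folder
rdata_file <- tempfile(fileext = ".RData")
rds_file <- tempfile(fileext = ".RDS")

my_cars <- mtcars  # mtcars ships with R
save(my_cars, file = rdata_file)
saveRDS(my_cars, file = rds_file)

# load() recreates the object under its saved name and returns that name
loaded_names <- load(rdata_file)
loaded_names

# readRDS() returns the object itself, so you choose the name
ourdata <- readRDS(rds_file)
identical(ourdata, my_cars)
```

So load() is convenient for restoring several objects at once, while readRDS() is the safer choice when you want control over the name and do not want to overwrite an existing object by accident.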
If you want to save everything you have done in the work session you can via save.image()
save.image(file = "mywork_jan182018.RData")
If you are not in a project and you try to close RStudio after some code has been run, you will be prompted to save (or not) the workspace
and you should say "no"
leaflet
is an easy to learn JavaScript library that generates interactive maps
library(leaflet)
library(leaflet.extras)
library(widgetframe)
leaflet() %>%
  setView(lat = 39.322577, lng = -82.106336, zoom = 14) %>%
  addTiles() %>%
  setMapWidgetStyle() %>%
  frameWidget(height = '275')
setView() centers the map with given lat/lng
zoom = applies zoom factor
... drop a pin on Building 21
leaflet() %>%
  setView(lat = 39.322577, lng = -82.106336, zoom = 15) %>%
  addMarkers(lat = 39.319984, lng = -82.107084, popup = c("The Ridges, Building 21")) %>%
  addTiles() %>%
  setMapWidgetStyle() %>%
  frameWidget(height = '325')
Open a fresh session by launching RStudio and then running File -> Open Project...
Give it a title, your name as the author, and then save it in the code folder with the following name: m1ex1.Rmd
Delete all content after the following code chunk
Add this level 1 heading The Starwars Data
and then insert your first code chunk exactly as shown below
library(dplyr)
data(starwars)
str(starwars)
Add this level 2 heading Character Heights and Weights
and then your second code chunk
plot(starwars$height, starwars$mass)
Now knit this file to html
Go to this website and generate five Lorem Ipsum placeholder text paragraphs
Using the starwars
data, create five code chunks, one after each paragraph
plot(starwars$height, starwars$mass)
Now knit this file to html
Create a new RMarkdown
file that is blank after the initial setup code chunk
Insert a code chunk that reads in both these files found on the web
http://www.stata.com/data/jwooldridge/eacsap/mroz.dta
http://calcnet.mth.cmich.edu/org/spss/V16_materials/DataSets_v16/airline_passengers.sav
In a follow-up code chunk, run the summary()
command on each data-set
In a separate code chunk, read in this dataset after you download it and save the unzipped file in your data folder.
gender has the following codes: Zero = unknown; 1 = male; 2 = female
Convert gender into a factor with these value labels
In a follow-up chunk run the following commands on this data-set
names()
str()
summary()
In a final chunk, run the commands necessary to save each of the three data-sets as separate RData
files. Make sure you save them in your data folder. Now knit the complete Rmd
file to html
I'd like you to use a specific Rmd because these are very readable
You had installed the prettydoc
package so now create a prettydoc Rmd file as shown below:
Now take all the text and code chunks you created in Ex. 3 and insert them in this file. Make sure you add a title, etc. in the YAML
and then knit the file to html
You can play with the theme:
and highlight:
fields, choosing from the options displayed here
To see native R Markdown formatting options read the documentation
RStudio runs and archives free webinars. Sign up with your email and watch them if you want more details of specific functionalities