class: title-slide, center, middle
background-image: url(images/ouaerial.jpeg)
background-size: cover

# .fat[.fancy[Introduction to R]]

## .fat[.fancy[Ani Ruhil]]

---

## Agenda

- Install R and RStudio
  - First install the latest version of ![](./images/Rsvgsm.png) from [here](https://cloud.r-project.org)
  - Then install the latest version of ![](./images/Rstudiosm.png) from [here](https://www.rstudio.com/products/rstudio/download/)
- Test installation
- Install some packages
- Understand how R Markdown works
- Read data in various formats
- Basic data processing and saving
- Fun with leaflet

???

- Go slow and make sure everyone is able to knit
- Minimize panic and keep the environment light

---

## Understand your RStudio Environment

<center><img src="images/rstudiopanes.png" width="700px"></center>

???

(1) The Console ... (2) Knitting and [Code Chunk options](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf)

---

### ... key options ...

(1) `Console` = where commands are issued to R, either by typing and hitting enter or by running commands from a script (like your R Markdown file)

(2) `Environment` = stores and shows you all the objects created

(3) `History` = shows you a running list of all commands issued to R

(4) `Connections` = shows you any databases/servers you are connected to and also allows you to initiate a new connection

(5) `Files` = shows you files and folders in your current working directory, and lets you move up/down in the folder hierarchy

(6) `Plots` = shows you all plots that have been generated

(7) `Packages` = shows you installed packages

(8) `Help` = allows you to get the help pages by typing in keywords

(9) `Viewer` = shows you any "live" documents running on the server

(10) `Knit` = allows you to generate html/pdf/word documents from a script

(11) `Insert` = allows you to insert a vanilla R chunk. You can (and should) give each code chunk a unique name so that you can easily diagnose which chunk is not working

(12) `Run` = allows you to run lines/chunks

Customize the `detachable` panes via `Tools -> Global Options...`

You also have a spellchecker; use it

---

## Installing packages

Now we install some packages via `Tools -> Install Packages...` and update packages via `Tools -> Check for Package Updates...`<sup>1</sup>

```r
install.packages(c("devtools", "reshape2", "lubridate", "car", "Hmisc",
                   "gapminder", "leaflet", "DT", "data.table", "htmltools",
                   "scales", "ggridges", "here", "knitr", "kableExtra",
                   "haven", "readr", "readxl", "ggplot2"))
```

Other packages will be installed as needed

.footnote[[1] It is a good idea to update packages on a regular basis, but note that every now and then something might break with an update. When this happens, check the package's source (usually on GitHub) for solutions.]

???

- Make sure they install `devtools`

---

## Rprojects

(1) Create a folder called `mpa6020`

(2) Inside the **mpa6020** folder create a subfolder called `data`. The folder structure will now be as shown below

```
mpa6020/
├── my-rmarkdown-file-01.Rmd
├── my-rmarkdown-file-02.Rmd
└── data/
    ├── some data file
    └── another data file
```

All data you download or create go into the `data` folder. All R code files reside in the `mpa6020` folder.

Open the Rmd file I sent you, **Module01_forClass.Rmd**, and save it in the **mpa6020** folder. Save the data I sent you to the **data** folder.

(3) Now create a `project` via `File -> New Project` and choose `Existing Directory`. Browse to the **mpa6020** folder and click `Create Project`. RStudio will restart, and when it does you will be in the project folder and will see a file called `mpa6020.Rproj`

???
- Point out that every time they start working they can click on `mpa6020.Rproj` and everything should work seamlessly unless something breaks

---

## R Markdown files

- Go to `File -> New File -> R Markdown...` and enter `My First Rmd File` as the title and your `name` as the author.

<img src="images/Rmd.png" width="200" style="display: block; margin: auto;" />

- Click `OK`.
- Now `File -> Save As...` and save it as `testing_rmd` in the **mpa6020** folder
- Click this button: ![](./images/knit.png)

> You may see a message that says some packages need to be installed/updated. Allow these to be installed/updated.

???

- Emphasize the importance of the YAML (`YAML Ain't Markup Language`)
- Urge patience again since some packages may have to be installed more than once, perhaps via `devtools`, and some may not have admin rights (the horror, the horror!!)
- Show them how to knit to Word and to PDF
- Tell them you will show them how to generate a slide-deck later, if anyone is interested

---
class: inverse, center, top

.pull-left[
... if all goes well ...

<img src="images/img01.png" width="70%" style="display: block; margin: auto;" />
]

.pull-right[
As the document knits, watch for error messages

<center><img src="images/simpsons.gif"></center>
]

---

### Specific R Markdown code chunk options

**Golden Rule:** Give each chunk a unique name (no whitespace in the name). Forgot? Use the `namer` package

```r
library(namer)
name_chunks("myfilename.Rmd")
```

- `eval` = If FALSE, knitr will not run the code in the code chunk.
- `include` = If FALSE, knitr will run the chunk but not include the chunk in the final document.
- `echo` = If FALSE, knitr will not display the code in the code chunk above its results in the final document.
- `error` = If TRUE, knitr will display any error messages generated by the code and keep knitting; if FALSE (the R Markdown default), an error stops the knit.
- `message` = If FALSE, knitr will not display any messages generated by the code.
- `warning` = If FALSE, knitr will not display any warning messages generated by the code.
- `cache` = If TRUE, knitr will cache the results to reuse in future knits. Knitr will reuse the results until the code chunk is altered.
- `dev` = The R function name that will be used as a graphical device to record plots, e.g. `dev='CairoPDF'`.
- `dpi` = A number for knitr to use as the dots per inch (dpi) in graphics (when applicable).
- `fig.align` = 'center', 'left', or 'right' alignment in the knit document
- `fig.height` = height of the figure (in inches)
- `fig.width` = width of the figure (in inches)
- `out.height`, `out.width` = The height and width to scale plots to in the final output.

Other options can be found in [the cheatsheet available here](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf)

---
class: inverse, center, middle

# .heat[.fancy[ Reading in data files ]]

---

## Reading data

Make sure you have the following data-sets in the **data** folder. If you don't, then the commands that follow will not work. We start by reading a simple comma-separated values (CSV) file and then a tab-delimited file.

```r
library(here) # loaded once per session

read.csv(here("data", "ImportDataCSV.csv"), sep = ",", header = TRUE) -> df.csv # note sep = ","

read.csv(here("data", "ImportDataTAB.txt"), sep = "\t", header = TRUE) -> df.tab # note sep = "\t"
```

If the files were read, then `Environment` should show objects called `df.csv` and `df.tab`. If you don't see these, then check the following:

- Make sure you have the csv/txt files in your **data** folder
- Make sure the folder has been correctly named (no blank spaces before or after, all lowercase, etc.)
- Make sure the **data** folder is inside **mpa6020**

???
- Point out the importance of setting the data path to `data/filename.ext`

---

**Excel** files can be read via the `readxl` package

```r
library(readxl)

read_excel(here("data", "ImportDataXLS.xls")) -> df.xls

read_excel(here("data", "ImportDataXLSX.xlsx")) -> df.xlsx
```

**SPSS, Stata, SAS** files can be read via the `haven` package

```r
library(haven)

read_stata(here("data", "ImportDataStata.dta")) -> df.stata

read_sas(here("data", "ImportDataSAS.sas7bdat")) -> df.sas

read_sav(here("data", "ImportDataSPSS.sav")) -> df.spss
```

**Fixed-width** files: It is also common to encounter fixed-width files, where the raw data are stored without any gaps between successive variables. However, these files come with documentation that tells you where each variable starts and ends, along with other details about each variable.

<center><img src="./images/fwftxt.png" width="200px"></center>

```r
read.fwf(here("data", "fwfdata.txt"), widths = c(4, 9, 2, 4), header = FALSE,
         col.names = c("Name", "Month", "Day", "Year")) -> df.fwf
```

Notice we need `widths = c()` and `col.names = c()`. We will wrestle with some fixed-width files in the coming weeks.

---

## Reading Files from the Web

It is possible to specify the full web-path for a file and read it in, rather than storing a local copy. This is often useful when the source (Census Bureau, Bureau of Labor Statistics, Bureau of Economic Analysis, etc.) updates the file.
```r
read.table("http://data.princeton.edu/wws509/datasets/effort.dat") -> fpe

read.table("https://stats.idre.ucla.edu/stat/data/test.txt", header = TRUE) -> test.txt

read.csv("https://stats.idre.ucla.edu/stat/data/test.csv", header = TRUE) -> test.csv

library(foreign)
read.spss("https://stats.idre.ucla.edu/stat/data/hsb2.sav") -> hsb2.spss
df.hsb2.spss = as.data.frame(hsb2.spss)
```

`hsb2.spss` was read with the `foreign` package<sup>2</sup>, an alternative to `haven`

- `foreign` uses `read.spss()` while `haven` uses `read_spss()` (or `read_sav()`)

.footnote[[2] The `foreign` package will also read Stata, SAS, and other formats, but I now default to `haven`. There are other packages for reading SPSS, SAS, Excel, etc. files ... `sas7bdat`, `rio`, `data.table`, `xlsx`, `XLConnect`, `gdata` and others.]

???

- Point out that they must have an internet connection or else the file won't be read
- Remind them that if the source file's URL changes the file may not be read, but it is easy to check if a broken URL is the source of the error by using a browser

---

## Reading compressed files

```r
temp = tempfile()
download.file("ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nvss/bridged_race/pcen_v2018_y1018.sas7bdat.zip",
              temp, mode = "wb")
haven::read_sas(unz(temp, "pcen_v2018_y1018.sas7bdat")) -> oursasdata
unlink(temp)
```

You can save your data in a format that R will recognize, giving it the **RData** or **rdata** extension

```r
save(oursasdata, file = "data/oursasdata.RData")
save(oursasdata, file = "data/oursasdata.rdata")
```

Check your **data** directory to confirm the files are present. On Windows and macOS, which usually treat file names as case-insensitive, the second `save()` overwrites the first, so you will see only one file.

---
class: inverse, center, middle

# .heat[.fancy[ Labeling data - a small example ]]

---

## Minimal example of data processing

Working with the **hsb2** data: 200 students from the High School and Beyond study

```r
read.table('https://stats.idre.ucla.edu/stat/data/hsb2.csv', header = TRUE, sep = ",") -> hsb2
```

- `female` = (0 = male, 1 = female)
- `race` = (1=hispanic 2=asian 3=african-amer 4=white)
- `ses` = socioeconomic status (1=low 2=middle 3=high)
- `schtyp` = type of school (1=public 2=private)
- `prog` = type of program (1=general 2=academic 3=vocational)
- `read` = standardized reading score
- `write` = standardized writing score
- `math` = standardized math score
- `science` = standardized science score
- `socst` = standardized social studies score

---
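Before labeling, it can help to confirm that these coded variables arrived as plain integers. The sketch below is self-contained: instead of the downloaded file, it reads three illustrative rows typed in the hsb2 column layout (not guaranteed to match the real file) from an inline string via `read.csv(text = ...)`, so no internet connection is needed.

```r
# Three illustrative rows in the hsb2 layout (first seven columns only;
# values are for demonstration, not a copy of the official file)
hsb2_demo <- read.csv(text = "id,female,race,ses,schtyp,prog,read
70,0,4,1,1,1,57
121,1,4,2,1,3,68
86,0,4,3,1,1,44", header = TRUE)

# The qualitative variables are stored as integer codes, not labels
str(hsb2_demo)

# Counts come back by code (1, 2, 3), which is hard to read without the codebook
table(hsb2_demo$ses)
```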
---

There are no value labels for the various qualitative variables (female, race, ses, schtyp, and prog), so we create these.<sup>3</sup>

```r
factor(hsb2$female, levels = c(0, 1), labels = c("Male", "Female")) -> hsb2$female
factor(hsb2$race, levels = c(1:4),
       labels = c("Hispanic", "Asian", "African American", "White")) -> hsb2$race
factor(hsb2$ses, levels = c(1:3), labels = c("Low", "Middle", "High")) -> hsb2$ses
factor(hsb2$schtyp, levels = c(1:2), labels = c("Public", "Private")) -> hsb2$schtyp
factor(hsb2$prog, levels = c(1:3),
       labels = c("General", "Academic", "Vocational")) -> hsb2$prog
```

.footnote[[3] This is just a quick run-through of creating value labels; we will cover this in greater detail in a later module.]

---
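One detail worth checking before labeling real data: `factor()` turns any code that is not listed in `levels` into `NA`, so a stray or undocumented code silently drops out of the labeled variable. A minimal sketch with a made-up vector, where `9` is a hypothetical code that the codebook does not define:

```r
# Made-up ses codes; 9 is a hypothetical code not listed in the codebook
ses_codes <- c(1, 3, 2, 9, 1)

ses <- factor(ses_codes, levels = 1:3, labels = c("Low", "Middle", "High"))

# The 9 becomes NA; useNA = "ifany" makes that visible in the counts
table(ses, useNA = "ifany")

# Cheap check for codes the labels do not cover, run before factor()
setdiff(unique(ses_codes), 1:3)
```

---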
#### save your work!!

Having added labels to the factors in __hsb2__, we can now save the data for later use.

```r
save(hsb2, file = "data/hsb2.RData")
```

Let us test if this R Markdown file will ![](./images/knit.png) to html

If all is good, then we can `Close Project`; RStudio will close your project and reopen in a vanilla session

???

- Help with any knitting problems
- Remind them to save the Rmd before they `Close Project`

---
class: inverse, center, middle

# .heat[.fancy[ More with data ]]

---

## Data in packages

Many R packages come bundled with data-sets, far too many to walk you through, but

- [see here for standard ones](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html)
- [here are some more](https://vincentarelbundock.github.io/Rdatasets/datasets.html)
- [and some more](http://www.public.iastate.edu/~hofmann/data_in_r_sortable.html)

To load data from a package, if you know the data-set's name, run

```r
library(HistData)
data("Galton")
names(Galton)
```

```
## [1] "parent" "child"
```

or you can run

```r
data("GaltonFamilies", package = "HistData")
names(GaltonFamilies)
```

```
## [1] "family"          "father"          "mother"          "midparentHeight"
## [5] "children"        "childNum"        "gender"          "childHeight"
```

---

## Saving data and workspaces

You can certainly save your data via

- `save(dataname, file = "filepath/filename.RData")` or
- `save(dataname, file = "filepath/filename.rdata")`

```r
data(mtcars)
save(mtcars, file = "data/mtcars.RData")
{{rm(list = ls())}} # To clear the Environment
load("data/mtcars.RData")
```

You can also save multiple data files as follows:

```r
data(mtcars)
library(ggplot2)
data(diamonds)
*save(mtcars, diamonds, file = "data/mydata.RData")
rm(list = ls()) # To clear the Environment
load("data/mydata.RData")
```

---

If you want to save just a single `object` from the environment and then load it in a later session, maybe with a different name, then you should use `saveRDS()` and `readRDS()`

```r
data(mtcars)
saveRDS(mtcars, file = "data/mydata.RDS")
rm(list = ls()) # To clear the Environment
readRDS("data/mydata.RDS") -> ourdata
```

If instead you use `save()` and `load()`, the object comes back with the name it had when it was saved; assigning the result of `load()` only stores that name

```r
data(mtcars)
save(mtcars, file = "data/mtcars.RData")
rm(list = ls()) # To clear the Environment
load("data/mtcars.RData") -> ourdata # ourdata is just the string "mtcars"; the data are restored as mtcars
```

If you want to save everything you have done in the work session, you can do so via `save.image()`

```r
save.image(file = "mywork_jan182018.RData")
```

- To restore it later, run `load("mywork_jan182018.RData")`; only a file named `.RData` in the working directory is restored automatically at startup (and only if RStudio's `Restore .RData into workspace at startup` option is on)
- Useful if you have written a lot of R code, generated various objects, and do not want to start from scratch the next time around

???

Let them know that if they are not in a project and they try to close RStudio after some code has been run, they will be prompted to save (or not) the `workspace` and they should say "no"

---
class: inverse, center, middle

# Mapping in R with leaflet

![](./images/elaineyes.gif)

---

`leaflet` is an R interface to Leaflet, an easy-to-learn JavaScript library that generates interactive maps

```r
library(leaflet)
library(leaflet.extras)
library(widgetframe)

leaflet() %>%
  setView(lat = 39.322577, lng = -82.106336, zoom = 14) %>%
  addTiles() %>%
  setMapWidgetStyle() %>%
  frameWidget(height = '275')
```
- `setView()` centers the map at the given lat/lng
- `zoom =` sets the zoom level (higher values zoom in further)

---

... drop a pin on Building 21

```r
leaflet() %>%
  setView(lat = 39.322577, lng = -82.106336, zoom = 15) %>%
  addMarkers(lat = 39.319984, lng = -82.107084, popup = c("The Ridges, Building 21")) %>%
  addTiles() %>%
  setMapWidgetStyle() %>%
  frameWidget(height = '325')
```
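---

To drop several pins at once, one option is to keep the locations in a data frame and let `addMarkers()` pick up the columns through leaflet's formula interface (`~column`). A minimal sketch, assuming the `leaflet` package is installed; the two points simply reuse the coordinates from the slides above, and the labels are illustrative:

```r
library(leaflet)

# Illustrative locations reusing the coordinates from the previous slides
places <- data.frame(
  name = c("Map centre used in setView()", "The Ridges, Building 21"),
  lat  = c(39.322577, 39.319984),
  lng  = c(-82.106336, -82.107084)
)

# ~lng, ~lat and ~name are evaluated against the places data frame
pins <- leaflet(places) %>%
  addTiles() %>%
  addMarkers(lng = ~lng, lat = ~lat, popup = ~name)

pins # printing the object renders the map in the Viewer or the knit document
```

Keeping the points in a data frame means adding a pin is just adding a row, rather than another `addMarkers()` call.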
---
class: inverse, center, middle

# Exercises for practice

![](./images/jerrysarcasm.gif)

---

### Ex. 1: Creating and knitting a new RMarkdown file

Open a fresh session by launching RStudio and then running `File -> Open Project...`. Create a new R Markdown file via `File -> New File -> R Markdown...`, give it a title and your name as the author, and then save it in **mpa6020** with the following name: `m1ex1.Rmd`

Delete all content after the following code chunk

<img src="./images/dchunk.png" width="40%" style="display: block; margin: auto;" />

Add this level 1 heading `The Starwars Data` and then insert your first code chunk *exactly as shown below*

```r
library(dplyr)
data(starwars)
str(starwars)
```

Add this level 2 heading `Character Heights and Weights` and then your second code chunk

```r
plot(starwars$height, starwars$mass)
```

Now knit this file to **html**

---

### Ex. 2: Lorem Ipsum paragraphs and graphs

Go to [this website](https://loremipsumgenerator.com/generator/?n=2&t=p) and generate five Lorem Ipsum placeholder text paragraphs

- para 1: must have level 1 heading
- para 2: must have level 2 heading
- para 3: must have level 3 heading
- para 4: must have level 4 heading
- para 5: must have level 5 heading

Using the `starwars` data, create five code chunks, one after each paragraph

- Each code chunk will have the same R code (see below)

```r
plot(starwars$height, starwars$mass)
```

Now knit this file to **html**

---

### Ex. 3: Reading in three data files

Create a new `RMarkdown` file that is blank after the `initial setup code chunk`

Insert a code chunk that reads in both these files found on the web

- `http://www.stata.com/data/jwooldridge/eacsap/mroz.dta`
- `http://calcnet.mth.cmich.edu/org/spss/V16_materials/DataSets_v16/airline_passengers.sav`

In a follow-up code chunk, run the `summary()` command on each data-set

In a separate code chunk, read in [this dataset](https://s3.amazonaws.com/tripdata/201502-citibike-tripdata.zip) after you download it and save the unzipped file in your **data** folder.
- The variable `gender` has the following codes: `0 = unknown; 1 = male; 2 = female`
- Use this coding scheme to convert `gender` into a `factor` with these value labels

In a follow-up chunk, run all of the following commands on this data-set

- `names()`
- `str()`
- `summary()`

In a final chunk, run the commands necessary to save each of the three data-sets as separate `RData` files. Make sure you save them in your **data** folder.

Now knit the complete `Rmd` file to **html**

---

### Ex. 4: Knitting with prettydoc

I'd like you to use a specific Rmd format, `prettydoc`, because its documents are very readable

Install the `prettydoc` package if you have not already done so, and then create a prettydoc Rmd file as shown below:

.pull-left[
<center><img src="./images/prettydoc1.png" width="300px"></center>
]

.pull-right[
<center><img src="./images/prettydoc2.png" width="280px"></center>
]

Now take all the text and code chunks you created in Ex. 3 and insert them in this file. Make sure you add a title, etc. in the `YAML` and then knit the file to `html`

You can play with the `theme:` and `highlight:` fields, choosing from the options [displayed here](http://yixuan.cos.name/prettydoc/themes.html)

To see native R Markdown formatting options [read the documentation](http://rmarkdown.rstudio.com/html_document_format.html#overview)

???

- Point out that the native Rmd formats allow for a lot more options than does the prettydoc format
- Initial files I provide will be prettydoc because of their simplicity, but later files will rely on more flexible Rmd formats

---

## RStudio webinars for more details

RStudio runs and archives free webinars.
Sign up with your email and watch them if you want more details on specific functionality

- [Programming Part 1 (Writing code in RStudio)](https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-part-1/)
- [Programming Part 2 (Debugging code in RStudio)](https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-programming-part-2/)
- [Managing Change Part 1 (Projects in RStudio)](https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-managing-change-part-1/)
- [Importing Data into R](https://www.rstudio.com/resources/webinars/importing-data-into-r/)
- [What's new with readxl](https://www.rstudio.com/resources/webinars/whats-new-with-readxl/)
- [Getting your data into R](https://www.rstudio.com/resources/webinars/getting-your-data-into-r/)
- [Getting Started with R Markdown](https://www.rstudio.com/resources/webinars/getting-started-with-r-markdown/)

---
class: right, middle

<img class="circle" src="https://github.com/aniruhil.png" width="175px"/>

# Find me at...
[<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> @aruhil](http://twitter.com/aruhil) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 
0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"/></svg> aniruhil.org](https://aniruhil.org) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M476 3.2L12.5 270.6c-18.1 10.4-15.8 35.6 2.2 43.2L121 358.4l287.3-253.2c5.5-4.9 13.3 2.6 8.6 8.3L176 407v80.5c0 23.6 28.5 32.9 42.5 15.8L282 426l124.6 52.2c14.2 6 30.4-2.9 33-18.2l72-432C515 7.8 493.3-6.8 476 3.2z"/></svg> ruhil@ohio.edu](mailto:ruhil@ohio.edu)