class: title-slide, center, middle
background-image: url(images/ouaerial.jpeg)
background-size: cover

# .crimson[.fancy[Introduction to R & RStudio]]

## .crimson[.fancy[Professor Ruhil]]

---
name: agenda

# .fancy[ Agenda ]

- Install R and RStudio, and the RStudio Cloud alternative
- Understand the RStudio panes and functionalities
- Install packages
- Understand how R Markdown works
- Read and write data in various formats
- Brief overview of variable types and labeling values
- Save data in R format

---

# Installing R & RStudio vs. RStudio Cloud

.pull-left[

#### If you want to use your own machine (not recommended except for the tech-savvy)

(1) First install the latest version of ![](../images/Rsvgsm.png) from [here](https://cloud.r-project.org)

(2) Then install the latest version of ![](../images/Rstudiosm.png) from [here](https://www.rstudio.com/products/rstudio/download/)

(3) Launch RStudio and check that it shows something like the image below:

<img src="images/rstudiopanes.png" width="60%" style="display: block; margin: auto;" />

]

.pull-right[

#### Using RStudio Cloud (recommended)

(1) Use the link you received in your email to create a free account on RStudio Cloud, and then to access the course project. Once you are logged into RStudio Cloud you should see a tab called "Project." Clicking on this should give you the following browser screen:

<img src="images/rstudio-cloud.png" width="80%" style="display: block; margin: auto;" />

]

???

- Go slow and make sure everyone is able to knit
- Minimize panic and keep the environment light

---

# Understand your RStudio Environment

<center><img src = "images/rstudiopanes.png", width = 600px></center>

???

(1) The Console ...
(2) Knitting and [Code Chunk options](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf)

---

## Installing packages

Open the Rmd file I sent you: **Module01.Rmd** and save it in the **code** folder

Save the data I sent you to the **data** folder

Now we install some packages via `Tools -> Install Packages...` and update packages via `Tools -> Check for Package Updates...`<sup>1</sup>

```r
devtools, ggplot2, dplyr, reshape2, lubridate, car, Hmisc, gapminder, leaflet, prettydoc, DT, data.table, htmltools, scales, ggridges
```

Other packages will be installed as needed.

**Note:** If you are running some code I have provided or you have found online and you get a message saying `Error in library(xyz) : there is no package called ‘xyz’` go ahead and install that package.

.footnote[[1] It is a good idea to update packages regularly. Every now and then an update might break something but it is usually fixed sooner rather than later by the developer.]

???

- Make sure they install `devtools` and `prettydoc`

---

## Rprojects (ONLY if installing on your own computer)

- Create a folder called **mpa5830**. Inside **mpa5830** create ONE sub-folder called **data**. The folder structure will now be

```{}
mpa5830/
└── data/
    └── datafile-1
    └── datafile-2
    └── ...
```

.pull-left[

- Create a `project` via `File -> New Project`, choose `Existing Directory`,

<center><img src = "images/projects.png", width = 275px></center>

]

.pull-right[

- Browse to the **mpa5830** folder
- RStudio will restart and you will be in the project folder, seeing a file called `mpa5830.Rproj`
- From now on, start every session by double-clicking `mpa5830.Rproj`

]

???

- Point out that every time they start working they can click on `mpa5830.Rproj` and everything should work seamlessly unless something breaks

---

## R Markdown files

- Go to `New File -> R Markdown ...` and enter `My First Rmd File` as the title and your `name` as the author.
<img src="images/Rmd.png" width="280" style="display: block; margin: auto;" />

- Click `OK`.
- Now `File -> Save As..` and save it as `testing_rmd` in the **code** sub-folder
- Click this button: ![](images/knit.png)

> You may see a message that says some packages need to be installed/updated. Allow these to be installed/updated.

???

- Emphasize the importance of the YAML `YAML Ain't Markup Language`
- Urge patience again since some packages may have to be installed more than once, perhaps via `devtools`, and some may not have admin rights (the horror, the horror!!)
- Show them how to knit to Word and to PDF
- Tell them you will show them how to generate a slide-deck later, if anyone is interested

---
class: inverse, center, top

.pull-left[

... if all goes well ...

<img src="images/img01.png" width="70%" style="display: block; margin: auto;" />

]

.pull-right[

As the document knits, watch for error messages

<center><img src = "images/simpsons.gif"></center>

]

---

## Reading data

Make sure you have the following data-sets in the **data** folder. If you don't then the commands that follow will not work. We start by reading a simple `comma-separated values` format file and then a `tab-delimited` format file.

```r
library(here)
read.csv(
  here("data", "ImportDataCSV.csv"),
  sep = ",",
  header = TRUE
  ) -> df.csv # note sep = ","
read.csv(
  here("data", "ImportDataTAB.txt"),
  sep = "\t",
  header = TRUE
  ) -> df.tab # note sep = "\t"
```

If the files were read then `Global Environment` should show objects called `df.csv` and `df.tab`. If you don't see these then check the following:

- Make sure you have the files in your **data** folder
- Make sure the folder has been correctly named (no blank spaces before or after, all lowercase, etc)
- Make sure the data folder is inside **mpa5830**

???
- Point out the importance of setting the data path to `../data/filename.ext`

---

(a) **Excel** files can be read via the `readxl` package

```r
library(readxl)
read_excel(
  here("data", "ImportDataXLS.xls")
  ) -> df.xls
read_excel(
  here("data", "ImportDataXLSX.xlsx")
  ) -> df.xlsx
```

(b) **SPSS, Stata, SAS** files can be read via the `haven` package

```r
library(haven)
read_stata(
  here("data", "ImportDataStata.dta")
  ) -> df.stata # Stata data file
read_sas(
  here("data", "ImportDataSAS.sas7bdat")
  ) -> df.sas # SAS data file
read_sav(
  here("data", "ImportDataSPSS.sav")
  ) -> df.spss # SPSS data file
```

---

(c) It is also common to encounter **fixed-width** files where the raw data are stored without any gaps between successive variables. However, these files will come with documentation that will tell you where each variable starts and ends, along with other details about each variable.

<center><img src = "images/fwftxt.png", width = 200px></center>

```r
read.fwf(
  here("data", "fwfdata.txt"),
  widths = c(4, 9, 2, 4),
  header = FALSE,
  col.names = c("Name", "Month", "Day", "Year")
  ) -> df.fw
```

Notice we need `widths = c()` and `col.names = c()`

---

## Reading Files from the Web

It is possible to specify the full web-path for a file and read it in, rather than storing a local copy. This is often useful when the file is updated by the source (Census Bureau, Bureau of Labor Statistics, Bureau of Economic Analysis, etc.)

```r
read.table(
  "http://data.princeton.edu/wws509/datasets/effort.dat"
  ) -> fpe
read.table(
  "https://stats.idre.ucla.edu/stat/data/test.txt",
  header = TRUE
  ) -> test
read.csv(
  "https://stats.idre.ucla.edu/stat/data/test.csv",
  header = TRUE
  ) -> test.csv
```

The `foreign` package will also read Stata and other formats. I end up defaulting to `haven` now. There are other packages for reading SPSS, SAS, etc. files ... `sas7bdat`, `rio`, `data.table`, `xlsx`, `XLConnect`, `gdata` and others.

???
- Point out that they must have an internet connection or else the file won't be read
- Remind them that if the source file's URL changes the file may not be read, but it is easy to check if a broken URL is the source of the error by using a browser

---

## Reading compressed files

```r
temp <- tempfile()
download.file(
  "ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NVSS/bridgepop/2016/pcen_v2016_y1016.sas7bdat.zip",
  temp
  )
haven::read_sas(
  unz(temp, "pcen_v2016_y1016.sas7bdat")
  ) -> oursasdata
unlink(temp)
```

---

You can save your data in a format that R will recognize, giving it the **RData** or **rdata** extension

```r
save(oursasdata, file = here("data", "oursasdata.RData"))
save(oursasdata, file = here("data", "oursasdata.rdata"))
```

Check your **data** directory to confirm both files are present

---

## Minimal example of data processing

Working with the **hsb2** data: 200 students from the High School and Beyond study

```r
read.table(
  'https://stats.idre.ucla.edu/stat/data/hsb2.csv',
  header = TRUE,
  sep = ","
  ) -> hsb2
```

- female = (0/1)
- race = (1=hispanic 2=asian 3=african-amer 4=white)
- ses = socioeconomic status (1=low 2=middle 3=high)
- schtyp = type of school (1=public 2=private)
- prog = type of program (1=general 2=academic 3=vocational)
- read = standardized reading score
- write = standardized writing score
- math = standardized math score
- science = standardized science score
- socst = standardized social studies score
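
Before labeling, it can help to confirm that the codes in the data match the codebook above. The sketch below is only an illustration: `toy` is a made-up stand-in for **hsb2** (hypothetical values, not the real file) so that it runs without an internet connection.

```r
# Hypothetical stand-in for hsb2: a few made-up rows using the same coding scheme
toy <- data.frame(
  female = c(0, 1, 1, 0),
  ses    = c(1, 2, 3, 2),
  read   = c(57, 68, 44, 63)
)

str(toy)            # variable types: the coded columns are stored as numbers
table(toy$female)   # counts per code; only 0 and 1 should appear
table(toy$ses)      # counts per code; only 1, 2 and 3 should appear
summary(toy$read)   # numeric summary of a test score
```

The same four calls should work on the real data by swapping `toy` for `hsb2`, assuming the file was read as shown above.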
---

There are no `value labels` for the various qualitative variables (female, race, ses, schtyp, and prog) so we create these.<sup>3</sup>

```r
factor(hsb2$female,
       levels = c(0, 1),
       labels = c("Male", "Female")
       ) -> hsb2$female.f
factor(hsb2$race,
       levels = c(1:4),
       labels = c("Hispanic", "Asian", "African American", "White")
       ) -> hsb2$race.f
factor(hsb2$ses,
       levels = c(1:3),
       labels = c("Low", "Middle", "High")
       ) -> hsb2$ses.f
factor(hsb2$schtyp,
       levels = c(1:2),
       labels = c("Public", "Private")
       ) -> hsb2$schtyp.f
factor(hsb2$prog,
       levels = c(1:3),
       labels = c("General", "Academic", "Vocational")
       ) -> hsb2$prog.f
```

.footnote[[3] This is just a quick run-through of creating value labels; we will cover this in greater detail in a later module.]

---
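
One way to check that labels were attached to the right codes is to cross-tabulate the original variable against the new factor. A minimal sketch, assuming a made-up vector `ses` (hypothetical values, not the real **hsb2** column):

```r
# Hypothetical toy vector using the same 1/2/3 coding as ses in hsb2
ses <- c(1, 2, 3, 2, 1)

# Same pattern as above: codes go in levels, text goes in labels
factor(ses,
       levels = c(1:3),
       labels = c("Low", "Middle", "High")
       ) -> ses.f

table(ses, ses.f)   # each code should line up with exactly one label
levels(ses.f)       # labels in the order of the codes
```

If a code lands under the wrong label in the table, the order of `labels` probably does not match the order of `levels`.

---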
#### save your work!!

Having added labels to the factors in __hsb2__ we can now save the data for later use.

```r
save(hsb2, file = here("data", "hsb2.RData"))
```

Let us test if this R Markdown file will ![](./images/knit.png) to html

If all is good then we can `Close Project` - RStudio will close your project and reopen in a vanilla session

???

- Help with any knitting problems
- Remind them to save the Rmd before they `Close Project`

---

## Data in packages

Almost all R packages come bundled with data-sets, too many of them to walk you through but

- [see here for standard ones](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html)
- [here are some more](https://vincentarelbundock.github.io/Rdatasets/datasets.html)
- [and some more](http://www.public.iastate.edu/~hofmann/data_in_r_sortable.html)

To load data from a package, if you know the data-set's name, run

```r
library(HistData)
data("Galton")
names(Galton)
```

```
## [1] "parent" "child"
```

or you can run

```r
data("GaltonFamilies", package = "HistData")
names(GaltonFamilies)
```

```
## [1] "family"          "father"          "mother"          "midparentHeight"
## [5] "children"        "childNum"        "gender"          "childHeight"
```

---

## Saving data and workspaces

You can certainly save your data via

- `save(dataname, file = "filepath/filename.RData")` or
- `save(dataname, file = "filepath/filename.rdata")`

```r
data(mtcars)
names(mtcars)
save(mtcars, file = here("data", "mtcars.RData"))
*rm(list = ls())
load(here("data", "mtcars.RData"))
```

You can also save multiple objects in a single file as follows:

```r
data(mtcars)
library(ggplot2)
data(diamonds)
*save(mtcars, diamonds, file = here("data", "mydata.RData"))
rm(list = ls()) # To clear the Environment
load(here("data", "mydata.RData"))
```

---

If you want to save just a single `object` from the environment and then load it in a later session, maybe with a different name, then you should use `saveRDS()` and `readRDS()`

```r
data(mtcars)
*saveRDS(mtcars, file = here("data", "mydata.RDS"))
rm(list = ls()) # To clear the Environment
*ourdata <- readRDS(here("data", "mydata.RDS"))
```

If instead you use `save()` and `load()`, note that the object is restored under the name it had when it was saved

```r
data(mtcars)
save(mtcars, file = here("data", "mtcars.RData"))
rm(list = ls()) # To clear the Environment
ourdata <- load(here("data", "mtcars.RData")) # Note the data come back as mtcars; ourdata only holds the name "mtcars"
```

If you want to save everything you have done in the work session you can via `save.image()`

```r
save.image(file = here("data", "mywork_jan182018.RData"))
```

- In a later session, restore this image with `load(here("data", "mywork_jan182018.RData"))`; R only loads a workspace automatically at startup when it is saved as `.RData` in the startup directory
- Useful if you have a lot of R code you have written and various objects generated and do not want to start from scratch the next time around.

???

Let them know that if not in a project and they try to close RStudio after some code has been run, they will be prompted to save (or not) the `workspace` and they should say "no"

---

# RStudio webinars

The fantastic team at RStudio runs free webinars that are often very helpful so be sure to sign up with your email. Here are some video recordings of webinars that are relevant to what we have covered so far.
- [Programming Part 1 (Writing code in RStudio)](https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-part-1/) - [Programming Part 2 (Debugging code in RStudio)](https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-programming-part-2/) - [Managing Change Part 1 (Projects in RStudio)](https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-managing-change-part-1/) - [Importing Data into R](https://www.rstudio.com/resources/webinars/importing-data-into-r/) - [Whats new with readxl](https://www.rstudio.com/resources/webinars/whats-new-with-readxl/) - [Getting your data into R](https://www.rstudio.com/resources/webinars/getting-your-data-into-r/) - [Getting Started with R Markdown](https://www.rstudio.com/resources/webinars/getting-started-with-r-markdown/) --- class: right, middle <img class="circle" src="https://github.com/aniruhil.png" width="175px"/> # Find me at... [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> @aruhil](http://twitter.com/aruhil) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M326.612 185.391c59.747 59.809 58.927 
155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"/></svg> aniruhil.org](https://aniruhil.org) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M476 3.2L12.5 270.6c-18.1 10.4-15.8 35.6 2.2 43.2L121 358.4l287.3-253.2c5.5-4.9 13.3 2.6 8.6 8.3L176 407v80.5c0 23.6 28.5 32.9 42.5 15.8L282 426l124.6 52.2c14.2 6 30.4-2.9 33-18.2l72-432C515 7.8 493.3-6.8 476 3.2z"/></svg> ruhil@ohio.edu](mailto:ruhil@ohio.edu)