Create a new RMarkdown file that is blank after the initial setup code chunk.
Insert a code chunk that reads in both of these files from the web:
http://www.stata.com/data/jwooldridge/eacsap/mroz.dta
http://calcnet.mth.cmich.edu/org/spss/V16_materials/DataSets_v16/airline_passengers.sav
library(haven)
read_dta("http://www.stata.com/data/jwooldridge/eacsap/mroz.dta") -> mroz
read_sav("http://calcnet.mth.cmich.edu/org/spss/V16_materials/DataSets_v16/airline_passengers.sav") -> airline
In a follow-up code chunk, run the summary() command on each data-set.
## inlf hours kidslt6 kidsge6
## Min. :0.0000 Min. : 0.0 Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.: 0.0 1st Qu.:0.0000 1st Qu.:0.000
## Median :1.0000 Median : 288.0 Median :0.0000 Median :1.000
## Mean :0.5684 Mean : 740.6 Mean :0.2377 Mean :1.353
## 3rd Qu.:1.0000 3rd Qu.:1516.0 3rd Qu.:0.0000 3rd Qu.:2.000
## Max. :1.0000 Max. :4950.0 Max. :3.0000 Max. :8.000
##
## age educ wage repwage hushrs
## Min. :30.00 Min. : 5.00 Min. : 0.000 Min. :0.00 Min. : 175
## 1st Qu.:36.00 1st Qu.:12.00 1st Qu.: 0.000 1st Qu.:0.00 1st Qu.:1928
## Median :43.00 Median :12.00 Median : 1.625 Median :0.00 Median :2164
## Mean :42.54 Mean :12.29 Mean : 2.375 Mean :1.85 Mean :2267
## 3rd Qu.:49.00 3rd Qu.:13.00 3rd Qu.: 3.788 3rd Qu.:3.58 3rd Qu.:2553
## Max. :60.00 Max. :17.00 Max. :25.000 Max. :9.98 Max. :5010
##
## husage huseduc huswage faminc
## Min. :30.00 Min. : 3.00 Min. : 0.4121 Min. : 1500
## 1st Qu.:38.00 1st Qu.:11.00 1st Qu.: 4.7883 1st Qu.:15428
## Median :46.00 Median :12.00 Median : 6.9758 Median :20880
## Mean :45.12 Mean :12.49 Mean : 7.4822 Mean :23081
## 3rd Qu.:52.00 3rd Qu.:15.00 3rd Qu.: 9.1667 3rd Qu.:28200
## Max. :60.00 Max. :17.00 Max. :40.5090 Max. :96000
##
## mtr motheduc fatheduc unem
## Min. :0.4415 Min. : 0.000 Min. : 0.000 Min. : 3.000
## 1st Qu.:0.6215 1st Qu.: 7.000 1st Qu.: 7.000 1st Qu.: 7.500
## Median :0.6915 Median :10.000 Median : 7.000 Median : 7.500
## Mean :0.6789 Mean : 9.251 Mean : 8.809 Mean : 8.624
## 3rd Qu.:0.7215 3rd Qu.:12.000 3rd Qu.:12.000 3rd Qu.:11.000
## Max. :0.9415 Max. :17.000 Max. :17.000 Max. :14.000
##
## city exper nwifeinc lwage
## Min. :0.0000 Min. : 0.00 Min. :-0.02906 Min. :-2.0542
## 1st Qu.:0.0000 1st Qu.: 4.00 1st Qu.:13.02504 1st Qu.: 0.8165
## Median :1.0000 Median : 9.00 Median :17.70000 Median : 1.2476
## Mean :0.6428 Mean :10.63 Mean :20.12896 Mean : 1.1902
## 3rd Qu.:1.0000 3rd Qu.:15.00 3rd Qu.:24.46600 3rd Qu.: 1.6036
## Max. :1.0000 Max. :45.00 Max. :96.00000 Max. : 3.2189
## NA's :325
## expersq
## Min. : 0
## 1st Qu.: 16
## Median : 81
## Mean : 178
## 3rd Qu.: 225
## Max. :2025
##
## number
## Min. :104.0
## 1st Qu.:180.0
## Median :265.5
## Mean :280.3
## 3rd Qu.:360.5
## Max. :622.0
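The `NA's :325` entry under lwage in the first summary is how summary() reports missing values in a numeric column. A minimal sketch of that behaviour, using a made-up vector rather than the mroz data:

```r
# Made-up values for illustration; not taken from mroz
x <- c(1.2, NA, 0.8, NA, 1.6)

# summary() appends an "NA's" entry when a numeric vector has missing values
summary(x) -> x_summary
x_summary

# The same count, obtained directly
sum(is.na(x)) -> n_missing
n_missing
```

Checking sum(is.na()) on a column is a quick way to confirm the count that summary() prints.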
In a separate code chunk, read in this dataset after you download it and save the unzipped file in your data folder.
library(here)
read.csv(
here("data", "201502-citibike-tripdata.csv"),
header = TRUE,
sep = ","
) -> citibike
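Dotted column names such as start.station.id in the names() listing further down are what read.csv() produces when a header is not a syntactically valid R name, because check.names = TRUE is the default. A minimal sketch, assuming a made-up two-row file written to a temporary location (the header text is illustrative and not copied from the Citi Bike file):

```r
# Hypothetical miniature CSV; the header with spaces is illustrative only
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("start station id,gender",
    "521,2",
    "497,1"),
  tmp
)

# Default behaviour: check.names = TRUE passes headers through make.names()
read.csv(tmp, header = TRUE, sep = ",") -> demo_default
names(demo_default)

# check.names = FALSE keeps the header text exactly as written
read.csv(tmp, header = TRUE, sep = ",", check.names = FALSE) -> demo_raw
names(demo_raw)
```

Keeping the default is usually more convenient, since dotted names can be used with `$` without backticks.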
The variable gender has the following codes: 0 = unknown; 1 = male; 2 = female.
Convert gender into a factor with these value labels.
factor(citibike$gender,
levels = c(0, 1, 2),
labels = c("Unknown", "Male", "Female")
) -> citibike$gender
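To see what the levels and labels arguments do before applying them to the full data-set, the same call can be tried on a short vector. A minimal sketch, assuming a made-up vector of codes (the names codes, gender_demo, and with_stray_code are hypothetical):

```r
# Toy vector of gender codes, made up for illustration
codes <- c(0, 1, 2, 1, 1, 2)

factor(codes,
  levels = c(0, 1, 2),
  labels = c("Unknown", "Male", "Female")
) -> gender_demo

# Cross-tabulate the original codes against the labels to confirm the mapping
table(codes, gender_demo)

# A code that is not listed in `levels` becomes NA
factor(c(0, 1, 9),
  levels = c(0, 1, 2),
  labels = c("Unknown", "Male", "Female")
) -> with_stray_code
is.na(with_stray_code)
```

Each position in levels is paired with the label in the same position, so the order of the two vectors must match.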
In a follow-up chunk, run the following commands on this data-set:
names()
str()
summary()
## [1] "tripduration" "starttime"
## [3] "stoptime" "start.station.id"
## [5] "start.station.name" "start.station.latitude"
## [7] "start.station.longitude" "end.station.id"
## [9] "end.station.name" "end.station.latitude"
## [11] "end.station.longitude" "bikeid"
## [13] "usertype" "birth.year"
## [15] "gender"
## 'data.frame': 196930 obs. of 15 variables:
## $ tripduration : int 801 379 2474 818 544 717 1306 913 759 585 ...
## $ starttime : Factor w/ 31630 levels "2/1/2015 0:00",..: 1 1 2 2 2 3 4 4 4 5 ...
## $ stoptime : Factor w/ 31706 levels "2/1/2015 0:07",..: 4 1 28 5 3 4 13 7 6 5 ...
## $ start.station.id : int 521 497 281 2004 323 373 352 439 335 284 ...
## $ start.station.name : Factor w/ 328 levels "1 Ave & E 15 St",..: 16 106 171 13 200 325 307 128 316 175 ...
## $ start.station.latitude : num 40.8 40.7 40.8 40.7 40.7 ...
## $ start.station.longitude: num -74 -74 -74 -74 -74 ...
## $ end.station.id : int 423 504 127 505 83 2002 504 116 2012 444 ...
## $ end.station.name : Factor w/ 328 levels "1 Ave & E 15 St",..: 305 1 37 15 29 327 1 268 118 47 ...
## $ end.station.latitude : num 40.8 40.7 40.7 40.7 40.7 ...
## $ end.station.longitude : num -74 -74 -74 -74 -74 ...
## $ bikeid : int 17131 21289 18903 21044 19868 15854 15173 17862 21183 14843 ...
## $ usertype : Factor w/ 2 levels "Customer","Subscriber": 2 2 2 2 2 2 2 2 2 2 ...
## $ birth.year : int 1978 1993 1969 1985 1957 1979 1983 1955 1985 1982 ...
## $ gender : Factor w/ 3 levels "Unknown","Male",..: 3 2 3 3 2 2 2 2 3 2 ...
## tripduration starttime stoptime
## Min. : 60.0 2/11/2015 18:20: 34 2/11/2015 8:53 : 39
## 1st Qu.: 340.0 2/11/2015 8:46 : 33 2/5/2015 8:52 : 36
## Median : 507.0 2/10/2015 18:16: 32 2/26/2015 8:50 : 35
## Mean : 649.4 2/11/2015 17:15: 32 2/12/2015 8:39 : 34
## 3rd Qu.: 764.0 2/12/2015 17:40: 32 2/5/2015 8:56 : 34
## Max. :43016.0 2/12/2015 8:35 : 32 2/10/2015 17:42: 33
## (Other) :196735 (Other) :196719
## start.station.id start.station.name start.station.latitude
## Min. : 72.0 8 Ave & W 31 St : 2238 Min. :40.68
## 1st Qu.: 307.0 W 21 St & 6 Ave : 2143 1st Qu.:40.72
## Median : 417.0 Lafayette St & E 8 St : 2130 Median :40.74
## Mean : 438.7 E 43 St & Vanderbilt Ave: 2102 Mean :40.74
## 3rd Qu.: 491.0 W 41 St & 8 Ave : 2078 3rd Qu.:40.75
## Max. :3002.0 E 17 St & Broadway : 1908 Max. :40.77
## (Other) :184331
## start.station.longitude end.station.id end.station.name
## Min. :-74.02 Min. : 72.0 W 41 St & 8 Ave : 2548
## 1st Qu.:-74.00 1st Qu.: 307.0 Lafayette St & E 8 St : 2244
## Median :-73.99 Median : 415.0 E 43 St & Vanderbilt Ave: 2239
## Mean :-73.99 Mean : 438.5 W 21 St & 6 Ave : 2221
## 3rd Qu.:-73.98 3rd Qu.: 491.0 W 33 St & 7 Ave : 2155
## Max. :-73.95 Max. :3002.0 E 17 St & Broadway : 1990
## (Other) :183533
## end.station.latitude end.station.longitude bikeid usertype
## Min. :40.68 Min. :-74.02 Min. :14530 Customer : 2265
## 1st Qu.:40.72 1st Qu.:-74.00 1st Qu.:16338 Subscriber:194665
## Median :40.74 Median :-73.99 Median :18089
## Mean :40.74 Mean :-73.99 Mean :18120
## 3rd Qu.:40.75 3rd Qu.:-73.98 3rd Qu.:19886
## Max. :40.77 Max. :-73.95 Max. :21703
##
## birth.year gender
## Min. :1899 Unknown: 2303
## 1st Qu.:1967 Male :161563
## Median :1977 Female : 33064
## Mean :1975
## 3rd Qu.:1985
## Max. :1999
## NA's :2267
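The str() output above lists text columns such as starttime and usertype as factors. That matches the read.csv() default before R 4.0.0; since R 4.0.0 the default is stringsAsFactors = FALSE, so the same call returns character columns unless factors are requested. A minimal sketch on a made-up two-row extract (the values are illustrative only):

```r
# Made-up two-row extract reusing two column names from the real file
csv_text <- "tripduration,usertype
801,Subscriber
379,Customer"

# Explicitly request factors, which was the default before R 4.0.0
read.csv(text = csv_text, header = TRUE, sep = ",",
  stringsAsFactors = TRUE) -> as_factors
str(as_factors)

# Explicitly request character columns, the default since R 4.0.0
read.csv(text = csv_text, header = TRUE, sep = ",",
  stringsAsFactors = FALSE) -> as_chars
str(as_chars)
```

Setting the argument explicitly makes the result independent of the R version in use.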
In a final chunk, run the commands necessary to save each of the three data-sets as separate RData files. Make sure you save them in your data folder.
save(mroz,
file = here("data", "mroz.RData")
)
save(airline,
file = here("data", "airline.RData")
)
save(citibike,
file = here("data", "citibike.RData")
)
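A quick way to confirm that a saved file restores correctly is to remove the object and load() it back. A minimal sketch, assuming a stand-in data frame and a temporary file so that it runs without the data folder (toy, rdata_path, and restored_names are hypothetical names):

```r
# Stand-in data frame and a temporary file; in the exercise the path
# would come from here("data", ...)
data.frame(id = 1:3, hours = c(0, 288, 1516)) -> toy
rdata_path <- tempfile(fileext = ".RData")

save(toy, file = rdata_path)

# Remove the object, then restore it from disk
rm(toy)
load(rdata_path) -> restored_names

restored_names   # character vector naming the objects that were loaded
exists("toy")
```

Note that load() restores objects under the names they had when saved, so the file name and the object name need not match.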
Now knit the complete Rmd file to HTML.
Go to this page on Kaggle and read the description of the data-set on mass shootings in the United States that occurred during the 1966-2017 period. Once you have read the overview of the data, click the “Data” tab and download the file called Mass Shootings Dataset.csv. Be careful; there are several versions, so the one you want is the very last one and not any that have a version number attached, such as “Mass Shootings Dataset Ver 2.csv”.
Now read this file into R, perhaps naming it shootings, and run the summary() command on it. Note the number of observations and the number of variables in the data-set.
read.csv(
here("data", "Mass Shootings Dataset.csv"),
header = TRUE,
sep = ","
) -> shootings
summary(shootings)
## S. Title
## Min. : 1.0 Killeen : 2
## 1st Qu.:100.2 101 California Street shootings : 1
## Median :199.5 49th Street Elementary School : 1
## Mean :199.5 Accent Signage Systems in Minneapolis: 1
## 3rd Qu.:298.8 Accent Signage Systems shooting : 1
## Max. :398.0 Air Force base shooting : 1
## (Other) :391
## Location Date
## Seattle, Washington : 6 2/20/2016: 5
## Killeen, Texas : 5 10/1/2015: 3
## Colorado Springs, Colorado: 4 2/25/2016: 3
## Dallas, Texas : 4 2/6/2016 : 3
## Omaha, Nebraska : 4 3/19/2016: 3
## Phoenix, Arizona : 4 3/29/2009: 3
## (Other) :371 (Other) :378
## Summary
## : 1
## 26-year-old Chris Harper Mercer opened fire at Umpqua Community College in southwest Oregon. The gunman shot himself to death after being wounded in a shootout with police. : 1
## 19-year-old male kills his mother and two neighbors before killing himself. Motive is unclear. : 1
## 26-year-old killed his Mother, Father, Grandmother, brother, and sister before killing himself in the home of his Grandmother. : 1
## 42-year-old husband murders wife, shoots two young family members, then shoots himself in the head. : 1
## A Couple and their sons were found shot to death in their mobile home on Wednesday, February 4. 2015. All four family members were found shot in the head. The shooter shot his wife and two boys before killing himself.: 1
## (Other) :392
## Fatalities Injured Total.victims Mental.Health.Issues
## Min. : 0.000 Min. : 0.000 Min. : 3.00 No :110
## 1st Qu.: 2.000 1st Qu.: 1.000 1st Qu.: 4.00 Unclear : 21
## Median : 4.000 Median : 3.000 Median : 6.00 Unclear : 1
## Mean : 5.015 Mean : 6.251 Mean : 10.93 unknown : 1
## 3rd Qu.: 6.000 3rd Qu.: 5.000 3rd Qu.: 10.00 Unknown :120
## Max. :58.000 Max. :515.000 Max. :573.00 Yes :145
##
## Race Gender Latitude
## White American or European American:137 : 1 Min. :21.31
## Black American or African American : 78 Female : 7 1st Qu.:33.57
## Unknown : 44 M : 17 Median :37.23
## white : 41 M/F : 1 Mean :37.26
## Some other race : 23 Male :346 3rd Qu.:41.64
## Asian American : 16 Male/Female: 4 Max. :60.79
## (Other) : 59 Unknown : 22 NA's :20
## Longitude
## Min. :-161.79
## 1st Qu.:-110.97
## Median : -88.69
## Mean : -94.74
## 3rd Qu.: -81.68
## Max. : -69.71
## NA's :20
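The number of observations and variables can be read off directly with dim(), nrow(), and ncol(). A minimal sketch on a made-up data frame (toy_shootings is a hypothetical stand-in; with the real data these calls would be made on shootings):

```r
# Hypothetical stand-in with two of the column names from the real file
data.frame(
  Fatalities = c(2, 4, 6),
  Injured = c(1, 3, 5)
) -> toy_shootings

dim(toy_shootings)    # rows, then columns
nrow(toy_shootings)   # number of observations
ncol(toy_shootings)   # number of variables
```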
Go to this page on Kaggle and download the file called aac_shelter_outcomes.zip, unzip it, and AFTER reading the data overview, read in the file and generate a list of variable names with an appropriate command.
read.csv(
here("data", "aac_shelter_outcomes.csv"),
header = TRUE,
sep = ","
) -> animal
names(animal)
## [1] "age_upon_outcome" "animal_id" "animal_type" "breed"
## [5] "color" "date_of_birth" "datetime" "monthyear"
## [9] "name" "outcome_subtype" "outcome_type" "sex_upon_outcome"