
dplyr and tidyr

Ani Ruhil


"dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges"

The core functions are:

  • mutate() ... to add new variables that are functions of existing variables
  • transmute() ... to add new variables and drop the rest
  • select() ... to pick variables (based on their names) to work on
  • filter() ... to pick cases (based on their values) to work on
  • summarise() ... to calculate quantities of interest (rather than add these quantities to the existing data frame) and save them as a standalone object
  • arrange() ... to order the rows based on our preferences

To understand how these commands work, we rely on a simple data frame -- all flights into and out of Columbus (Ohio) from January through September of 2017. Let us load the data, dplyr, and a couple of helper packages.

library(here)
load(here("data", "cmhflights2017.RData"))
library(dplyr)
library(tidylog)

select


Say I only want the first five columns.

cmhflights2017 %>%
select(Year:DayOfWeek) -> my.df
names(my.df)
## [1] "Year" "Quarter" "Month" "DayofMonth" "DayOfWeek"

What if the columns were not sequentially located?

cmhflights2017 %>%
select(Year, FlightDate:DOT_ID_Reporting_Airline, Tail_Number) -> my.df
names(my.df)
## [1] "Year" "FlightDate" "Reporting_Airline"
## [4] "DOT_ID_Reporting_Airline" "Tail_Number"

Could we use column numbers instead?

cmhflights2017 %>%
select(c(1, 3, 5, 7)) -> my.df
names(my.df)
## [1] "id" "Quarter" "DayofMonth" "FlightDate"

What about columns that contain a specific string?

cmhflights2017 %>%
select(contains("Origin")) -> my.df
names(my.df)
## [1] "OriginAirportID" "OriginAirportSeqID" "OriginCityMarketID" "Origin"
## [5] "OriginCityName" "OriginState" "OriginStateFips" "OriginStateName"
## [9] "OriginWac"

Columns that start or end with some string?

cmhflights2017 %>%
select(starts_with("De")) -> my.df
names(my.df)
## [1] "DestAirportID" "DestAirportSeqID" "DestCityMarketID"
## [4] "Dest" "DestCityName" "DestState"
## [7] "DestStateFips" "DestStateName" "DestWac"
## [10] "DepTime" "DepDelay" "DepDelayMinutes"
## [13] "DepDel15" "DepartureDelayGroups" "DepTimeBlk"
cmhflights2017 %>%
select(ends_with("Num")) -> my.df
names(my.df)
## [1] "Div1TailNum" "Div2TailNum" "Div3TailNum" "Div4TailNum" "Div5TailNum"

Say you have columns named something like "V0124", "V0125" and so on until "V1800"; SPSS files from some data archives tend to have this naming convention. In that case, you could do:

somedata %>%
select(num_range("V", 51:231)) -> my.df
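If the column names are zero-padded, as in the "V0124" example, num_range() also takes a width argument; a sketch, assuming the columns run from V0124 through V1800:

somedata %>%
  select(num_range("V", 124:1800, width = 4)) -> my.df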

dplyr also has one_of() and matches(), as shown in the examples below.

cmhflights2017 %>%
select(one_of("Year", "Quarter", "Month", "Dest")) -> my.df
names(my.df)
## [1] "Year" "Quarter" "Month" "Dest"
cmhflights2017 %>%
select(matches("Wheels")) -> my.df
names(my.df)
## [1] "WheelsOff" "WheelsOn" "Div1WheelsOn" "Div1WheelsOff" "Div2WheelsOn"
## [6] "Div2WheelsOff" "Div3WheelsOn" "Div3WheelsOff" "Div4WheelsOn" "Div4WheelsOff"
## [11] "Div5WheelsOn" "Div5WheelsOff"
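In more recent releases of dplyr/tidyselect, one_of() has been superseded by any_of() and all_of(); a sketch of the equivalent call:

cmhflights2017 %>%
  select(any_of(c("Year", "Quarter", "Month", "Dest"))) -> my.df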

filter


We used subset() to select particular cases for analysis. With dplyr, the filter() command serves the same purpose. Say we only want records from January, i.e., Month == 1.

cmhflights2017 %>%
filter(Month == 1) -> my.df
table(my.df$Month)
##
## 1
## 3757

What about only American Airlines flights in January?

cmhflights2017 %>%
filter(Month == 1 & Reporting_Airline == "AA") -> my.df
table(my.df$Month, my.df$Reporting_Airline)
##
## AA
## 1 387

What about United Airlines flights in January arriving in CMH (Columbus, OH), regardless of origin?

cmhflights2017 %>%
filter(Month == 1 & Reporting_Airline == "UA" & Dest == "CMH") -> my.df
<div id="htmlwidget-d5995c8e0aa77f45a50e" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-d5995c8e0aa77f45a50e">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53"],["2017-01-22","2017-01-21","2017-01-20","2017-01-20","2017-01-19","2017-01-19","2017-01-18","2017-01-04","2017-01-04","2017-01-04","2017-01-03","2017-01-03","2017-01-02","2017-01-02","2017-01-31","2017-01-31","2017-01-30","2017-01-30","2017-01-29","2017-01-28","2017-01-27","2017-01-27","2017-01-13","2017-01-13","2017-01-12","2017-01-12","2017-01-11","2017-01-11","2017-01-10","2017-01-10","2017-01-09","2017-01-09","2017-01-18","2017-01-17","2017-01-17","2017-01-16","2017-01-16","2017-01-15","2017-01-14","2017-01-26","2017-01-26","2017-01-25","2017-01-25","2017-01-24","2017-01-24","2017-01-23","2017-01-23","2017-01-08","2017-01-07","2017-01-06","2017-01-06","2017-01-05","2017-01-05"],[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],["UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA"],["CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH"]],"container":"<table class=\"compact\">\n <thead>\n <tr>\n <th> <\/th>\n <th>FlightDate<\/th>\n <th>Month<\/th>\n <th>Reporting_Airline<\/th>\n <th>Dest<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"pageLength":5,"columnDefs":[{"className":"dt-right","targets":2},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[5,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>

What if I wanted a more complicated filter, say, United flights in January or February to CMH or ORD?

cmhflights2017 %>%
filter(Month %in% c(1, 2) & Reporting_Airline == "UA" & Dest %in% c("CMH", "ORD")) -> my.df
table(my.df$Month)
##
## 1 2
## 106 145
table(my.df$Reporting_Airline)
##
## UA
## 251
table(my.df$Dest)
##
## CMH ORD
## 132 119
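The %in% conditions could also be written out with the | (or) operator; a sketch that produces the same subset:

cmhflights2017 %>%
  filter((Month == 1 | Month == 2) & Reporting_Airline == "UA" &
         (Dest == "CMH" | Dest == "ORD")) -> my.df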

arrange


Say I wanted to arrange the resulting data frame by ascending order of departure delays. How might I do that?

my.df %>%
arrange(DepDelayMinutes) -> my.df2

And now in descending order of delays.

my.df %>%
arrange(-DepDelayMinutes) -> my.df2
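dplyr also provides desc() for descending sorts, which reads more explicitly than the minus sign and works for non-numeric columns as well; a sketch:

my.df %>%
  arrange(desc(DepDelayMinutes)) -> my.df2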

We could tweak this further, sorting by destination first (CMH, then ORD) and, within each destination, by descending departure delays.

my.df %>%
arrange(Dest, -DepDelayMinutes) -> my.df2

So far, we have seen each function in isolation. Now we streamline things a bit so that we only end up with the columns we want. That is, we can string together these functions to have a slim data frame of only those columns we selected, filtered as we want, and arranged as we want.

cmhflights2017 %>%
  select(Month, Reporting_Airline, Dest, DepDelayMinutes) %>%
  filter(Month %in% c(1, 2) & Reporting_Airline == "UA" & Dest %in% c("CMH", "ORD")) %>%
  arrange(Month, Dest, -DepDelayMinutes) -> my.df3

Here, the end result is a data frame arranged by Month, then within Month by Destination, and then finally by descending order of flight delays. This is the beauty of dplyr, allowing us to chain together various functions to get what we want.

<div id="htmlwidget-4ee093bc1624a6fc4c0d" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-4ee093bc1624a6fc4c0d">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100","101","102","103","104","105","106","107","108","109","110","111","112","113","114","115","116","117","118","119","120","121","122","123","124","125","126","127","128","129","130","131","132","133","134","135","136","137","138","139","140","141","142","143","144","145","146","147","148","149","150","151","152","153","154","155","156","157","158","159","160","161","162","163","164","165","166","167","168","169","170","171","172","173","174","175","176","177","178","179","180","181","182","183","184","185","186","187","188","189","190","191","192","193","194","195","196","197","198","199","200","201","202","203","204","205","206","207","208","209","210","211","212","213","214","215","216","217","218","219","220","221","222","223","224","225","226","227","228","229","230","231","232","233","234","235","236","237","238","239","240","241","242","243","244","245","246","247","248","249","250","251"],[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],["UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA","UA"],["CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH",
"CMH","CMH","CMH","CMH","CMH","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","CMH","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD","ORD"],[178,61,44,39,39,30,27,18,12,10,8,2,2,2,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,164,112,45,34,16,15,13,8,7,6,3,2,2,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,376,109,66,51,38,38,37,34,21,14,13,9,7,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,394,235,177,47,36,24,13,12,10,3,2,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]],"container":"<table class=\"compact\">\n <thead>\n <tr>\n <th> <\/th>\n <th>Month<\/th>\n <th>Reporting_Airline<\/th>\n <th>Dest<\/th>\n <th>DepDelayMinutes<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"pageLength":5,"columnDefs":[{"className":"dt-right","targets":[1,4]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[5,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>

group_by


Let us calculate the Mean and Median delay. To do this, we will lean on the group_by() function to calculate median and mean delays by Month and Destination. While we are at it, let us see the standard deviation as well.

cmhflights2017 %>%
  select(Month, Reporting_Airline, Dest, DepDelayMinutes) %>%
  filter(Month %in% c(1, 2) &
           Reporting_Airline == "UA" &
           Dest %in% c("CMH", "ORD")) %>%
  group_by(Month, Dest) %>%
  summarise(Median.Delay = median(DepDelayMinutes, na.rm = TRUE),
            Mean.Delay = mean(DepDelayMinutes, na.rm = TRUE),
            SD.Delay = sd(DepDelayMinutes, na.rm = TRUE)) %>%
  arrange(Month, Dest) -> tab.01
head(tab.01)
## # A tibble: 4 x 5
## # Groups: Month [2]
## Month Dest Median.Delay Mean.Delay SD.Delay
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1 CMH 0 8.96 27.2
## 2 1 ORD 0 8.11 27.7
## 3 2 CMH 0 10.3 45.0
## 4 2 ORD 0 14.5 59.8
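Note the "# Groups: Month [2]" line: summarise() peels off only the innermost grouping, so tab.01 is still grouped by Month, which can affect later calculations. If that is not wanted, drop the grouping; a sketch:

tab.01 %>%
  ungroup() -> tab.01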

What if we wanted to know how many airports you can fly to from Columbus, and how many flights there were to each in July?

cmhflights2017 %>%
  select(Month, Reporting_Airline, Dest, DepDelayMinutes, Origin) %>%
  filter(Month %in% c(7) &
           Origin %in% c("CMH")) %>%
  group_by(Dest) %>%
  summarise(No.of.Flights = n()) -> tab.02
head(tab.02)
## # A tibble: 6 x 2
## Dest No.of.Flights
## <chr> <int>
## 1 ATL 338
## 2 BNA 62
## 3 BOS 57
## 4 BWI 113
## 5 DAL 31
## 6 DCA 57
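The same table could be built more compactly with count(); a sketch (the name argument just labels the n column):

cmhflights2017 %>%
  filter(Month == 7, Origin == "CMH") %>%
  count(Dest, name = "No.of.Flights") -> tab.02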

How many distinct airlines fly out of Columbus?

cmhflights2017 %>%
  select(Month, Reporting_Airline, Dest, DepDelayMinutes, Origin) %>%
  filter(Origin %in% c("CMH")) %>%
  group_by(Reporting_Airline) %>%
  summarise(No.of.Airlines = n_distinct(Reporting_Airline)) -> tab.03
head(tab.03)
## # A tibble: 6 x 2
## Reporting_Airline No.of.Airlines
## <chr> <int>
## 1 AA 1
## 2 DL 1
## 3 EV 1
## 4 F9 1
## 5 OO 1
## 6 UA 1
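Because we grouped by Reporting_Airline, each row simply reports 1 distinct value per carrier. To get the single overall count of carriers, skip the group_by(); a sketch:

cmhflights2017 %>%
  filter(Origin %in% c("CMH")) %>%
  summarise(No.of.Airlines = n_distinct(Reporting_Airline))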

And how many flights did each airline operate?

cmhflights2017 %>%
  select(Month, Reporting_Airline, Dest, DepDelayMinutes, Origin) %>%
  filter(Origin %in% c("CMH")) %>%
  group_by(Reporting_Airline) %>%
  summarise(No.of.Flights = n()) -> tab.04
head(tab.04)
## # A tibble: 6 x 2
## Reporting_Airline No.of.Flights
## <chr> <int>
## 1 AA 2555
## 2 DL 3612
## 3 EV 1972
## 4 F9 1005
## 5 OO 1573
## 6 UA 951

Of all of these airlines, which one has the highest mean departure delay?

cmhflights2017 %>%
  select(Month, Reporting_Airline, Dest, DepDelayMinutes, Origin) %>%
  filter(Origin %in% c("CMH")) %>%
  group_by(Reporting_Airline) %>%
  summarise(Mean.Delay = mean(DepDelayMinutes, na.rm = TRUE)) -> tab.05
head(tab.05)
## # A tibble: 6 x 2
## Reporting_Airline Mean.Delay
## <chr> <dbl>
## 1 AA 11.3
## 2 DL 8.19
## 3 EV 25.0
## 4 F9 14.4
## 5 OO 12.8
## 6 UA 11.1
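To answer the question directly, sort the summary in descending order of the mean delay and keep the top row; a sketch:

tab.05 %>%
  arrange(desc(Mean.Delay)) %>%
  head(1)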

mutate

Thus far we have been creating a summary data frame containing whatever we calculated -- the frequencies, averages, and so on. Sometimes, however, you may want these calculations appended to the original data frame and stored as new columns with mutate(), as shown below. In this particular example, I am calculating the mean departure delay by airline (for each airport it departs from) and then including this as a column called Mean.Delay in cmhflights.

cmhflights2017 %>%
  group_by(Reporting_Airline, Origin) %>%
  mutate(Mean.Delay = mean(DepDelayMinutes, na.rm = TRUE)) -> cmhflights
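A quick sanity check, as a sketch: unlike summarise(), mutate() keeps every row and simply appends the new column.

nrow(cmhflights) == nrow(cmhflights2017)  # TRUE -- no rows were dropped
ncol(cmhflights) - ncol(cmhflights2017)   # 1 -- the new Mean.Delay column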

Note the new 111th column, Mean.Delay. Now, if we wanted to add this column but drop all the other columns not used in the calculation, we could do that via transmute(). Be careful with this one, though: the grouping variables are always retained, which is why Reporting_Airline and Dest show up in the result below.

cmhflights2017 -> mut.df
mut.df %>%
  group_by(Reporting_Airline, Dest) %>%
  transmute(Mean.Arr.Delay = mean(ArrDelayMinutes, na.rm = TRUE)) -> mut.df2
names(mut.df2)
## [1] "Reporting_Airline" "Dest" "Mean.Arr.Delay"

Calculations with dplyr


I want a simple frequency table of flights per unique airline.

cmhflights2017 %>%
count(Reporting_Airline, sort = TRUE) -> tab.01
head(tab.01)
## Reporting_Airline n
## 1 WN 24450
## 2 DL 7224
## 3 AA 5111
## 4 EV 3945
## 5 OO 3146
## 6 F9 2009

By carrier and destination?

cmhflights2017 %>%
count(Reporting_Airline, Dest, sort = TRUE) -> tab.02
head(tab.02)
## Reporting_Airline Dest n
## 1 WN CMH 12225
## 2 DL CMH 3612
## 3 DL ATL 2800
## 4 AA CMH 2556
## 5 WN MDW 2001
## 6 EV CMH 1973

How many unique aircraft do we have?

cmhflights2017 %>%
summarise(unique.planes = n_distinct(Tail_Number))
## unique.planes
## 1 2473

How many flights do we see per unique aircraft?

cmhflights2017 %>%
  group_by(Tail_Number) %>%
  summarise(Number = n()) -> tab.03
<div id="htmlwidget-b7304ac0f9cb1071f314" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-b7304ac0f9cb1071f314">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100","101","102","103","104","105","106","107","108","109","110","111","112","113","114","115","116","117","118","119","120","121","122","123","124","125","126","127","128","129","130","131","132","133","134","135","136","137","138","139","140","141","142","143","144","145","146","147","148","149","150","151","152","153","154","155","156","157","158","159","160","161","162","163","164","165","166","167","168","169","170","171","172","173","174","175","176","177","178","179","180","181","182","183","184","185","186","187","188","189","190","191","192","193","194","195","196","197","198","199","200","201","202","203","204","205","206","207","208","209","210","211","212","213","214","215","216","217","218","219","220","221","222","223","224","225","226","227","228","229","230","231","232","233","234","235","236","237","238","239","240","241","242","243","244","245","246","247","248","249","250","251","252","253","254","255","256","257","258","259","260","261","262","263","264","265","266","267","268","269","270","271","272","273","274","275","276","277","278","279","280","281","282","283","284","285","286","287","288","289","290","291","292","293","294","295","296","297","298","299","300","301","302","303","304","305","306","307","308","309","310","311","312","313","314","315","316","317","318","319","320","321","322","323","324","325","326","327","328","329","330","331","332","333","334","335","336","337","338","339","340","341","342","343","344","345","346","347","348","349","350","351","352","353","354","355","356","357","358","359","360","361","362","363","364","365","366","367","368","369","370","371","372","373","374","375","376","377","378","379","380","381","382","383","384","385","386","387","388","389","390","391","392","393","394","395","396","397","398","399","400","401","402","403","404","405","406","407","408","409","410","411","412","413","414","415","416","417","418","419","420","421","422","423","424","425","426","427","428","429","430","431","432","433","434","435","436","437","438","439","440","441","442","443","444","445","446","447","448","449","450","451","452","453","454","455","456","457","458","459","460","461","462","463","464","465","466","467","468","469","470","471","472","473","474","475","476","477","478","479","480","481","482","483","484","485","486","487","488","489","490","491","492","493","494","495","496","497","498","499","500","501","502","503","504","505","506","507","508","509","510","511","512","513","514","515","516","517","518","519","520","521","522","523","524","525","526","527","528","529","530","531","532","533","534","535","536","537","538","539","540","541","542","543","544","545","546","547","548","549","550","551","552","553","554","555","556","557","558","559","560","561","562","563","564","565","566","567","568","569","570","571","572","573","574","575","576","577","578","579","580","581","582","583","584","585","586","587","588","589","590","591","592","593"
,"594","595","596","597","598","599","600","601","602","603","604","605","606","607","608","609","610","611","612","613","614","615","616","617","618","619","620","621","622","623","624","625","626","627","628","629","630","631","632","633","634","635","636","637","638","639","640","641","642","643","644","645","646","647","648","649","650","651","652","653","654","655","656","657","658","659","660","661","662","663","664","665","666","667","668","669","670","671","672","673","674","675","676","677","678","679","680","681","682","683","684","685","686","687","688","689","690","691","692","693","694","695","696","697","698","699","700","701","702","703","704","705","706","707","708","709","710","711","712","713","714","715","716","717","718","719","720","721","722","723","724","725","726","727","728","729","730","731","732","733","734","735","736","737","738","739","740","741","742","743","744","745","746","747","748","749","750","751","752","753","754","755","756","757","758","759","760","761","762","763","764","765","766","767","768","769","770","771","772","773","774","775","776","777","778","779","780","781","782","783","784","785","786","787","788","789","790","791","792","793","794","795","796","797","798","799","800","801","802","803","804","805","806","807","808","809","810","811","812","813","814","815","816","817","818","819","820","821","822","823","824","825","826","827","828","829","830","831","832","833","834","835","836","837","838","839","840","841","842","843","844","845","846","847","848","849","850","851","852","853","854","855","856","857","858","859","860","861","862","863","864","865","866","867","868","869","870","871","872","873","874","875","876","877","878","879","880","881","882","883","884","885","886","887","888","889","890","891","892","893","894","895","896","897","898","899","900","901","902","903","904","905","906","907","908","909","910","911","912","913","914","915","916","917","918","919","920","921","922","923","924","925","926","927","928","929","930","931","932","933","934","935","936","937","938","939","940","941","942","943","944","945","946","947","948","949","950","951","952","953","954","955","956","957","958","959","960","961","962","963","964","965","966","967","968","969","970","971","972","973","974","975","976","977","978","979","980","981","982","983","984","985","986","987","988","989","990","991","992","993","994","995","996","997","998","999","1000","1001","1002","1003","1004","1005","1006","1007","1008","1009","1010","1011","1012","1013","1014","1015","1016","1017","1018","1019","1020","1021","1022","1023","1024","1025","1026","1027","1028","1029","1030","1031","1032","1033","1034","1035","1036","1037","1038","1039","1040","1041","1042","1043","1044","1045","1046","1047","1048","1049","1050","1051","1052","1053","1054","1055","1056","1057","1058","1059","1060","1061","1062","1063","1064","1065","1066","1067","1068","1069","1070","1071","1072","1073","1074","1075","1076","1077","1078","1079","1080","1081","1082","1083","1084","1085","1086","1087","1088","1089","1090","1091","1092","1093","1094","1095","1096","1097","1098","1099","1100","1101","1102","1103","1104","1105","1106","1107","1108","1109","1110","1111","1112","1113","1114","1115","1116","1117","1118","1119","1120","1121","1122","1123","1124","1125","1126","1127","1128","1129","1130","1131","1132","1133","1134","1135","1136","1137","1138","1139","1140","1141","1142","1143","1144","1145","1146","1147","1148","1149","1150","1151","1152","1153","1154","1155","1156","1157","1158","1159
","1160","1161","1162","1163","1164","1165","1166","1167","1168","1169","1170","1171","1172","1173","1174","1175","1176","1177","1178","1179","1180","1181","1182","1183","1184","1185","1186","1187","1188","1189","1190","1191","1192","1193","1194","1195","1196","1197","1198","1199","1200","1201","1202","1203","1204","1205","1206","1207","1208","1209","1210","1211","1212","1213","1214","1215","1216","1217","1218","1219","1220","1221","1222","1223","1224","1225","1226","1227","1228","1229","1230","1231","1232","1233","1234","1235","1236","1237","1238","1239","1240","1241","1242","1243","1244","1245","1246","1247","1248","1249","1250","1251","1252","1253","1254","1255","1256","1257","1258","1259","1260","1261","1262","1263","1264","1265","1266","1267","1268","1269","1270","1271","1272","1273","1274","1275","1276","1277","1278","1279","1280","1281","1282","1283","1284","1285","1286","1287","1288","1289","1290","1291","1292","1293","1294","1295","1296","1297","1298","1299","1300","1301","1302","1303","1304","1305","1306","1307","1308","1309","1310","1311","1312","1313","1314","1315","1316","1317","1318","1319","1320","1321","1322","1323","1324","1325","1326","1327","1328","1329","1330","1331","1332","1333","1334","1335","1336","1337","1338","1339","1340","1341","1342","1343","1344","1345","1346","1347","1348","1349","1350","1351","1352","1353","1354","1355","1356","1357","1358","1359","1360","1361","1362","1363","1364","1365","1366","1367","1368","1369","1370","1371","1372","1373","1374","1375","1376","1377","1378","1379","1380","1381","1382","1383","1384","1385","1386","1387","1388","1389","1390","1391","1392","1393","1394","1395","1396","1397","1398","1399","1400","1401","1402","1403","1404","1405","1406","1407","1408","1409","1410","1411","1412","1413","1414","1415","1416","1417","1418","1419","1420","1421","1422","1423","1424","1425","1426","1427","1428","1429","1430","1431","1432","1433","1434","1435","1436","1437","1438","1439","1440","1441","1442","1443","1444","1445","1446","1447","1448","1449","1450","1451","1452","1453","1454","1455","1456","1457","1458","1459","1460","1461","1462","1463","1464","1465","1466","1467","1468","1469","1470","1471","1472","1473","1474","1475","1476","1477","1478","1479","1480","1481","1482","1483","1484","1485","1486","1487","1488","1489","1490","1491","1492","1493","1494","1495","1496","1497","1498","1499","1500","1501","1502","1503","1504","1505","1506","1507","1508","1509","1510","1511","1512","1513","1514","1515","1516","1517","1518","1519","1520","1521","1522","1523","1524","1525","1526","1527","1528","1529","1530","1531","1532","1533","1534","1535","1536","1537","1538","1539","1540","1541","1542","1543","1544","1545","1546","1547","1548","1549","1550","1551","1552","1553","1554","1555","1556","1557","1558","1559","1560","1561","1562","1563","1564","1565","1566","1567","1568","1569","1570","1571","1572","1573","1574","1575","1576","1577","1578","1579","1580","1581","1582","1583","1584","1585","1586","1587","1588","1589","1590","1591","1592","1593","1594","1595","1596","1597","1598","1599","1600","1601","1602","1603","1604","1605","1606","1607","1608","1609","1610","1611","1612","1613","1614","1615","1616","1617","1618","1619","1620","1621","1622","1623","1624","1625","1626","1627","1628","1629","1630","1631","1632","1633","1634","1635","1636","1637","1638","1639","1640","1641","1642","1643","1644","1645","1646","1647","1648","1649","1650","1651","1652","1653","1654","1655","1656","1657","1658","1659","1660","1661","1662","1663","1664","1665","1666","166
7","1668","1669","1670","1671","1672","1673","1674","1675","1676","1677","1678","1679","1680","1681","1682","1683","1684","1685","1686","1687","1688","1689","1690","1691","1692","1693","1694","1695","1696","1697","1698","1699","1700","1701","1702","1703","1704","1705","1706","1707","1708","1709","1710","1711","1712","1713","1714","1715","1716","1717","1718","1719","1720","1721","1722","1723","1724","1725","1726","1727","1728","1729","1730","1731","1732","1733","1734","1735","1736","1737","1738","1739","1740","1741","1742","1743","1744","1745","1746","1747","1748","1749","1750","1751","1752","1753","1754","1755","1756","1757","1758","1759","1760","1761","1762","1763","1764","1765","1766","1767","1768","1769","1770","1771","1772","1773","1774","1775","1776","1777","1778","1779","1780","1781","1782","1783","1784","1785","1786","1787","1788","1789","1790","1791","1792","1793","1794","1795","1796","1797","1798","1799","1800","1801","1802","1803","1804","1805","1806","1807","1808","1809","1810","1811","1812","1813","1814","1815","1816","1817","1818","1819","1820","1821","1822","1823","1824","1825","1826","1827","1828","1829","1830","1831","1832","1833","1834","1835","1836","1837","1838","1839","1840","1841","1842","1843","1844","1845","1846","1847","1848","1849","1850","1851","1852","1853","1854","1855","1856","1857","1858","1859","1860","1861","1862","1863","1864","1865","1866","1867","1868","1869","1870","1871","1872","1873","1874","1875","1876","1877","1878","1879","1880","1881","1882","1883","1884","1885","1886","1887","1888","1889","1890","1891","1892","1893","1894","1895","1896","1897","1898","1899","1900","1901","1902","1903","1904","1905","1906","1907","1908","1909","1910","1911","1912","1913","1914","1915","1916","1917","1918","1919","1920","1921","1922","1923","1924","1925","1926","1927","1928","1929","1930","1931","1932","1933","1934","1935","1936","1937","1938","1939","1940","1941","1942","1943","1944","1945","1946","1947","1948","1949","1950","1951","1952","1953","1954","1955","1956","1957","1958","1959","1960","1961","1962","1963","1964","1965","1966","1967","1968","1969","1970","1971","1972","1973","1974","1975","1976","1977","1978","1979","1980","1981","1982","1983","1984","1985","1986","1987","1988","1989","1990","1991","1992","1993","1994","1995","1996","1997","1998","1999","2000","2001","2002","2003","2004","2005","2006","2007","2008","2009","2010","2011","2012","2013","2014","2015","2016","2017","2018","2019","2020","2021","2022","2023","2024","2025","2026","2027","2028","2029","2030","2031","2032","2033","2034","2035","2036","2037","2038","2039","2040","2041","2042","2043","2044","2045","2046","2047","2048","2049","2050","2051","2052","2053","2054","2055","2056","2057","2058","2059","2060","2061","2062","2063","2064","2065","2066","2067","2068","2069","2070","2071","2072","2073","2074","2075","2076","2077","2078","2079","2080","2081","2082","2083","2084","2085","2086","2087","2088","2089","2090","2091","2092","2093","2094","2095","2096","2097","2098","2099","2100","2101","2102","2103","2104","2105","2106","2107","2108","2109","2110","2111","2112","2113","2114","2115","2116","2117","2118","2119","2120","2121","2122","2123","2124","2125","2126","2127","2128","2129","2130","2131","2132","2133","2134","2135","2136","2137","2138","2139","2140","2141","2142","2143","2144","2145","2146","2147","2148","2149","2150","2151","2152","2153","2154","2155","2156","2157","2158","2159","2160","2161","2162","2163","2164","2165","2166","2167","2168","2169","2170","2171","2172","2173","2174","21
75","2176","2177","2178","2179","2180","2181","2182","2183","2184","2185","2186","2187","2188","2189","2190","2191","2192","2193","2194","2195","2196","2197","2198","2199","2200","2201","2202","2203","2204","2205","2206","2207","2208","2209","2210","2211","2212","2213","2214","2215","2216","2217","2218","2219","2220","2221","2222","2223","2224","2225","2226","2227","2228","2229","2230","2231","2232","2233","2234","2235","2236","2237","2238","2239","2240","2241","2242","2243","2244","2245","2246","2247","2248","2249","2250","2251","2252","2253","2254","2255","2256","2257","2258","2259","2260","2261","2262","2263","2264","2265","2266","2267","2268","2269","2270","2271","2272","2273","2274","2275","2276","2277","2278","2279","2280","2281","2282","2283","2284","2285","2286","2287","2288","2289","2290","2291","2292","2293","2294","2295","2296","2297","2298","2299","2300","2301","2302","2303","2304","2305","2306","2307","2308","2309","2310","2311","2312","2313","2314","2315","2316","2317","2318","2319","2320","2321","2322","2323","2324","2325","2326","2327","2328","2329","2330","2331","2332","2333","2334","2335","2336","2337","2338","2339","2340","2341","2342","2343","2344","2345","2346","2347","2348","2349","2350","2351","2352","2353","2354","2355","2356","2357","2358","2359","2360","2361","2362","2363","2364","2365","2366","2367","2368","2369","2370","2371","2372","2373","2374","2375","2376","2377","2378","2379","2380","2381","2382","2383","2384","2385","2386","2387","2388","2389","2390","2391","2392","2393","2394","2395","2396","2397","2398","2399","2400","2401","2402","2403","2404","2405","2406","2407","2408","2409","2410","2411","2412","2413","2414","2415","2416","2417","2418","2419","2420","2421","2422","2423","2424","2425","2426","2427","2428","2429","2430","2431","2432","2433","2434","2435","2436","2437","2438","2439","2440","2441","2442","2443","2444","2445","2446","2447","2448","2449","2450","2451","2452","2453","2454","2455","2456","2457","2458","2459","2460","2461","2462","2463","2464","2465","2466","2467","2468","2469","2470","2471","2472","2473"],["N004AA","N005AA","N008AA","N010AA","N013AA","N014AA","N015AA","N020AA","N024AA","N026AA","N028AA","N029AA","N032AA","N103SY","N10575","N105SY","N106SY","N107SY","N108SY","N109SY","N110SY","N11107","N11109","N11164","N11165","N11181","N11184","N11189","N11192","N11194","N113SY","N114SY","N11535","N11536","N11539","N11544","N11547","N11548","N11551","N11565","N116SY","N117SY","N118SY","N119SY","N120SY","N12135","N12157","N12160","N12163","N12166","N12167","N12172","N12175","N12195","N121SY","N12201","N12216","N12238","N122SY","N124SY","N12540","N12552","N12563","N12564","N12567","N12569","N125SY","N12754","N127SY","N128SY","N12900","N12921","N12922","N12996","N130SY","N13118","N13161","N131EV","N131SY","N13248","N132EV","N132SY","N133EV","N133SY","N134EV","N134SY","N13538","N13550","N13553","N13566","N135EV","N135SY","N136EV","N136SY","N13716","N13718","N13720","N137EV","N138EV","N138SY","N13903","N13908","N13913","N13914","N13975","N13978","N13979","N13988","N13989","N13992","N13995","N139SY","N140SY","N14105","N14143","N14148","N14158","N14162","N14168","N14173","N14174","N14180","N14188","N141SY","N14203","N14214","N14219","N14228","N14230","N14235","N14240","N14242","N14249","N14250","N142SY","N143SY","N144SY","N14542","N14543","N14558","N14562","N14568","N14570","N14573","N145SY","N146PQ","N146SY","N14731","N14735","N147PQ","N148SY","N14902","N14904","N14905","N14907","N14916","N14920","N14923","N14977","N14991","N14993","N14998","N149
SY","N150SY","N151SY","N152SY","N153PQ","N153SY","N154SY","N15555","N15572","N15574","N156SY","N157SY","N15910","N15912","N15980","N15983","N15985","N15986","N160SY","N16149","N16170","N16178","N16183","N161PQ","N161SY","N16217","N16234","N162PQ","N162SY","N163SY","N164SY","N16541","N16546","N16559","N16561","N16571","N165SY","N166PQ","N166SY","N16701","N16703","N16713","N16732","N167SY","N168SY","N16911","N16918","N16919","N16976","N16981","N16987","N16999","N170PQ","N17108","N17115","N17159","N17169","N17185","N17196","N17229","N17233","N17244","N17560","N176AA","N176PQ","N17730","N17752","N17753","N17984","N181PQ","N18223","N18243","N18556","N18557","N186PQ","N187PQ","N19554","N195PQ","N197PQ","N200PQ","N200SY","N200WN","N201FR","N201LV","N202FR","N202SY","N202WN","N203FR","N203SY","N203WN","N204SY","N204WN","N205FR","N205SY","N205WN","N206FR","N206SY","N206WN","N207FR","N207SY","N207WN","N208SY","N208WN","N209SY","N209WN","N210FR","N210WN","N21154","N21197","N211FR","N211SY","N211WN","N212WN","N213FR","N213SY","N213WN","N214FR","N214WN","N21537","N215WN","N216FR","N216WR","N217JC","N218FR","N218WN","N219FR","N219WN","N220FR","N220WN","N221FR","N221WN","N222WN","N223FR","N223WN","N224WN","N225WN","N226WN","N227FR","N227WN","N228FR","N228PQ","N228WN","N22909","N229FR","N229WN","N230FR","N230WN","N23139","N231WN","N232FR","N232PQ","N232WN","N233FR","N233LV","N234WN","N235FR","N235WN","N236FR","N236WN","N23707","N23721","N237WN","N238FR","N238WN","N239WN","N240WN","N24103","N241WN","N24202","N24211","N24224","N242WN","N243WN","N244WN","N245WN","N246LV","N24706","N24729","N247WN","N248WN","N249WN","N250WN","N25134","N251WN","N25201","N252WN","N253WN","N254WN","N255WN","N256WN","N25705","N257WN","N258WN","N259WN","N260WN","N261WN","N26210","N26215","N26226","N26232","N262WN","N263WN","N264LV","N26545","N26549","N265WN","N266WN","N267WN","N268WN","N269WN","N270WN","N27152","N27190","N271LV","N27200","N27205","N27239","N27246","N272WN","N273WN","N27421","N274WN","N275WN","N276WN","N27722","N27724","N277WN","N278WN","N279WN","N280WN","N281WN","N282WN","N283WN","N28457","N28478","N284WN","N285WN","N286WN","N287WN","N288WN","N289CT","N290WN","N291WN","N292WN","N293WN","N294WN","N295WN","N296WN","N297WN","N298WN","N29906","N29917","N299WN","N301DQ","N301FR","N301NB","N302NB","N303DQ","N303FR","N304FR","N305FR","N307DQ","N307FR","N308FR","N309FR","N309US","N310DN","N310FR","N310NW","N311FR","N312FR","N312US","N313FR","N314DN","N314NB","N314US","N315DN","N315NB","N315US","N316FR","N316NB","N317NB","N317US","N318NB","N318US","N319NB","N319US","N320NB","N320US","N321NB","N321US","N322DN","N322NB","N322US","N323NB","N323US","N324NB","N324US","N325NB","N326NB","N326US","N327NB","N327NW","N328NB","N328NW","N329NB","N329NW","N330NB","N330NW","N331NB","N331NW","N33203","N33209","N33262","N33264","N33266","N33286","N33292","N332NB","N332NW","N333NB","N333NW","N334NB","N334NW","N335NB","N335NW","N336NB","N336NW","N33714","N337NB","N337NW","N338NB","N339NB","N339NW","N340NB","N34110","N341NB","N341NW","N34282","N342NB","N342NW","N343NB","N343NW","N34455","N344NB","N344NW","N345NW","N347NB","N347NW","N348NW","N349NB","N349NW","N350NA","N351NB","N351NW","N35204","N35236","N35260","N35271","N352NB","N353NW","N354NB","N354SW","N357NB","N357NW","N357SW","N358NB","N358SW","N359NB","N359NW","N359SW","N360NB","N360NW","N360SW","N361NB","N361NW","N361SW","N36207","N36272","N362NB","N362NW","N362SW","N363NB","N363NW","N363SW","N36444","N36447","N36469","N36476","N364NB","N364NW","N365NB","N365NW","N366NB","N366NW","N367
NW","N368NB","N368NW","N36915","N369NB","N369NW","N370NB","N370NW","N371CA","N371DA","N371NB","N371NW","N37252","N37263","N37267","N37273","N37274","N37281","N37287","N37290","N372DA","N372NW","N3730B","N3731T","N3732J","N3733Z","N3734B","N3735D","N3736C","N3737C","N3738B","N3739P","N373DA","N373NW","N3740C","N37419","N3741S","N37422","N3742C","N37434","N37437","N3743H","N3744F","N3745B","N37462","N37466","N37468","N3746H","N37470","N37471","N37474","N3747D","N3748Y","N3749D","N374DA","N374NW","N374SW","N3750D","N3751B","N3752","N3753","N3754A","N3755D","N3756","N3757D","N3758Y","N3759","N375DA","N375NC","N3760C","N3761R","N3762Y","N3763D","N3764D","N3765","N3766","N3767","N3768","N3769L","N376DA","N376NW","N376SW","N37700","N3771K","N3772H","N3773D","N377DA","N377NW","N378DA","N378NW","N378SW","N379DA","N379SW","N380DA","N380SW","N381DN","N38257","N38268","N382DA","N382SW","N383DN","N383SW","N38417","N38443","N38454","N38459","N384DA","N384SW","N385DN","N385SW","N386DA","N386SW","N387DA","N387SW","N388DA","N388SW","N389DA","N389SW","N390DA","N390SW","N391CA","N391DA","N391SW","N392DA","N392SW","N393DA","N39415","N39416","N39423","N39463","N39475","N394DA","N394SW","N395DN","N395SW","N396DA","N396SW","N39728","N397DA","N397SW","N398CA","N398DA","N398SW","N399DA","N399WN","N3AAAA","N3ACAA","N3ADAA","N3AFAA","N3AHAA","N3AKAA","N3ALAA","N3APAA","N3ARAA","N3ASAA","N3ATAA","N3AVAA","N3AWAA","N3AXAA","N3AYAA","N3BAAA","N3BFAA","N3BJAA","N3BKAA","N3BSAA","N3BTAA","N3BUAA","N3BWAA","N3BYAA","N3CAAA","N3CCAA","N3CGAA","N3CJAA","N3CKAA","N3CLAA","N3CNAA","N3CRAA","N3CTAA","N3CUAA","N3CVAA","N3CWAA","N3DAAA","N3DBAA","N3DCAA","N3DPAA","N3DRAA","N3DTAA","N3DXAA","N3DYAA","N3EAAA","N3ECAA","N3EEAA","N3EFAA","N3EJAA","N3ELAA","N3EMAA","N3ENAA","N3EPAA","N3ERAA","N3EUAA","N3EVAA","N3EWAA","N3EXAA","N3EYAA","N3FCAA","N3FGAA","N3FHAA","N3FKAA","N3FLAA","N3FMAA","N3FPAA","N3FRAA","N3FSAA","N3FVAA","N3FXAA","N3FYAA","N3GAAA","N3GBAA","N3GCAA","N3GDAA","N3GFAA","N3GLAA","N3GTAA","N3GWAA","N3GXAA","N3HBAA","N3HCAA","N3HDAA","N3HKAA","N3HMAA","N3HPAA","N3HRAA","N3HWAA","N3HYAA","N3JAAA","N3JBAA","N3JCAA","N3JEAA","N3JGAA","N3JHAA","N3JLAA","N3JMAA","N3JNAA","N3JSAA","N3JTAA","N3JVAA","N3JWAA","N3JXAA","N3JYAA","N3KBAA","N3KCAA","N3KDAA","N3KEAA","N3KGAA","N3KJAA","N3KRAA","N3KYAA","N3LCAA","N3LFAA","N3LNAA","N3LVAA","N3LYAA","N3MCAA","N3MGAA","N3MLAA","N3MWAA","N3MYAA","N3NCAA","N3NEAA","N3NGAA","N3NHAA","N3NJAA","N3NTAA","N3NUAA","N3NWAA","N3NYAA","N3PBAA","N3PGAA","N3PJAA","N3PLAA","N3PPAA","N3PSAA","N400WN","N401UA","N401WN","N402UA","N402WN","N403UA","N403WN","N404UA","N404WN","N405UA","N405WN","N406UA","N406WN","N407UA","N407WN","N408UA","N408WN","N409UA","N409WN","N410UA","N410WN","N411UA","N411WN","N412UA","N412WN","N413UA","N413WN","N414UA","N414WN","N415UA","N415WN","N416UA","N416WN","N417UA","N417WN","N418SW","N418UA","N418WN","N419UA","N419WN","N420UA","N420WN","N421LV","N421UA","N422UA","N422WN","N423UA","N423WN","N424AA","N424UA","N424WN","N425AA","N425LV","N425UA","N426AA","N426SW","N426UA","N426WN","N427SW","N427UA","N427WN","N428WN","N429SW","N429UA","N429WN","N430SW","N430UA","N430WN","N431SW","N431UA","N431WN","N432SW","N432UA","N432WN","N433LV","N433SW","N433UA","N434AA","N434UA","N434WN","N435UA","N435WN","N436UA","N436WN","N437AA","N437SW","N437UA","N437WN","N438AA","N438SW","N438UA","N438WN","N439AA","N439SW","N439UA","N439WN","N440LV","N440SW","N440UA","N441SW","N441UA","N441WN","N442SW","N442UA","N442WN","N443SW","N443UA","N443WN","N444UA","N444WN","N445SW","N445UA","N445WN","N446SW","N44
6UA","N446WN","N447SW","N447UA","N447WN","N448SW","N448UA","N448WN","N449SW","N449UA","N449WN","N450WN","N451UA","N451WN","N452SW","N452UA","N452WN","N453SW","N453UA","N453WN","N45440","N454SW","N454UA","N454WN","N455CA","N455SW","N455UA","N455WN","N456UA","N456WN","N457SW","N457UA","N457WN","N458UA","N458WN","N459SW","N459UA","N459WN","N460SW","N460UA","N460WN","N461SW","N461UA","N461WN","N462UA","N462WN","N463SW","N463UA","N463WN","N464UA","N464WN","N465UA","N465WN","N466SW","N466UA","N466WN","N467UA","N467WN","N468CA","N468UA","N468WN","N469UA","N469WN","N470UA","N470WN","N471UA","N472UA","N472WN","N473UA","N473WN","N47414","N474UA","N474WN","N475UA","N475WN","N476UA","N476WN","N477UA","N477WN","N478WN","N479UA","N479WN","N480UA","N480WN","N481UA","N481WN","N482WN","N483AA","N483UA","N483WN","N484WN","N485UA","N485WN","N486UA","N486WN","N487CA","N487UA","N487WN","N488UA","N488WN","N48901","N489AA","N489UA","N489WN","N490UA","N490WN","N491UA","N491WN","N492UA","N492WN","N493UA","N493WN","N494AA","N494CA","N494UA","N494WN","N495UA","N495WN","N496CA","N496UA","N496WN","N497UA","N497WN","N498CA","N498WN","N499WN","N4UBAA","N4WNAA","N4XBAA","N4XCAA","N4XDAA","N4XEAA","N4XFAA","N4XGAA","N4XHAA","N4XJAA","N4XKAA","N4XLAA","N4XMAA","N4XPAA","N4XRAA","N4XSAA","N4XTAA","N4XUAA","N4XVAA","N4XWAA","N4XXAA","N4XYAA","N4YAAA","N4YBAA","N4YCAA","N4YDAA","N4YEAA","N4YFAA","N4YLAA","N4YNAA","N4YPAA","N4YRAA","N4YSAA","N4YTAA","N4YUAA","N500WR","N501AA","N502AA","N506CA","N513AA","N514AA","N528AA","N53442","N535US","N537US","N538CA","N539US","N540US","N541AA","N54241","N544US","N54711","N547AA","N547US","N548AA","N548CA","N548US","N549CA","N549US","N550AA","N550WN","N551WN","N552NW","N552WN","N553WN","N554CA","N554NW","N554WN","N555AA","N555LV","N556WN","N557WN","N558WN","N559WN","N560WN","N561WN","N562WN","N563WN","N564WN","N565WN","N566AA","N566WN","N567WN","N568WN","N569WN","N570WN","N582CA","N590AA","N600QX","N600WN","N601WN","N601XJ","N602SW","N602XJ","N603AT","N603SK","N603SW","N604SK","N604SW","N605QX","N605SW","N606LR","N606SK","N606SW","N607AT","N607LR","N607SK","N607SW","N608AT","N608SK","N608SW","N609SK","N609SW","N610WN","N611SK","N612SW","N613SK","N613SW","N614SK","N614SW","N615SW","N616SW","N617SW","N61898","N618WN","N619SW","N620SW","N621SW","N622SW","N623SW","N624SW","N625CA","N625SW","N626SW","N627DL","N627SW","N62889","N62896","N628SW","N629BR","N629SW","N630SK","N630WN","N631SK","N631SW","N632SK","N632SW","N633DL","N633SK","N633SW","N634SW","N635DL","N635SW","N636WN","N637SW","N638SW","N639SW","N640SW","N641CA","N641SW","N642WN","N643SW","N644SW","N645SW","N646SW","N647SW","N648SW","N650SW","N651SW","N652SW","N653SW","N654SW","N655WN","N656SW","N65832","N659BR","N659DL","N667DN","N668DN","N669DN","N6700","N6704Z","N6706Q","N670DN","N6712B","N6714Q","N6715C","N6716C","N672DL","N673DL","N675BR","N676CA","N676DL","N67815","N678CA","N679CA","N679DA","N682DA","N683DA","N68452","N685BR","N689CA","N689DL","N690DL","N691CA","N692CA","N693CA","N693DL","N695CA","N695DL","N696DL","N700GS","N701AA","N701FR","N701GS","N702AA","N702BR","N703SW","N704FR","N704SW","N705FR","N705SW","N706FR","N706SW","N707EV","N707SA","N708EV","N708FR","N708SW","N709EV","N709FR","N709SW","N710EV","N710FR","N710SW","N711FR","N711HK","N712EV","N712SW","N713AA","N713EV","N713FR","N713SW","N714CB","N714FR","N715FR","N715SW","N716EV","N716FR","N716SW","N717EV","N717FR","N717JL","N717SA","N718EV","N718FR","N718SW","N719EV","N719FR","N719SW","N720EV","N720FR","N720WN","N722EV","N723EV","N723SW","N724EV","N724SW","N725AA","N725
SW","N726SW","N727SW","N728SW","N729SW","N730EV","N730SW","N731SA","N73251","N73256","N73259","N73270","N73275","N73276","N73278","N73283","N73291","N732AA","N732SW","N733SA","N73406","N734SA","N735SA","N736SA","N737JW","N738CB","N738EV","N739GB","N740EV","N740SW","N741EV","N741SA","N742AA","N742SW","N743SW","N744AA","N744EV","N744SW","N745AA","N745SW","N746AA","N746SK","N746SW","N747AA","N747SA","N748AA","N748EV","N748SK","N748SW","N749AA","N749SW","N750AA","N750EV","N750SA","N751AA","N751EV","N751SW","N752AA","N752EV","N752SK","N752SW","N753AA","N753EV","N75425","N75426","N75428","N75429","N75435","N75436","N754AA","N754EV","N754SW","N755AA","N755EV","N755SA","N756AA","N756SA","N757AA","N757LV","N758AA","N758EV","N758SW","N759EV","N759GS","N760AA","N760EV","N760SW","N761ND","N761RR","N76288","N762AA","N762SK","N762SW","N763AA","N763SK","N763SW","N764AA","N764SK","N764SW","N76502","N76503","N76504","N76505","N76508","N76514","N76515","N76516","N76517","N76533","N765AA","N765SK","N765SW","N766AA","N766SK","N766SW","N767AA","N767SK","N767SW","N768AA","N768SK","N768SW","N769AA","N769SW","N7701B","N7702A","N7703A","N7704B","N7705A","N7706A","N7707C","N7708E","N7709A","N770AA","N770SA","N770SK","N7710A","N7711N","N7712G","N7713A","N7714B","N7715E","N7716A","N7717D","N7718B","N7719A","N771SA","N771SK","N7720F","N7721E","N7722B","N7723E","N7724A","N7725A","N77261","N7726A","N7727A","N7728D","N7729A","N772SK","N772SW","N7730A","N7731A","N7732A","N7733B","N7734H","N7735A","N7736A","N7737E","N7738A","N7739A","N773SA","N773SK","N7740A","N7741C","N7742B","N77431","N7743B","N7744A","N7745A","N7746C","N7747C","N7748A","N7749B","N774SK","N774SW","N7750A","N77510","N7751A","N7752B","N77530","N77535","N77536","N77538","N775DE","N775SW","N776DE","N776WN","N777QC","N778SW","N779CA","N779SW","N780SW","N7811F","N7812G","N7813P","N7814B","N7815L","N7816B","N7817J","N7818L","N7819A","N781CA","N781WN","N7820L","N7821L","N7822A","N7823A","N7824A","N7825A","N7826B","N7827A","N78285","N7828A","N7829B","N782SA","N7830A","N7831B","N7832A","N7833A","N7834A","N7835A","N7836A","N7837A","N7838A","N7839A","N783CA","N783SW","N7840A","N7841A","N7842A","N7843A","N78448","N7844A","N7845A","N7846A","N7847A","N7848A","N7849A","N784SW","N78501","N78506","N7850B","N7851A","N78524","N7852A","N7853B","N7854B","N7855A","N7856A","N7857B","N7858A","N7859B","N785SW","N7860A","N7861J","N7862A","N7863A","N786SW","N7873A","N7874B","N7875A","N7876A","N7877H","N7878A","N7879A","N787SA","N7880D","N7881A","N7882B","N7883A","N7884G","N7885A","N788SA","N788SK","N789SW","N790SW","N791SW","N79279","N792SW","N793SA","N793SK","N794SW","N795SW","N796SW","N797MX","N798SW","N799SW","N800SK","N801AA","N801DZ","N801UA","N802AA","N802DN","N802SK","N802UA","N803AA","N803DN","N803SK","N803UA","N804AA","N804DN","N804SK","N804UA","N805AA","N805DN","N805SK","N805UA","N806AA","N806DN","N806SK","N806UA","N807AA","N807DN","N807SK","N807UA","N808AA","N808DN","N808UA","N809AA","N809CA","N809DN","N809SK","N809UA","N810AA","N810DN","N810SK","N810UA","N811DZ","N811UA","N812AA","N812DN","N812SK","N812UA","N813AA","N813DN","N813SK","N813UA","N814AA","N814DN","N814SK","N814UA","N815AA","N815DN","N815SK","N815UA","N816AA","N816DN","N816SK","N816UA","N817AA","N817DN","N817SK","N817UA","N818AA","N818DA","N818UA","N819AA","N819DN","N819UA","N820AA","N820DN","N820SK","N820UA","N821AA","N821DN","N821SK","N821UA","N822AA","N822DN","N822SK","N822UA","N823AA","N823DN","N823SK","N823UA","N824AA","N824DN","N824SK","N824UA","N825AA","N825DN","N825SK","N825UA","N826AA","N826DN","N826
UA","N827AA","N827DN","N827UA","N828AA","N828UA","N829AA","N829DN","N829UA","N8301J","N8302F","N8303R","N8305E","N8306H","N8307K","N8308K","N8309C","N830AA","N830DN","N830UA","N8310C","N8311Q","N8312C","N8313F","N8314L","N8315C","N8316H","N8317M","N8318F","N8319F","N831AA","N831DN","N831UA","N8320J","N8321D","N8322X","N8323C","N8324A","N8325D","N8326F","N8327A","N8328A","N8329B","N832AA","N832DN","N832UA","N833AA","N833DN","N833UA","N834AA","N834DN","N834UA","N835AA","N835DN","N835UA","N836AA","N836DN","N836UA","N837AA","N837DN","N837UA","N838AA","N838DN","N838UA","N839AA","N839DN","N839UA","N840AA","N840DN","N840UA","N841DN","N841UA","N842DN","N842UA","N843DN","N843UA","N844DN","N844UA","N845DN","N845UA","N846DN","N846UA","N847DN","N847UA","N848DN","N848UA","N849DN","N849UA","N8501V","N8502Z","N8503A","N8504G","N8507C","N8508W","N8509U","N850DN","N850UA","N8510E","N8511K","N8512U","N8513F","N8514F","N8515X","N8517F","N8518R","N8519R","N851DN","N851UA","N8520Q","N8522P","N8523W","N8524Z","N8525S","N8526W","N8527Q","N8528Q","N8529Z","N852DN","N852UA","N8530W","N8531Q","N8532S","N8533S","N8534Z","N8535S","N8536Z","N8537Z","N8538V","N8539V","N853DN","N853UA","N8540V","N8541W","N8542Z","N8543Z","N8544Z","N8545V","N8546V","N8547V","N8548P","N854DN","N854UA","N8550Q","N8551Q","N8556Z","N855AA","N855DN","N855UA","N856DN","N857DZ","N858DZ","N859DN","N8600F","N8601C","N8602F","N8603F","N8604K","N8605E","N8606C","N8607M","N8608N","N8609A","N860DN","N8610A","N8611F","N8612K","N8613K","N8614M","N8615E","N8616C","N8617E","N8618N","N8619F","N861DN","N8620H","N8621A","N8622A","N8623F","N8624J","N8625A","N8626B","N8627B","N8628A","N8629A","N862DN","N8630B","N8631A","N8632A","N8633A","N8634A","N8635F","N8637A","N8638A","N8639B","N863DN","N8640D","N8641B","N8642E","N8643A","N8644C","N8645A","N8646B","N8647A","N8648A","N8649A","N864DN","N8650F","N8651A","N8652B","N86534","N8653A","N8654B","N8655D","N8656B","N8657B","N8658A","N8659D","N865DN","N8660A","N8661A","N8662F","N8663A","N8664J","N8665D","N8667D","N8668A","N8669B","N866DN","N8670A","N8671D","N8672F","N8673F","N8674B","N8675A","N8676A","N8677A","N8678E","N8679A","N867DN","N8680C","N8681M","N8682B","N8683D","N8684F","N8685B","N8686A","N8687A","N8688J","N8689C","N868AS","N868CA","N868DN","N8690A","N8691A","N8692F","N8693A","N8694E","N8695D","N8696E","N8697C","N8698B","N8699A","N869DN","N8705Q","N8706W","N8707P","N8708Q","N8709Q","N870DN","N8710M","N8711Q","N8712L","N8713M","N8714Q","N8715Q","N8716B","N871DN","N872DN","N873AS","N873DN","N874DN","N87512","N875AS","N875DN","N876DN","N877DN","N878AS","N878DN","N879DN","N880AS","N880DN","N881AS","N881DN","N8828D","N882AS","N882DN","N883DN","N884DN","N885AS","N885DN","N886AS","N886DN","N8884E","N889AS","N889UA","N8903A","N890UA","N891AT","N891UA","N8923A","N892AT","N892UA","N8932C","N8933B","N893UA","N8942A","N894AT","N894UA","N895AT","N895UA","N8965E","N8968E","N896AT","N896SK","N896UA","N897SK","N8982A","N898SK","N899AT","N899SK","N900DE","N900WN","N901DA","N901DE","N901WN","N902DA","N902DE","N902FR","N902WN","N903DA","N903DE","N903WN","N904DA","N904DE","N904DL","N904WN","N905DA","N905DE","N905DL","N905WN","N906AT","N906DA","N906DE","N906DL","N906EV","N906SW","N906WN","N907DA","N907DE","N907DL","N907EV","N907SW","N907WN","N908DA","N908DE","N908DL","N908WN","N909DA","N909DE","N909SW","N909WN","N910AT","N910DE","N910DL","N910DN","N910EV","N910FR","N910WN","N911DA","N911DE","N911DL","N912DE","N912DL","N912DN","N912EV","N912WN","N913DE","N913DL","N913DN","N913EV","N913WN","N914DE","N914DL","N914DN","N914EV","N914
WN","N915DE","N915DL","N915DN","N915EV","N915SW","N915WN","N916DE","N916DL","N916DN","N916EV","N916WN","N917DE","N917DL","N917DN","N917EV","N917WN","N918DE","N918DH","N918DL","N918WN","N919DE","N919DL","N919DN","N919EV","N919FR","N919WN","N920AT","N920DE","N920DL","N920DN","N920EV","N920FR","N920WN","N921AT","N921DL","N921DN","N921EV","N921FR","N921WN","N922AT","N922DL","N922DX","N922EV","N922FR","N922WN","N923AT","N923DL","N923DN","N923EV","N923FR","N923WN","N924DL","N924DN","N924WN","N925AT","N925DL","N925DN","N925SW","N925WN","N926AT","N926DH","N926DL","N926EV","N926FR","N926WN","N927AT","N927DA","N927DN","N927FR","N927WN","N928AT","N928DL","N928DN","N928FR","N928WN","N929AT","N929DL","N929DN","N929SW","N929WN","N930AT","N930DL","N930DN","N930WN","N931DL","N931DN","N931EV","N931WN","N932AT","N932DL","N932DN","N932WN","N933AT","N933DL","N933DN","N933EV","N933WN","N934AT","N934DL","N934DN","N934WN","N935AT","N935DL","N935DN","N935FR","N935SW","N935WN","N936AT","N936DL","N936DN","N936EV","N936SW","N936WN","N937AT","N937DL","N937DN","N937WN","N938AT","N938DL","N938DN","N938FR","N938WN","N939AT","N939DL","N939DN","N939FR","N939WN","N940AT","N940DL","N940DN","N940WN","N941DL","N941DN","N941FR","N941WN","N942AT","N942DL","N942DN","N942WN","N943AT","N943DL","N943DN","N943FR","N943WN","N944AA","N944AT","N944DL","N944DN","N944SW","N944WN","N945AT","N945DL","N945DN","N945WN","N946AA","N946AT","N946DL","N946DN","N946SW","N946WN","N947AA","N947DL","N947DN","N947FR","N947WN","N948AT","N948DL","N948DN","N948FR","N948WN","N949AA","N949AT","N949DL","N949DN","N949FR","N949WN","N950AA","N950AT","N950DL","N950DN","N950WN","N951AA","N951AT","N951DL","N951DN","N951FR","N951WN","N952AA","N952AT","N952DL","N952DN","N952FR","N952WN","N953AA","N953AT","N953DL","N953DN","N953FR","N953SW","N953WN","N954AA","N954AT","N954DL","N954DN","N954FR","N954SW","N954WN","N955AA","N955AT","N955DL","N955DN","N955WN","N956AA","N956AT","N956DL","N956DN","N956WN","N957AA","N957AT","N957DL","N957DN","N957SW","N957WN","N958AA","N958AT","N958DL","N958DN","N958WN","N959AA","N959AT","N959DL","N959DN","N959WN","N960AT","N960DL","N960DN","N960WN","N961AA","N961AT","N961DL","N961DN","N961SW","N961WN","N962DL","N962DN","N962SW","N962WN","N963AA","N963AT","N963DL","N963DN","N963WN","N964AT","N964DL","N964DN","N964SW","N964WN","N965AT","N965DL","N965DN","N965WN","N966AT","N966DL","N966WN","N967AA","N967AT","N967DL","N967WN","N968AT","N968DL","N968WN","N969AT","N969DL","N969WN","N970AT","N970DL","N971AT","N971DL","N972AT","N972DL","N973DL","N974AA","N974AT","N974DL","N975AT","N975DL","N976DL","N977AT","N977DL","N978AT","N978DL","N978SW","N979AT","N979DL","N980DL","N981DL","N982AT","N982DL","N983AT","N983DL","N983SW","N984DL","N985AT","N985DL","N986DL","N986SW","N987AT","N987DL","N987DN","N988AT","N988DL","N988DN","N989AT","N989DL","N989DN","N990AT","N990DL","N991AT","N991DL","N992AT","N992DL","N993AT","N993DL","N994AT","N994DL","N995AT","N995DL","N996AT","N996DL","N997DL","N998AT","N998DL","N999DN",null],[6,2,8,2,2,2,4,2,4,2,6,2,4,20,40,8,20,16,6,16,12,30,14,20,31,2,22,18,20,2,14,18,26,36,22,30,27,29,27,50,18,10,18,14,8,28,6,14,24,12,10,20,28,23,14,14,2,2,14,12,26,49,23,45,46,25,12,2,16,16,32,20,8,4,12,32,9,6,22,10,8,20,4,18,18,18,10,31,34,36,10,10,8,14,2,6,4,6,12,12,20,39,26,20,14,30,36,24,19,44,6,4,24,16,2,30,17,14,14,26,22,22,13,16,20,10,2,6,8,4,8,4,6,8,12,12,18,20,31,38,39,32,30,20,16,4,14,4,6,6,18,32,46,44,43,30,6,8,36,32,50,6,18,14,12,20,6,8,30,27,26,34,26,6,18,16,33,28,6,22,12,24,14,22,4,36,8,8,2,8,24,12,14,34,32,26,32,36,22,6,4,8,4,4,
2,6,22,20,13,18,25,36,14,4,18,8,20,42,24,22,13,4,4,6,12,2,4,4,4,4,32,12,8,6,45,33,22,34,37,14,10,8,20,40,52,32,56,18,33,26,16,38,18,36,38,18,42,44,10,32,28,2,36,8,44,4,22,34,52,2,30,40,18,40,30,46,8,22,10,56,29,48,42,26,51,50,27,68,40,42,44,62,36,24,40,40,40,24,48,27,31,44,6,44,29,38,57,32,34,12,44,44,8,24,36,43,36,34,34,26,38,2,2,47,28,44,30,37,32,33,6,4,6,32,43,22,36,38,2,8,34,39,40,44,16,52,2,44,38,43,44,33,2,58,36,18,26,32,4,2,4,2,40,29,38,26,30,38,34,48,37,26,34,16,22,40,31,2,6,2,36,37,2,48,35,40,4,2,30,52,42,62,24,37,32,2,2,48,40,52,26,36,28,29,31,48,34,28,46,44,36,32,24,26,42,2,14,6,2,2,8,12,12,2,16,14,10,2,2,22,2,14,2,4,8,2,8,2,2,4,2,6,6,12,9,2,3,4,3,2,6,8,6,2,2,7,12,8,6,7,2,2,4,8,7,2,7,4,12,2,8,8,4,6,6,8,4,6,2,2,6,8,8,20,2,14,8,2,2,4,2,2,5,4,4,4,10,15,6,2,2,4,4,2,2,2,2,4,14,2,6,7,8,8,6,4,7,2,4,6,2,12,6,8,2,2,4,2,4,6,8,20,12,10,9,2,4,8,10,6,6,2,2,11,6,14,12,2,4,2,2,2,10,8,3,2,2,8,2,8,30,6,6,6,4,6,14,4,12,4,4,2,6,6,8,2,2,30,8,18,18,18,16,24,14,22,4,18,16,22,8,20,4,24,6,12,4,4,11,4,10,2,2,2,16,2,2,8,20,18,15,6,12,2,4,19,18,22,14,24,18,18,12,18,16,10,9,12,8,20,14,6,20,10,18,18,10,4,46,28,24,14,26,18,8,14,6,40,22,26,16,44,9,8,4,12,56,20,44,2,8,4,4,10,48,16,51,10,37,12,36,12,48,24,43,4,40,12,22,61,12,54,8,2,2,2,6,2,12,36,10,40,18,74,4,10,58,12,17,40,12,36,2,2,2,2,4,2,4,4,2,4,6,2,2,2,4,2,4,2,4,2,2,2,2,4,2,4,4,2,2,2,2,2,2,2,8,6,2,4,2,2,2,2,4,2,2,2,4,2,1,6,2,6,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,4,2,2,2,2,2,2,2,2,4,2,2,4,2,2,4,4,2,2,2,4,2,2,6,6,6,2,2,4,2,2,6,2,4,2,4,2,6,2,2,6,2,4,2,2,2,2,2,2,2,4,2,4,2,2,2,2,2,4,2,2,2,2,2,4,2,2,2,2,2,40,10,28,2,42,6,29,2,47,3,37,8,45,4,30,2,40,2,14,6,36,10,32,16,26,6,44,4,40,4,43,6,34,2,43,24,6,44,5,32,4,26,32,4,2,46,6,27,56,4,32,56,44,2,58,10,14,35,18,8,47,34,6,4,30,8,8,52,14,4,25,20,8,48,40,12,4,22,2,33,10,34,4,39,24,4,6,46,26,10,10,38,42,10,2,36,40,12,6,16,4,46,24,6,35,10,6,20,2,27,14,6,23,20,6,46,14,4,48,6,14,30,12,2,36,52,4,34,10,6,36,6,2,44,2,16,6,34,26,6,2,23,8,32,10,8,30,6,32,12,6,50,12,2,42,4,4,38,8,40,12,8,36,6,42,8,45,2,2,51,6,41,14,6,26,2,30,6,41,6,2,38,2,47,6,10,44,6,51,4,46,4,28,34,6,30,10,32,4,20,28,16,2,44,36,10,50,2,41,18,6,46,6,47,33,2,6,18,6,33,6,22,8,26,14,40,8,10,4,37,6,38,14,6,50,8,20,8,32,33,34,32,14,39,54,38,52,40,32,58,32,36,50,42,38,31,38,33,38,45,29,34,48,60,58,47,46,46,46,36,46,37,51,41,66,18,26,22,8,30,47,37,2,2,4,8,4,2,34,4,2,6,27,4,50,20,2,22,4,37,48,22,4,44,37,22,2,16,36,36,34,25,34,30,36,30,31,26,34,50,28,28,32,32,36,30,12,46,16,20,66,10,50,13,4,6,52,14,44,6,50,8,4,44,4,10,8,38,4,12,6,8,4,26,12,4,12,6,4,24,22,22,10,2,14,43,53,28,44,42,30,2,28,40,2,37,2,2,56,14,48,12,35,2,38,14,39,6,16,44,55,2,62,46,60,52,58,50,4,58,33,55,55,44,64,56,21,18,28,20,14,18,36,40,2,10,4,2,4,2,2,2,2,4,2,2,4,2,2,2,14,6,6,2,12,16,2,4,4,2,21,12,4,2,16,31,20,2,16,4,2,32,2,2,31,2,8,38,4,24,4,41,2,51,8,36,8,2,18,2,4,34,20,2,52,6,36,4,44,1,12,2,24,40,4,4,41,10,2,32,6,2,2,48,14,6,32,12,4,42,17,6,44,8,14,48,13,36,2,25,40,28,36,22,16,42,33,4,2,6,4,4,2,8,4,2,2,45,51,2,56,40,28,42,28,6,32,8,42,8,30,22,24,40,54,10,40,22,40,40,2,37,32,26,44,6,2,40,24,38,36,24,34,16,8,39,26,8,2,42,50,10,4,2,4,2,4,2,36,8,34,37,6,26,40,30,18,24,32,8,36,10,34,58,6,28,6,37,6,30,24,30,38,28,36,46,12,39,10,2,6,6,2,2,1,2,2,6,56,4,28,34,10,28,29,10,52,44,10,38,22,20,28,42,18,36,40,38,26,37,28,31,32,18,36,38,28,49,36,34,28,16,32,30,46,14,48,40,30,22,38,14,2,24,32,34,34,6,41,36,34,32,36,42,38,32,42,32,30,30,10,37,36,42,4,30,38,29,22,32,42,31,8,44,32,2,50,30,6,6,4,12,2,66,2,44,36,60,14,30,58,31,26,31,28,28,20,36,24,31,16,30,30,34,38,40,30,32,45,46,10,34,48,25,28,46,41,20,40,27,32,38,22,32,14,50,40,
23,30,42,2,37,20,37,32,48,42,50,8,8,51,42,2,52,24,38,20,14,28,14,26,35,12,16,2,6,26,32,41,28,35,50,47,28,38,31,36,67,26,31,34,32,1,28,26,52,8,44,56,2,34,28,32,24,10,46,10,38,10,14,38,4,20,10,26,10,26,10,36,4,24,14,15,6,14,18,31,4,14,12,24,8,20,16,39,2,6,26,4,6,14,12,31,12,24,16,12,6,44,4,12,21,44,4,18,16,41,4,15,12,64,8,26,18,40,4,10,16,42,8,24,18,46,14,10,24,4,4,31,4,22,8,28,6,36,6,36,8,6,14,46,6,28,12,20,8,12,8,44,6,28,12,48,6,12,42,10,12,38,8,40,10,18,22,12,10,18,16,20,17,10,34,6,2,20,14,12,18,16,18,14,16,12,20,30,4,18,26,18,22,25,30,26,20,31,28,24,44,12,12,52,12,14,32,4,14,44,4,12,34,8,16,28,6,14,33,12,8,44,6,8,48,12,12,8,14,8,8,4,12,4,10,6,12,12,10,10,12,10,14,10,14,14,14,20,22,16,18,17,2,24,16,19,18,10,28,18,18,20,18,14,18,16,18,14,14,24,6,20,19,12,6,6,6,10,4,9,18,2,4,12,12,12,4,20,14,12,6,12,8,10,6,4,4,12,12,2,4,2,2,10,18,5,10,4,4,26,21,10,18,12,17,12,23,22,20,6,10,24,14,12,24,26,20,22,13,26,4,22,32,28,20,30,24,20,20,14,14,8,16,20,26,14,22,20,24,18,16,2,20,36,20,18,26,18,12,12,16,22,6,18,18,20,2,22,20,24,26,20,24,24,8,26,12,20,20,24,12,18,26,18,4,7,18,26,20,16,28,16,27,30,15,6,14,14,26,26,22,24,16,26,10,20,12,26,4,19,30,22,24,29,14,16,34,26,20,6,8,4,4,6,10,14,6,14,8,16,6,4,2,8,4,4,2,14,2,22,6,12,8,16,7,10,2,6,10,4,28,20,4,6,8,17,10,24,4,12,12,11,12,17,2,6,18,14,6,16,24,3,4,6,8,12,2,20,8,6,24,2,22,28,22,4,18,28,42,16,45,14,12,30,32,38,7,28,32,8,20,21,42,13,27,40,42,4,14,14,30,2,14,34,8,22,34,2,12,24,14,34,30,34,18,26,2,42,6,36,14,28,16,58,41,32,30,20,16,24,16,10,24,18,32,22,20,22,28,22,20,12,32,22,20,18,18,2,42,32,34,16,10,34,20,46,16,4,20,20,16,30,32,30,26,14,10,52,20,10,36,44,24,2,54,51,4,32,14,2,4,37,8,22,17,10,38,23,8,36,10,12,60,52,22,16,40,6,16,6,2,28,8,12,32,4,32,43,6,30,22,36,52,12,26,12,15,61,14,26,6,2,58,4,26,12,21,24,12,2,55,4,18,10,32,6,36,14,2,45,4,26,18,42,2,18,19,2,22,38,4,32,16,4,6,35,12,34,12,36,8,24,23,18,34,4,18,14,14,56,6,21,14,24,26,10,68,38,8,31,22,36,10,39,18,58,48,10,4,14,12,16,28,8,20,22,34,16,4,26,10,4,25,4,21,12,42,54,6,30,10,42,18,4,6,26,30,22,58,8,10,34,18,40,4,6,26,22,62,44,8,8,32,18,46,30,4,2,28,18,54,10,36,6,4,34,16,4,2,28,12,8,32,14,43,8,10,34,22,53,2,6,30,20,4,39,2,4,32,10,62,6,5,20,11,28,8,26,4,36,10,8,24,16,2,46,27,18,2,45,8,4,26,10,38,4,30,20,6,19,6,20,18,36,2,28,48,2,10,28,40,6,28,38,8,22,36,4,22,8,28,2,16,18,2,6,22,4,32,32,10,32,6,42,4,10,24,28,16,5,32,8,47,2,20,2,29,33,4,4,37,9,6,27,4,6,29,6,4,19,6,42,6,30,4,24,8,25,6,22,2,24,27,6,39,37,75]],"container":"<table class=\"compact\">\n <thead>\n <tr>\n <th> <\/th>\n <th>Tail_Number<\/th>\n <th>Number<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"pageLength":5,"columnDefs":[{"className":"dt-right","targets":2},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[5,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>
29 / 139

If you wanted a choice of carriers when flying out of Columbus, which destination would give you the most options?

cmhflights2017 %>%
filter(Origin == "CMH") %>%
group_by(Dest) %>%
summarise(unique.carriers = n_distinct(Reporting_Airline)) %>%
arrange(-unique.carriers) -> tab.04
<div id="htmlwidget-755eada48da4e8bb620f" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-755eada48da4e8bb620f">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28"],["DEN","DTW","IAH","MCO","MSP","ORD","RSW","ATL","LAS","LAX","LGA","PHX","TPA","BNA","BOS","BWI","CLT","DAL","DCA","DFW","EWR","FLL","HOU","MDW","MSY","OAK","PHL","STL"],[4,3,3,3,3,3,3,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]],"container":"<table class=\"compact\">\n <thead>\n <tr>\n <th> <\/th>\n <th>Dest<\/th>\n <th>unique.carriers<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"pageLength":5,"columnDefs":[{"className":"dt-right","targets":2},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[5,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>
30 / 139

How many flights to each destination from Columbus on January 1?

cmhflights2017 %>%
filter(Month == 1 & DayofMonth == 1 & Origin == "CMH") %>%
group_by(Dest) %>%
summarise(n.flts = n()) %>%
arrange(-n.flts) -> tab.05
<div id="htmlwidget-cb9ae2d1e80e3d15b7d6" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-cb9ae2d1e80e3d15b7d6">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21"],["ATL","BWI","DFW","IAH","MCO","MDW","LAS","PHX","BNA","DCA","DEN","EWR","LAX","RSW","TPA","BOS","DAL","DTW","FLL","OAK","STL"],[7,4,4,4,4,4,3,3,2,2,2,2,2,2,2,1,1,1,1,1,1]],"container":"<table class=\"compact\">\n <thead>\n <tr>\n <th> <\/th>\n <th>Dest<\/th>\n <th>n.flts<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"pageLength":5,"columnDefs":[{"className":"dt-right","targets":2},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[5,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>
31 / 139

What if we want the preceding by airline?

cmhflights2017 %>%
filter(Month == 1 & DayofMonth == 1 & Origin == "CMH") %>%
group_by(Dest, Reporting_Airline) %>%
summarise(n.flts = n()) %>%
arrange(-n.flts) -> tab.06
DT::datatable(tab.06, class = "compact")
<div id="htmlwidget-403ee707181ed83c47b8" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-403ee707181ed83c47b8">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28"],["ATL","BWI","DFW","MDW","IAH","MCO","ATL","BNA","DCA","EWR","LAS","PHX","RSW","TPA","BOS","DAL","DEN","DEN","DTW","FLL","IAH","LAS","LAX","LAX","MCO","OAK","PHX","STL"],["DL","WN","AA","WN","EV","WN","WN","WN","WN","EV","WN","AA","WN","WN","WN","WN","F9","WN","OO","WN","OO","F9","AA","DL","F9","WN","WN","WN"],[5,4,4,4,3,3,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1]],"container":"<table class=\"compact\">\n <thead>\n <tr>\n <th> <\/th>\n <th>Dest<\/th>\n <th>Reporting_Airline<\/th>\n <th>n.flts<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"pageLength":5,"columnDefs":[{"className":"dt-right","targets":3},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[5,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>
32 / 139

unnest (aka anti-nesting)

33 / 139

Every now and then the data you have to work with include list-columns -- columns whose cells are themselves lists, like mini data frames -- and R functions that would otherwise work will not execute as desired. The unnest function from the tidyverse unpacks these list-columns into a regular tibble. First, a small data frame where column y packs several comma-separated values into single cells; splitting y will give us a list-column to unnest.

library(tidyverse)
data_frame(
x = 1:3,
y = c("a", "d,e,f", "g,h")
) -> bad.df
bad.df
## # A tibble: 3 x 2
## x y
## <int> <chr>
## 1 1 a
## 2 2 d,e,f
## 3 3 g,h
34 / 139

If I want to clean up column y, I can split it on the commas (which creates a list-column) and then unnest it, as shown below.

bad.df %>%
transform(y = strsplit(y, ",")) %>%
unnest(y) -> good.df
good.df
## # A tibble: 6 x 2
## x y
## <int> <chr>
## 1 1 a
## 2 2 d
## 3 2 e
## 4 2 f
## 5 3 g
## 6 3 h
35 / 139

Now see a genuinely more complicated list-column: y is a named list, with element a holding one value and element b holding two.

data_frame(
x = 1:2,
y = list(a = 1, b = 3:4)
) -> badder.df
badder.df
## # A tibble: 2 x 2
## x y
## <int> <named list>
## 1 1 <dbl [1]>
## 2 2 <int [2]>

To clean this up we could do:

badder.df %>%
unnest(., .id = "name") -> gooder.df
gooder.df
## # A tibble: 3 x 3
## x y name
## <int> <dbl> <chr>
## 1 1 1 a
## 2 2 3 b
## 3 2 4 b
36 / 139
37 / 139

Most packages you use in R call for data to be tidy. What are tidy data?

  • Each variable forms a column.
  • Each observation forms a row.
  • Each type of observational unit forms a table.

Here are some untidy data

38 / 139

(a) The Pew survey data

pew <- read.delim(
file = "http://stat405.had.co.nz/data/pew.txt",
header = TRUE,
stringsAsFactors = FALSE,
check.names = F
)
head(pew)
## religion <$10k $10-20k $20-30k $30-40k $40-50k $50-75k $75-100k $100-150k
## 1 Agnostic 27 34 60 81 76 137 122 109
## 2 Atheist 12 27 37 52 35 70 73 59
## 3 Buddhist 27 21 30 34 33 58 62 39
## 4 Catholic 418 617 732 670 638 1116 949 792
## 5 Don’t know/refused 15 14 15 11 10 35 21 17
## 6 Evangelical Prot 575 869 1064 982 881 1486 949 723
## >150k Don't know/refused
## 1 84 96
## 2 74 76
## 3 53 54
## 4 633 1489
## 5 18 116
## 6 414 1529
39 / 139

(b) Here are some more examples

library(DSR)
table2
## # A tibble: 12 x 4
## country year key value
## <fct> <int> <fct> <int>
## 1 Afghanistan 1999 cases 745
## 2 Afghanistan 1999 population 19987071
## 3 Afghanistan 2000 cases 2666
## 4 Afghanistan 2000 population 20595360
## 5 Brazil 1999 cases 37737
## 6 Brazil 1999 population 172006362
## 7 Brazil 2000 cases 80488
## 8 Brazil 2000 population 174504898
## 9 China 1999 cases 212258
## 10 China 1999 population 1272915272
## 11 China 2000 cases 213766
## 12 China 2000 population 1280428583
40 / 139
table3
## # A tibble: 6 x 3
## country year rate
## <fct> <int> <chr>
## 1 Afghanistan 1999 745/19987071
## 2 Afghanistan 2000 2666/20595360
## 3 Brazil 1999 37737/172006362
## 4 Brazil 2000 80488/174504898
## 5 China 1999 212258/1272915272
## 6 China 2000 213766/1280428583
table4
## # A tibble: 3 x 3
## country `1999` `2000`
## <fct> <int> <int>
## 1 Afghanistan 745 2666
## 2 Brazil 37737 80488
## 3 China 212258 213766
table5
## # A tibble: 3 x 3
## country `1999` `2000`
## <fct> <int> <int>
## 1 Afghanistan 19987071 20595360
## 2 Brazil 172006362 174504898
## 3 China 1272915272 1280428583
41 / 139

What would a tidy version of the examples in (b) look like?

table1
## # A tibble: 6 x 4
## country year cases population
## <fct> <int> <int> <int>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583

Here, we have

  • each row is one country in one year
  • each column holds a single variable
  • each cell holds a single value

The tidyr package is designed to make messy data tidy in order to proceed with analysis. It has four primary functions that we will see in turn.

42 / 139

separate

43 / 139

You will run into variables that you want to break into smaller pieces. For instance, take table3, where rate is a ratio of two variables, cases and population, each of which should be in its own column. How can we achieve that?

library(tidyr)
table3 %>%
separate(col = rate, into = c("cases", "population"), sep = "/",
remove = TRUE, convert = TRUE) -> s.table3
head(s.table3)
## # A tibble: 6 x 4
## country year cases population
## <fct> <int> <int> <int>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
44 / 139

unite

45 / 139

The unite command does the exact opposite of separate, combining columns

s.table3 %>%
unite("rate", cases, population, sep = ":", remove = FALSE) -> u.table3
head(u.table3)
## # A tibble: 6 x 5
## country year rate cases population
## <fct> <int> <chr> <int> <int>
## 1 Afghanistan 1999 745:19987071 745 19987071
## 2 Afghanistan 2000 2666:20595360 2666 20595360
## 3 Brazil 1999 37737:172006362 37737 172006362
## 4 Brazil 2000 80488:174504898 80488 174504898
## 5 China 1999 212258:1272915272 212258 1272915272
## 6 China 2000 213766:1280428583 213766 1280428583

Here you specify the name of the new column first, and then list the columns to be combined, specify the separator, and whether the original columns should be removed or not.

46 / 139

spread

47 / 139

Revisit table2

table2
## # A tibble: 12 x 4
## country year key value
## <fct> <int> <fct> <int>
## 1 Afghanistan 1999 cases 745
## 2 Afghanistan 1999 population 19987071
## 3 Afghanistan 2000 cases 2666
## 4 Afghanistan 2000 population 20595360
## 5 Brazil 1999 cases 37737
## 6 Brazil 1999 population 172006362
## 7 Brazil 2000 cases 80488
## 8 Brazil 2000 population 174504898
## 9 China 1999 cases 212258
## 10 China 1999 population 1272915272
## 11 China 2000 cases 213766
## 12 China 2000 population 1280428583
48 / 139

I want to flip this into a format where the data frame has two columns per country per year -- cases and population. How can I do this?

table2 %>%
spread(key, value) -> sp.table2
sp.table2
## # A tibble: 6 x 4
## country year cases population
## <fct> <int> <int> <int>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583

Notice that the key column had two unique values -- cases and population -- so two new columns were created. Notice also that tidyr knew to populate the value of each of the new cells with what it finds in the value column. The result is what we call data in a wide format.
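
The same reshape can be written with pivot_wider(), the newer replacement for spread() that is covered later in these notes. A minimal sketch, assuming tidyr 1.0 or newer (sp.table2b is just an illustrative name):

table2 %>%
pivot_wider(names_from = key, values_from = value) -> sp.table2b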

49 / 139

gather

50 / 139

gather does the opposite of spread, converting wide data into what we call the long form. Revisit table4

table4
## # A tibble: 3 x 3
## country `1999` `2000`
## <fct> <int> <int>
## 1 Afghanistan 745 2666
## 2 Brazil 37737 80488
## 3 China 212258 213766

Note that each country has two columns, one per year. ... This is untidy ...

51 / 139

... So we'll flip this tibble as follows:

table4 %>%
gather("year", "cases", 2:3) -> g.table4
head(g.table4)
## # A tibble: 6 x 3
## country year cases
## <fct> <chr> <int>
## 1 Afghanistan 1999 745
## 2 Brazil 1999 37737
## 3 China 1999 212258
## 4 Afghanistan 2000 2666
## 5 Brazil 2000 80488
## 6 China 2000 213766

Notice that you had to specify the columns you wanted to gather, which we did via 2:3. If you forget to specify the columns, gather() will stack every column -- country included -- into key-value pairs, which is rarely what you want, so be careful.
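
For comparison, here is a sketch of the same reshape with pivot_longer(), covered later in these notes; naming the columns explicitly avoids the trap just described (g.table4b is just an illustrative name):

table4 %>%
pivot_longer(cols = c(`1999`, `2000`), names_to = "year", values_to = "cases") -> g.table4b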

52 / 139

alternatives to spread/gather: pivot_wider & pivot_longer

53 / 139

pivot_wider

pivot_wider replaces spread and in doing so adds some additional features. Let us see it in action with the fish_encounters data.

data("fish_encounters")
fish_encounters
## # A tibble: 114 x 3
## fish station seen
## <fct> <fct> <int>
## 1 4842 Release 1
## 2 4842 I80_1 1
## 3 4842 Lisbon 1
## 4 4842 Rstr 1
## 5 4842 Base_TD 1
## 6 4842 BCE 1
## 7 4842 BCW 1
## 8 4842 BCE2 1
## 9 4842 BCW2 1
## 10 4842 MAE 1
## # … with 104 more rows
54 / 139

I'd like each station to be a column

fish_encounters %>%
pivot_wider(
names_from = station,
values_from = seen
)
## # A tibble: 19 x 12
## fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
## <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 4842 1 1 1 1 1 1 1 1 1 1 1
## 2 4843 1 1 1 1 1 1 1 1 1 1 1
## 3 4844 1 1 1 1 1 1 1 1 1 1 1
## 4 4845 1 1 1 1 1 NA NA NA NA NA NA
## 5 4847 1 1 1 NA NA NA NA NA NA NA NA
## 6 4848 1 1 1 1 NA NA NA NA NA NA NA
## 7 4849 1 1 NA NA NA NA NA NA NA NA NA
## 8 4850 1 1 NA 1 1 1 1 NA NA NA NA
## 9 4851 1 1 NA NA NA NA NA NA NA NA NA
## 10 4854 1 1 NA NA NA NA NA NA NA NA NA
## 11 4855 1 1 1 1 1 NA NA NA NA NA NA
## 12 4857 1 1 1 1 1 1 1 1 1 NA NA
## 13 4858 1 1 1 1 1 1 1 1 1 1 1
## 14 4859 1 1 1 1 1 NA NA NA NA NA NA
## 15 4861 1 1 1 1 1 1 1 1 1 1 1
## 16 4862 1 1 1 1 1 1 1 1 1 NA NA
## 17 4863 1 1 NA NA NA NA NA NA NA NA NA
## 18 4864 1 1 NA NA NA NA NA NA NA NA NA
## 19 4865 1 1 1 NA NA NA NA NA NA NA NA
55 / 139

If a fish is not seen at a station, we see an entry of NA ... let us replace these with 0

fish_encounters %>% pivot_wider(
names_from = station,
values_from = seen,
values_fill = list(seen = 0)
)
## # A tibble: 19 x 12
## fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
## <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 4842 1 1 1 1 1 1 1 1 1 1 1
## 2 4843 1 1 1 1 1 1 1 1 1 1 1
## 3 4844 1 1 1 1 1 1 1 1 1 1 1
## 4 4845 1 1 1 1 1 0 0 0 0 0 0
## 5 4847 1 1 1 0 0 0 0 0 0 0 0
## 6 4848 1 1 1 1 0 0 0 0 0 0 0
## 7 4849 1 1 0 0 0 0 0 0 0 0 0
## 8 4850 1 1 0 1 1 1 1 0 0 0 0
## 9 4851 1 1 0 0 0 0 0 0 0 0 0
## 10 4854 1 1 0 0 0 0 0 0 0 0 0
## 11 4855 1 1 1 1 1 0 0 0 0 0 0
## 12 4857 1 1 1 1 1 1 1 1 1 0 0
## 13 4858 1 1 1 1 1 1 1 1 1 1 1
## 14 4859 1 1 1 1 1 0 0 0 0 0 0
## 15 4861 1 1 1 1 1 1 1 1 1 1 1
## 16 4862 1 1 1 1 1 1 1 1 1 0 0
## 17 4863 1 1 0 0 0 0 0 0 0 0 0
## 18 4864 1 1 0 0 0 0 0 0 0 0 0
## 19 4865 1 1 1 0 0 0 0 0 0 0 0
56 / 139

Here is a more interesting example. Look at the following data-set:

us_rent_income
## # A tibble: 104 x 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 01 Alabama income 24476 136
## 2 01 Alabama rent 747 3
## 3 02 Alaska income 32940 508
## 4 02 Alaska rent 1200 13
## 5 04 Arizona income 27517 148
## 6 04 Arizona rent 972 4
## 7 05 Arkansas income 23789 165
## 8 05 Arkansas rent 709 5
## 9 06 California income 29454 109
## 10 06 California rent 1358 3
## # … with 94 more rows

Each state has two rows -- one for income and one for rent -- and each row carries an estimate and a margin of error (moe).

57 / 139

I would like the estimate and margin of error (moe) for income and rent to be individual columns, for each state.

us_rent_income %>%
pivot_wider(
names_from = variable,
values_from = c(estimate, moe)
)
## # A tibble: 52 x 6
## GEOID NAME estimate_income estimate_rent moe_income moe_rent
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 01 Alabama 24476 747 136 3
## 2 02 Alaska 32940 1200 508 13
## 3 04 Arizona 27517 972 148 4
## 4 05 Arkansas 23789 709 165 5
## 5 06 California 29454 1358 109 3
## 6 08 Colorado 32401 1125 109 5
## 7 09 Connecticut 35326 1123 195 5
## 8 10 Delaware 31560 1076 247 10
## 9 11 District of Columbia 43198 1424 681 17
## 10 12 Florida 25952 1077 70 3
## # … with 42 more rows
58 / 139

pivot_longer

pivot_longer is the replacement for gather

See relig_income

relig_income %>%
head()
## # A tibble: 6 x 11
## religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Agnostic 27 34 60 81 76 137 122
## 2 Atheist 12 27 37 52 35 70 73
## 3 Buddhist 27 21 30 34 33 58 62
## 4 Catholic 418 617 732 670 638 1116 949
## 5 Don’t know/refused 15 14 15 11 10 35 21
## 6 Evangelical Prot 575 869 1064 982 881 1486 949
## # … with 3 more variables: $100-150k <dbl>, >150k <dbl>, Don't know/refused <dbl>
59 / 139
relig_income %>%
group_by(religion) %>%
pivot_longer(
names_to = "income",
values_to = "count",
cols = 2:11
)
## # A tibble: 180 x 3
## # Groups: religion [18]
## religion income count
## <chr> <chr> <dbl>
## 1 Agnostic <$10k 27
## 2 Agnostic $10-20k 34
## 3 Agnostic $20-30k 60
## 4 Agnostic $30-40k 81
## 5 Agnostic $40-50k 76
## 6 Agnostic $50-75k 137
## 7 Agnostic $75-100k 122
## 8 Agnostic $100-150k 109
## 9 Agnostic >150k 84
## 10 Agnostic Don't know/refused 96
## # … with 170 more rows
60 / 139

Another example

The billboard dataset records the billboard rank of songs in the year 2000. It has a form similar to the relig_income data, but the data encoded in the column names is really a number, not a string.

billboard %>%
head(4)
## # A tibble: 4 x 79
## artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8 wk9 wk10
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 Pac Baby … 2000-02-26 87 82 72 77 87 94 99 NA NA NA
## 2 2Ge+her The H… 2000-09-02 91 87 92 NA NA NA NA NA NA NA
## 3 3 Door… Krypt… 2000-04-08 81 70 68 67 66 57 54 53 51 51
## 4 3 Door… Loser 2000-10-21 76 76 72 69 67 65 55 59 62 61
## # … with 66 more variables: wk11 <dbl>, wk12 <dbl>, wk13 <dbl>, wk14 <dbl>, wk15 <dbl>,
## # wk16 <dbl>, wk17 <dbl>, wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>,
## # wk23 <dbl>, wk24 <dbl>, wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>, wk29 <dbl>,
## # wk30 <dbl>, wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>, wk35 <dbl>, wk36 <dbl>,
## # wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>, wk41 <dbl>, wk42 <dbl>, wk43 <dbl>,
## # wk44 <dbl>, wk45 <dbl>, wk46 <dbl>, wk47 <dbl>, wk48 <dbl>, wk49 <dbl>, wk50 <dbl>,
## # wk51 <dbl>, wk52 <dbl>, wk53 <dbl>, wk54 <dbl>, wk55 <dbl>, wk56 <dbl>, wk57 <dbl>,
## # wk58 <dbl>, wk59 <dbl>, wk60 <dbl>, wk61 <dbl>, wk62 <dbl>, wk63 <dbl>, wk64 <dbl>,
## # wk65 <dbl>, wk66 <lgl>, wk67 <lgl>, wk68 <lgl>, wk69 <lgl>, wk70 <lgl>, wk71 <lgl>,
## # wk72 <lgl>, wk73 <lgl>, wk74 <lgl>, wk75 <lgl>, wk76 <lgl>
61 / 139

We want the names to become a variable called week, and the values to become a variable called rank. We also use values_drop_na to drop rows that correspond to missing values. Not every song stays in the charts for all 76 weeks, so the structure of the input data forces the creation of unnecessary explicit NAs.

billboard %>%
pivot_longer(
cols = starts_with("wk"),
names_to = "week",
values_to = "rank",
values_drop_na = TRUE
)
## # A tibble: 5,307 x 5
## artist track date.entered week rank
## <chr> <chr> <date> <chr> <dbl>
## 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk1 87
## 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk2 82
## 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk3 72
## 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk4 77
## 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk5 87
## 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk6 94
## 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk7 99
## 8 2Ge+her The Hardest Part Of ... 2000-09-02 wk1 91
## 9 2Ge+her The Hardest Part Of ... 2000-09-02 wk2 87
## 10 2Ge+her The Hardest Part Of ... 2000-09-02 wk3 92
## # … with 5,297 more rows
62 / 139

Say we want to determine how long each song stayed in the charts. To do this we’ll need to convert the week variable to an integer. We can do that by using two additional arguments: names_prefix strips off the wk prefix, and names_transform converts week to an integer:

billboard %>%
pivot_longer(
cols = starts_with("wk"),
names_to = "week",
names_prefix = "wk",
names_transform = list(week = as.integer),
values_to = "rank",
values_drop_na = TRUE
)
## # A tibble: 5,307 x 5
## artist track date.entered week rank
## <chr> <chr> <date> <int> <dbl>
## 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 1 87
## 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 2 82
## 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 3 72
## 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 4 77
## 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 5 87
## 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 6 94
## 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 7 99
## 8 2Ge+her The Hardest Part Of ... 2000-09-02 1 91
## 9 2Ge+her The Hardest Part Of ... 2000-09-02 2 87
## 10 2Ge+her The Hardest Part Of ... 2000-09-02 3 92
## # … with 5,297 more rows
63 / 139

WHO Example

At this website you can find data provided by countries to WHO and estimates of TB burden generated by WHO for the Global Tuberculosis Report. I want you to gather the following data (all are in the csv format):

Read the data dictionary carefully. Then read in the case notifications data.

The database has too many variables (175 or so) and so I'll have you use just a few for this practice task.

library(readr)
read_csv("data/TB_notifications_2018-02-07.csv") -> tb
64 / 139

Let us use dplyr to select and filter the tb database so that it meets the following criteria: (i) keep only observations from 2012 or earlier, and (ii) keep only columns 1 through 6 and 25 (new_sp_m04) through 44 (new_sp_fu).

library(dplyr)
tb %>%
select(c(1:6, 25:44)) %>%
filter(year <= 2012) -> tb2
head(tb2)
## # A tibble: 6 x 26
## country iso2 iso3 iso_numeric g_whoregion year new_sp_m04 new_sp_m514
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Afghanistan AF AFG 004 EMR 1980 NA NA
## 2 Afghanistan AF AFG 004 EMR 1981 NA NA
## 3 Afghanistan AF AFG 004 EMR 1982 NA NA
## 4 Afghanistan AF AFG 004 EMR 1983 NA NA
## 5 Afghanistan AF AFG 004 EMR 1984 NA NA
## 6 Afghanistan AF AFG 004 EMR 1985 NA NA
## # … with 18 more variables: new_sp_m014 <dbl>, new_sp_m1524 <dbl>,
## # new_sp_m2534 <dbl>, new_sp_m3544 <dbl>, new_sp_m4554 <dbl>,
## # new_sp_m5564 <dbl>, new_sp_m65 <dbl>, new_sp_mu <dbl>, new_sp_f04 <dbl>,
## # new_sp_f514 <dbl>, new_sp_f014 <dbl>, new_sp_f1524 <dbl>,
## # new_sp_f2534 <dbl>, new_sp_f3544 <dbl>, new_sp_f4554 <dbl>,
## # new_sp_f5564 <dbl>, new_sp_f65 <dbl>, new_sp_fu <dbl>
65 / 139

Now, new_sp_xx refer to new pulmonary smear-positive cases. The m refers to males and f refers to females. Finally, 04 is the 0-4 age group, 514 is the 5-14 age group, and so on. new_sp_mu is males with an unknown age-group and new_sp_fu is females with an unknown age-group. This is the messy part of the data. Ideally we'd have a column for gender, flagged as Male, Female, Unknown, and a separate column for age-groups flagged 0-4, 5-14, 0-14, and so on. Time to tidy these data.

Since the data are in the wide format we can use gather to flip it into the long format.

tb2 %>%
gather("group", "cases", 7:26) -> tb3
head(tb3)
## # A tibble: 6 x 8
## country iso2 iso3 iso_numeric g_whoregion year group cases
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl>
## 1 Afghanistan AF AFG 004 EMR 1980 new_sp_m04 NA
## 2 Afghanistan AF AFG 004 EMR 1981 new_sp_m04 NA
## 3 Afghanistan AF AFG 004 EMR 1982 new_sp_m04 NA
## 4 Afghanistan AF AFG 004 EMR 1983 new_sp_m04 NA
## 5 Afghanistan AF AFG 004 EMR 1984 new_sp_m04 NA
## 6 Afghanistan AF AFG 004 EMR 1985 new_sp_m04 NA
66 / 139

Now we split the new_sp_xx values into three columns

tb3 %>%
separate(col = group, into = c("new", "sp", "sexage"), sep = "_") -> tb4
head(tb4)
## # A tibble: 6 x 10
## country iso2 iso3 iso_numeric g_whoregion year new sp sexage cases
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 Afghanistan AF AFG 004 EMR 1980 new sp m04 NA
## 2 Afghanistan AF AFG 004 EMR 1981 new sp m04 NA
## 3 Afghanistan AF AFG 004 EMR 1982 new sp m04 NA
## 4 Afghanistan AF AFG 004 EMR 1983 new sp m04 NA
## 5 Afghanistan AF AFG 004 EMR 1984 new sp m04 NA
## 6 Afghanistan AF AFG 004 EMR 1985 new sp m04 NA
67 / 139

and then split sexage into two columns by specifying that the split should occur after the first character (which in our case will be after m or f).

tb4 %>%
separate(col = sexage, into = c("sex", "agegroup"), sep = 1) -> tb5
head(tb5)
## # A tibble: 6 x 11
## country iso2 iso3 iso_numeric g_whoregion year new sp sex agegroup
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
## 1 Afghanis… AF AFG 004 EMR 1980 new sp m 04
## 2 Afghanis… AF AFG 004 EMR 1981 new sp m 04
## 3 Afghanis… AF AFG 004 EMR 1982 new sp m 04
## 4 Afghanis… AF AFG 004 EMR 1983 new sp m 04
## 5 Afghanis… AF AFG 004 EMR 1984 new sp m 04
## 6 Afghanis… AF AFG 004 EMR 1985 new sp m 04
## # … with 1 more variable: cases <dbl>
68 / 139

The beauty of dplyr and tidyr is the ability to combine all these steps into a single piped command!

tb %>%
select(c(1:6, 25:44)) %>%
filter(year <= 2012) %>%
gather("group", "cases", 7:26) %>%
separate(col = group, into = c("new", "sp", "sexage"), sep = "_") %>%
separate(col = sexage, into = c("sex", "agegroup"), sep = 1) %>%
select(c(1:6, 9:11)) -> tb.df
head(tb.df)
## # A tibble: 6 x 9
## country iso2 iso3 iso_numeric g_whoregion year sex agegroup cases
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
## 1 Afghanistan AF AFG 004 EMR 1980 m 04 NA
## 2 Afghanistan AF AFG 004 EMR 1981 m 04 NA
## 3 Afghanistan AF AFG 004 EMR 1982 m 04 NA
## 4 Afghanistan AF AFG 004 EMR 1983 m 04 NA
## 5 Afghanistan AF AFG 004 EMR 1984 m 04 NA
## 6 Afghanistan AF AFG 004 EMR 1985 m 04 NA
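
With tidyr 1.0 or newer, the gather-then-separate steps can also be collapsed into a single pivot_longer() call using names_pattern, much like the WHO example later in these notes. A sketch (tb.alt is just an illustrative name):

tb %>%
select(c(1:6, 25:44)) %>%
filter(year <= 2012) %>%
pivot_longer(cols = 7:26,
names_to = c("sex", "agegroup"),
names_pattern = "new_sp_(.)(.*)",
values_to = "cases") -> tb.alt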
69 / 139

We can clean up the codes for easier interpretation.

ifelse(tb.df$sex == "m", "Males", "Females") -> tb.df$sex
ifelse(tb.df$agegroup == "04", "0-4",
ifelse(tb.df$agegroup == "014", "0-14",
ifelse(tb.df$agegroup == "1524", "15-24",
ifelse(tb.df$agegroup == "2534", "25-34",
ifelse(tb.df$agegroup == "3544", "35-44",
ifelse(tb.df$agegroup == "4554", "45-54",
ifelse(tb.df$agegroup == "514", "5-14",
ifelse(tb.df$agegroup == "5564", "55-64",
ifelse(tb.df$agegroup == "65", "65+", "Uknown"))))))))
) -> tb.df$agegroup
head(tb.df)
## # A tibble: 6 x 9
## country iso2 iso3 iso_numeric g_whoregion year sex agegroup cases
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
## 1 Afghanistan AF AFG 004 EMR 1980 Males 0-4 NA
## 2 Afghanistan AF AFG 004 EMR 1981 Males 0-4 NA
## 3 Afghanistan AF AFG 004 EMR 1982 Males 0-4 NA
## 4 Afghanistan AF AFG 004 EMR 1983 Males 0-4 NA
## 5 Afghanistan AF AFG 004 EMR 1984 Males 0-4 NA
## 6 Afghanistan AF AFG 004 EMR 1985 Males 0-4 NA
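
Had we not already overwritten tb.df in place, the same recode could be written more readably with dplyr::case_when(), which these notes cover later. A sketch, with tb.df2 as an illustrative name:

tb.df %>%
mutate(sex = case_when(sex == "m" ~ "Males",
sex == "f" ~ "Females"),
agegroup = case_when(agegroup == "04" ~ "0-4",
agegroup == "014" ~ "0-14",
agegroup == "1524" ~ "15-24",
agegroup == "2534" ~ "25-34",
agegroup == "3544" ~ "35-44",
agegroup == "4554" ~ "45-54",
agegroup == "514" ~ "5-14",
agegroup == "5564" ~ "55-64",
agegroup == "65" ~ "65+",
TRUE ~ "Unknown")) -> tb.df2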
70 / 139

How many cases per country in the 1980-1990 period?

tb.df %>%
filter(year >= 1980 & year <= 1990) %>%
group_by(country) %>%
summarise(Ncases = sum(cases, na.rm = TRUE)) %>%
arrange(-Ncases) -> tbtab.01
head(tbtab.01)
## # A tibble: 6 x 2
## country Ncases
## <chr> <dbl>
## 1 India 6492850
## 2 China 5643738
## 3 Indonesia 1886657
## 4 South Africa 1435147
## 5 Bangladesh 1072431
## 6 Viet Nam 915525
71 / 139

What about by sex and age-group?

tb.df %>%
filter(year >= 1980 & year <= 1990) %>%
group_by(country, sex, agegroup) %>%
summarise(Ncases = sum(cases, na.rm = TRUE)) %>%
arrange(-Ncases) -> tbtab.02
head(tbtab.02)
## # A tibble: 6 x 4
## # Groups: country, sex [2]
## country sex agegroup Ncases
## <chr> <chr> <chr> <dbl>
## 1 India Males 35-44 963811
## 2 India Males 25-34 913939
## 3 India Males 45-54 835228
## 4 India Males 15-24 791303
## 5 China Males 65+ 748103
## 6 China Males 45-54 668133

This is just one example of how dplyr and tidyr -- two packages from the tidyverse -- can be used to deal with what seems to be quite messy data. The more you use them, the easier it becomes to figure out solutions to even very complicated messy data setups.

72 / 139

A complex problem for pivot_longer & pivot_wider

world_bank_pop contains data from the World Bank about population per country from 2000 to 2017

world_bank_pop %>%
head(4)
## # A tibble: 4 x 20
## country indicator `2000` `2001` `2002` `2003` `2004` `2005` `2006`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ABW SP.URB.TO… 4.24e4 4.30e4 4.37e4 4.42e4 4.47e+4 4.49e+4 4.49e+4
## 2 ABW SP.URB.GR… 1.18e0 1.41e0 1.43e0 1.31e0 9.51e-1 4.91e-1 -1.78e-2
## 3 ABW SP.POP.TO… 9.09e4 9.29e4 9.50e4 9.70e4 9.87e+4 1.00e+5 1.01e+5
## 4 ABW SP.POP.GR… 2.06e0 2.23e0 2.23e0 2.11e0 1.76e+0 1.30e+0 7.98e-1
## # … with 11 more variables: 2007 <dbl>, 2008 <dbl>, 2009 <dbl>, 2010 <dbl>,
## # 2011 <dbl>, 2012 <dbl>, 2013 <dbl>, 2014 <dbl>, 2015 <dbl>, 2016 <dbl>,
## # 2017 <dbl>
73 / 139

It would be better if each indicator were a column and each year were a row. So the first thing we could do is move the years from being columns to being rows ... i.e., pivot_longer

world_bank_pop %>%
pivot_longer(
`2000`:`2017`,
names_to = "year",
values_to = "value"
) -> pop2
pop2
## # A tibble: 19,008 x 4
## country indicator year value
## <chr> <chr> <chr> <dbl>
## 1 ABW SP.URB.TOTL 2000 42444
## 2 ABW SP.URB.TOTL 2001 43048
## 3 ABW SP.URB.TOTL 2002 43670
## 4 ABW SP.URB.TOTL 2003 44246
## 5 ABW SP.URB.TOTL 2004 44669
## 6 ABW SP.URB.TOTL 2005 44889
## 7 ABW SP.URB.TOTL 2006 44881
## 8 ABW SP.URB.TOTL 2007 44686
## 9 ABW SP.URB.TOTL 2008 44375
## 10 ABW SP.URB.TOTL 2009 44052
## # … with 18,998 more rows
74 / 139

Look at the indicator column; it combines two dimensions -- the urban population (URB) versus the total population (POP), and the level (TOTL) versus the growth rate (GROW) -- giving four indicators in all

pop2 %>%
count(indicator)
## # A tibble: 4 x 2
## indicator n
## * <chr> <int>
## 1 SP.POP.GROW 4752
## 2 SP.POP.TOTL 4752
## 3 SP.URB.GROW 4752
## 4 SP.URB.TOTL 4752
75 / 139

Let us use separate to split the indicator into an area part (URB versus POP) and a variable part (TOTL versus GROW)

pop2 %>%
separate(indicator, c(NA, "area", "variable")) -> pop3
pop3
## # A tibble: 19,008 x 5
## country area variable year value
## <chr> <chr> <chr> <chr> <dbl>
## 1 ABW URB TOTL 2000 42444
## 2 ABW URB TOTL 2001 43048
## 3 ABW URB TOTL 2002 43670
## 4 ABW URB TOTL 2003 44246
## 5 ABW URB TOTL 2004 44669
## 6 ABW URB TOTL 2005 44889
## 7 ABW URB TOTL 2006 44881
## 8 ABW URB TOTL 2007 44686
## 9 ABW URB TOTL 2008 44375
## 10 ABW URB TOTL 2009 44052
## # … with 18,998 more rows
76 / 139

Now we use pivot_wider to make TOTL and GROW individual columns

pop3 %>%
pivot_wider(
names_from = variable,
values_from = value
) -> pop4
pop4
## # A tibble: 9,504 x 5
## country area year TOTL GROW
## <chr> <chr> <chr> <dbl> <dbl>
## 1 ABW URB 2000 42444 1.18
## 2 ABW URB 2001 43048 1.41
## 3 ABW URB 2002 43670 1.43
## 4 ABW URB 2003 44246 1.31
## 5 ABW URB 2004 44669 0.951
## 6 ABW URB 2005 44889 0.491
## 7 ABW URB 2006 44881 -0.0178
## 8 ABW URB 2007 44686 -0.435
## 9 ABW URB 2008 44375 -0.698
## 10 ABW URB 2009 44052 -0.731
## # … with 9,494 more rows
77 / 139

Revisiting the WHO data

who %>%
head(4)
## # A tibble: 4 x 60
## country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534 new_sp_m3544
## <chr> <chr> <chr> <int> <int> <int> <int> <int>
## 1 Afghanis… AF AFG 1980 NA NA NA NA
## 2 Afghanis… AF AFG 1981 NA NA NA NA
## 3 Afghanis… AF AFG 1982 NA NA NA NA
## 4 Afghanis… AF AFG 1983 NA NA NA NA
## # … with 52 more variables: new_sp_m4554 <int>, new_sp_m5564 <int>,
## # new_sp_m65 <int>, new_sp_f014 <int>, new_sp_f1524 <int>,
## # new_sp_f2534 <int>, new_sp_f3544 <int>, new_sp_f4554 <int>,
## # new_sp_f5564 <int>, new_sp_f65 <int>, new_sn_m014 <int>,
## # new_sn_m1524 <int>, new_sn_m2534 <int>, new_sn_m3544 <int>,
## # new_sn_m4554 <int>, new_sn_m5564 <int>, new_sn_m65 <int>,
## # new_sn_f014 <int>, new_sn_f1524 <int>, new_sn_f2534 <int>,
## # new_sn_f3544 <int>, new_sn_f4554 <int>, new_sn_f5564 <int>,
## # new_sn_f65 <int>, new_ep_m014 <int>, new_ep_m1524 <int>,
## # new_ep_m2534 <int>, new_ep_m3544 <int>, new_ep_m4554 <int>,
## # new_ep_m5564 <int>, new_ep_m65 <int>, new_ep_f014 <int>,
## # new_ep_f1524 <int>, new_ep_f2534 <int>, new_ep_f3544 <int>,
## # new_ep_f4554 <int>, new_ep_f5564 <int>, new_ep_f65 <int>,
## # new_rel_m014 <int>, new_rel_m1524 <int>, new_rel_m2534 <int>,
## # new_rel_m3544 <int>, new_rel_m4554 <int>, new_rel_m5564 <int>,
## # new_rel_m65 <int>, new_rel_f014 <int>, new_rel_f1524 <int>,
## # new_rel_f2534 <int>, new_rel_f3544 <int>, new_rel_f4554 <int>,
## # new_rel_f5564 <int>, new_rel_f65 <int>
78 / 139
who %>% pivot_longer(
cols = new_sp_m014:new_rel_f65,
names_to = c("diagnosis", "gender", "age"),
names_pattern = "new_?(.*)_(.)(.*)",
values_to = "count"
)
## # A tibble: 405,440 x 8
## country iso2 iso3 year diagnosis gender age count
## <chr> <chr> <chr> <int> <chr> <chr> <chr> <int>
## 1 Afghanistan AF AFG 1980 sp m 014 NA
## 2 Afghanistan AF AFG 1980 sp m 1524 NA
## 3 Afghanistan AF AFG 1980 sp m 2534 NA
## 4 Afghanistan AF AFG 1980 sp m 3544 NA
## 5 Afghanistan AF AFG 1980 sp m 4554 NA
## 6 Afghanistan AF AFG 1980 sp m 5564 NA
## 7 Afghanistan AF AFG 1980 sp m 65 NA
## 8 Afghanistan AF AFG 1980 sp f 014 NA
## 9 Afghanistan AF AFG 1980 sp f 1524 NA
## 10 Afghanistan AF AFG 1980 sp f 2534 NA
## # … with 405,430 more rows

Next, even more explicit code since we know age groups should be ordered, and that gender should be a factor ...

79 / 139
who %>% pivot_longer(
cols = new_sp_m014:new_rel_f65,
names_to = c("diagnosis", "gender", "age"),
names_pattern = "new_?(.*)_(.)(.*)",
names_ptypes = list(
gender = factor(levels = c("f", "m")),
age = factor(
levels = c("014", "1524", "2534", "3544", "4554", "5564", "65"),
ordered = TRUE
)
),
values_to = "count"
)
## # A tibble: 405,440 x 8
## country iso2 iso3 year diagnosis gender age count
## <chr> <chr> <chr> <int> <chr> <fct> <ord> <int>
## 1 Afghanistan AF AFG 1980 sp m 014 NA
## 2 Afghanistan AF AFG 1980 sp m 1524 NA
## 3 Afghanistan AF AFG 1980 sp m 2534 NA
## 4 Afghanistan AF AFG 1980 sp m 3544 NA
## 5 Afghanistan AF AFG 1980 sp m 4554 NA
## 6 Afghanistan AF AFG 1980 sp m 5564 NA
## 7 Afghanistan AF AFG 1980 sp m 65 NA
## 8 Afghanistan AF AFG 1980 sp f 014 NA
## 9 Afghanistan AF AFG 1980 sp f 1524 NA
## 10 Afghanistan AF AFG 1980 sp f 2534 NA
## # … with 405,430 more rows
80 / 139

case_when

81 / 139

This is the dplyr alternative to ifelse. Here are a few examples. First, working with a categorical variable: I create a new variable that flags whether the flight originated at BWI, DCA, or IAH versus some other airport.

cmhflights2017 %>%
mutate(DC_area = case_when(
Origin %in% c("BWI","DCA", "IAH") ~ "DC Area",
!Origin %in% c("BWI","DCA", "IAH") ~ "Not in DC Area")
) -> cw.df
table(cw.df$DC_area, cw.df$Origin)
##
## ATL BNA BOS BWI CLT CMH DAL DCA DEN DFW
## DC Area 0 0 0 1301 0 0 0 674 0 0
## Not in DC Area 3794 692 623 0 3 23893 369 0 1224 1482
##
## DTW EWR FLL HOU IAH LAS LAX LGA MCO MDW
## DC Area 0 0 0 0 760 0 0 0 0 0
## Not in DC Area 755 847 321 16 0 1074 641 173 1488 1992
##
## MSP MSY OAK ORD PHL PHX RSW STL TPA
## DC Area 0 0 0 0 0 0 0 0 0
## Not in DC Area 945 35 365 1089 57 1075 657 655 787
82 / 139

Similarly, we could use numerical variables to create new ones. For example, let us call a flight delayed if DepDelayMinutes is \(> 30\).

cw.df %>%
mutate(delayed = case_when(
DepDelayMinutes > 30 ~ "Yes",
DepDelayMinutes < 30 ~ "No",
is.na(DepDelayMinutes) ~ "Unknown")
) -> cw.df
table(cw.df$delayed)
##
## No Unknown Yes
## 42227 538 4906
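
One caveat: a flight with DepDelayMinutes of exactly 30 matches neither of the first two conditions above and would end up as NA rather than "Unknown". A sketch that closes the gap with an inclusive bound and an explicit fallback (cw.df2 is just an illustrative name):

cw.df %>%
mutate(delayed = case_when(DepDelayMinutes > 30 ~ "Yes",
DepDelayMinutes <= 30 ~ "No",
TRUE ~ "Unknown")) -> cw.df2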
83 / 139
84 / 139

Say I have the mtcars data and want to see the distribution of the number of cylinders a car has

data(mtcars)
table(mtcars$cyl)
##
## 4 6 8
## 11 7 14

cyl is stored as a numeric so I can flip this into a factor via

library(forcats)
factor(mtcars$cyl) -> mtcars$f.cyl
fct_recode(mtcars$f.cyl,
Four = '4',
Six = '6',
Eight = '8'
) -> mtcars$f.cyl
table(mtcars$f.cyl)
##
## Four Six Eight
## 11 7 14
85 / 139

If I plot mpg by the car, I get

mtcars$model <- row.names(mtcars)
library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = model)) +
geom_point() +
labs(x = "Miles per gallon",
y = "Car Model") +
theme_minimal()

This isn't very helpful since the models are not arranged in ascending/descending order of miles per gallon. That can be easily fixed:

86 / 139

If I want the plot to be in ascending order of mpg,

ggplot(mtcars, aes(x = mpg, fct_reorder(.f = model, .x = mpg))) +
geom_point() +
labs(x = "Miles per gallon",
y = "Car Model") +
theme_minimal()

If I want the plot to be in descending order of mpg,

ggplot(mtcars, aes(x = mpg, fct_reorder(.f = model, .x = -mpg))) +
geom_point() +
labs(x = "Miles per gallon", y
= "Car Model") +
theme_minimal()

Note the use of .f and .x rather than f and x.

87 / 139

I can do this with bar-plots too.

ggplot(mtcars, aes(f.cyl)) + geom_bar(aes(fill = f.cyl)) +
theme_minimal() +
theme(legend.position = "none") +
labs(x = "Number of cylinders",
y = "Frequency")

88 / 139

The bars ordered by descending frequency

ggplot(mtcars, aes(fct_infreq(f.cyl))) +
geom_bar(aes(fill = f.cyl)) +
theme_minimal() +
theme(legend.position = "none") +
labs(x = "Number of cylinders",
y = "Frequency")

Note the use of fct_infreq() for descending

89 / 139

and the bars ordered by ascending frequency

ggplot(mtcars, aes(fct_rev(fct_infreq(f.cyl)))) +
geom_bar(aes(fill = f.cyl)) +
theme_minimal() +
theme(legend.position = "none") +
labs(x = "Number of cylinders",
y = "Frequency")

Note the use of fct_rev(fct_infreq()) for ascending.

90 / 139

Advanced dplyr

(or verbs I often forget exist in the dplyr stable)

91 / 139

select_if()

You can select columns based on logical and other criteria.

cmhflights2017 %>%
select_if(is.character) %>%
names()
## [1] "Reporting_Airline" "IATA_CODE_Reporting_Airline"
## [3] "Tail_Number" "Origin"
## [5] "OriginCityName" "OriginState"
## [7] "OriginStateFips" "OriginStateName"
## [9] "Dest" "DestCityName"
## [11] "DestState" "DestStateFips"
## [13] "DestStateName" "CRSDepTime"
## [15] "DepTime" "DepTimeBlk"
## [17] "WheelsOff" "WheelsOn"
## [19] "CRSArrTime" "ArrTime"
## [21] "ArrTimeBlk" "CancellationCode"
## [23] "FirstDepTime" "Div1Airport"
## [25] "Div1WheelsOn" "Div1WheelsOff"
## [27] "Div1TailNum"
92 / 139
library(lubridate)
cmhflights2017 %>%
select_if(is.Date) %>%
names()
## [1] "FlightDate"
cmhflights2017 %>%
select_if(~is.numeric(.) &
mean(., na.rm = TRUE) > 10
) %>%
names()
## [1] "Year" "DayofMonth"
## [3] "DOT_ID_Reporting_Airline" "Flight_Number_Reporting_Airline"
## [5] "OriginAirportID" "OriginAirportSeqID"
## [7] "OriginCityMarketID" "OriginWac"
## [9] "DestAirportID" "DestAirportSeqID"
## [11] "DestCityMarketID" "DestWac"
## [13] "DepDelayMinutes" "TaxiOut"
## [15] "ArrDelayMinutes" "CRSElapsedTime"
## [17] "ActualElapsedTime" "AirTime"
## [19] "Distance" "CarrierDelay"
## [21] "NASDelay" "LateAircraftDelay"
## [23] "TotalAddGTime" "LongestAddGTime"
## [25] "DivActualElapsedTime" "DivArrDelay"
## [27] "DivDistance" "Div1AirportID"
## [29] "Div1AirportSeqID" "Div1TotalGTime"
## [31] "Div1LongestGTime"
93 / 139

You can also select with other conditions, for example, if the column has fewer than some number of distinct values ...

cmhflights2017 %>%
select_if(~n_distinct(.) < 5) %>%
names()
## [1] "Year" "Quarter" "DepDel15"
## [4] "ArrDel15" "Cancelled" "CancellationCode"
## [7] "Diverted" "Flights" "DivAirportLandings"
## [10] "DivReachedDest" "Div2Airport" "Div2AirportID"
## [13] "Div2AirportSeqID" "Div2WheelsOn" "Div2TotalGTime"
## [16] "Div2LongestGTime" "Div2WheelsOff" "Div2TailNum"
## [19] "Div3Airport" "Div3AirportID" "Div3AirportSeqID"
## [22] "Div3WheelsOn" "Div3TotalGTime" "Div3LongestGTime"
## [25] "Div3WheelsOff" "Div3TailNum" "Div4Airport"
## [28] "Div4AirportID" "Div4AirportSeqID" "Div4WheelsOn"
## [31] "Div4TotalGTime" "Div4LongestGTime" "Div4WheelsOff"
## [34] "Div4TailNum" "Div5Airport" "Div5AirportID"
## [37] "Div5AirportSeqID" "Div5WheelsOn" "Div5TotalGTime"
## [40] "Div5LongestGTime" "Div5WheelsOff" "Div5TailNum"
## [43] "X110"
94 / 139

You can rearrange columns swiftly too, as shown below, where the two columns I want first are explicitly specified and all remaining columns follow via everything()

df.ev <- cmhflights2017 %>%
select(FlightDate, Flight_Number_Reporting_Airline, everything()
)
head(df.ev[, c(1:6)], 15)
## FlightDate Flight_Number_Reporting_Airline id Year Quarter Month
## 1 2017-01-02 2419 1 2017 1 1
## 2 2017-01-02 2583 1 2017 1 1
## 3 2017-01-03 977 1 2017 1 1
## 4 2017-01-03 977 1 2017 1 1
## 5 2017-01-03 1119 1 2017 1 1
## 6 2017-01-03 1170 1 2017 1 1
## 7 2017-01-03 1276 1 2017 1 1
## 8 2017-01-03 1276 1 2017 1 1
## 9 2017-01-03 1327 1 2017 1 1
## 10 2017-01-03 1405 1 2017 1 1
## 11 2017-01-03 1405 1 2017 1 1
## 12 2017-01-03 1533 1 2017 1 1
## 13 2017-01-03 1533 1 2017 1 1
## 14 2017-01-03 1736 1 2017 1 1
## 15 2017-01-03 1897 1 2017 1 1
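
With dplyr 1.0.0 or newer, relocate() does the same column shuffling in one step; a sketch, assuming that version is installed (df.rel is just an illustrative name):

cmhflights2017 %>%
relocate(FlightDate, Flight_Number_Reporting_Airline) -> df.rel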
95 / 139

select_all

This is a very handy function as well since you can make sweeping changes very easily

cmhflights2017 %>%
select_all(tolower) -> df.low
names(df.low[, 1:10])
## [1] "id" "year"
## [3] "quarter" "month"
## [5] "dayofmonth" "dayofweek"
## [7] "flightdate" "reporting_airline"
## [9] "dot_id_reporting_airline" "iata_code_reporting_airline"
cmhflights2017 %>%
select_all(tolower) %>%
select_all(~str_replace_all(., "_", "")) -> df.low
names(df.low[, 1:10])
## [1] "id" "year"
## [3] "quarter" "month"
## [5] "dayofmonth" "dayofweek"
## [7] "flightdate" "reportingairline"
## [9] "dotidreportingairline" "iatacodereportingairline"
96 / 139

select_at

This allows you to select (and, here, also rename) the variables that meet a specific criterion

mtcars %>%
select_at(vars(contains("ar"), starts_with("c")), toupper) %>%
head()
## GEAR CARB CYL
## Mazda RX4 4 4 6
## Mazda RX4 Wag 4 4 6
## Datsun 710 4 1 4
## Hornet 4 Drive 3 1 6
## Hornet Sportabout 3 2 8
## Valiant 3 1 6
97 / 139

rownames_to_column

Some data-sets will have labels for rownames, as in the case of mtcars ... notice the first column is actually the row names

head(mtcars[, c(1:4)])
## mpg cyl disp hp
## Mazda RX4 21.0 6 160 110
## Mazda RX4 Wag 21.0 6 160 110
## Datsun 710 22.8 4 108 93
## Hornet 4 Drive 21.4 6 258 110
## Hornet Sportabout 18.7 8 360 175
## Valiant 18.1 6 225 105
mtcars %>%
rownames_to_column() %>%
head(3)
## rowname mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Six
## 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Six
## 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Four
## model
## 1 Mazda RX4
## 2 Mazda RX4 Wag
## 3 Datsun 710
98 / 139

mutate_at

This allows you to mutate specific columns, selected here with vars() and helpers such as contains()

mtcars %>%
mutate_at(vars(contains("el")), tolower
) -> mut.df
head(mut.df)
## mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Six
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Six
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Four
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Six
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Eight
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 Six
## model
## Mazda RX4 mazda rx4
## Mazda RX4 Wag mazda rx4 wag
## Datsun 710 datsun 710
## Hornet 4 Drive hornet 4 drive
## Hornet Sportabout hornet sportabout
## Valiant valiant
mtcars %>%
mutate_at(vars(contains("pg")), ~(./10)
) -> mut.df
head(mut.df)
## mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## Mazda RX4 2.10 6 160 110 3.90 2.620 16.46 0 1 4 4 Six
## Mazda RX4 Wag 2.10 6 160 110 3.90 2.875 17.02 0 1 4 4 Six
## Datsun 710 2.28 4 108 93 3.85 2.320 18.61 1 1 4 1 Four
## Hornet 4 Drive 2.14 6 258 110 3.08 3.215 19.44 1 0 3 1 Six
## Hornet Sportabout 1.87 8 360 175 3.15 3.440 17.02 0 0 3 2 Eight
## Valiant 1.81 6 225 105 2.76 3.460 20.22 1 0 3 1 Six
## model
## Mazda RX4 Mazda RX4
## Mazda RX4 Wag Mazda RX4 Wag
## Datsun 710 Datsun 710
## Hornet 4 Drive Hornet 4 Drive
## Hornet Sportabout Hornet Sportabout
## Valiant Valiant
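
In dplyr 1.0.0 and newer, the scoped _at/_if/_all verbs are superseded by across(); a sketch of the first mutate_at() call above, assuming that version (mut.df2 is just an illustrative name):

mtcars %>%
mutate(across(contains("el"), tolower)) -> mut.df2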
99 / 139

mutate_if

This is asking that if a column is a factor, its values should be converted to uppercase -- notice f.cyl in the output

mtcars %>%
mutate_if(is.factor, toupper
) -> mut.df
head(mut.df)
## mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 SIX
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 SIX
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 FOUR
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 SIX
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 EIGHT
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 SIX
## model
## Mazda RX4 Mazda RX4
## Mazda RX4 Wag Mazda RX4 Wag
## Datsun 710 Datsun 710
## Hornet 4 Drive Hornet 4 Drive
## Hornet Sportabout Hornet Sportabout
## Valiant Valiant
100 / 139

na_if

Here we are asking that if any variable mpg through vs has the value 110, that value should be converted to NA (a missing value) ... compare the values of hp across the two blocks of output

mtcars %>%
select(mpg:vs) %>%
na_if(110) -> naif.df
head(naif.df)
## mpg cyl disp hp drat wt qsec vs
## Mazda RX4 21.0 6 160 NA 3.90 2.620 16.46 0
## Mazda RX4 Wag 21.0 6 160 NA 3.90 2.875 17.02 0
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1
## Hornet 4 Drive 21.4 6 258 NA 3.08 3.215 19.44 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1
mtcars %>%
select(mpg:vs) -> nonaif.df
head(nonaif.df)
## mpg cyl disp hp drat wt qsec vs
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1
101 / 139

rowwise

This comes in handy if you have to operate on two or more columns but want the calculation done for each row. Here the average of mpg and qsec is being calculated for each model

mtcars %>%
rowwise() %>%
mutate(average = mean(c(mpg, qsec))) -> row.df
head(row.df[, c(1, 7, 14)])
## # A tibble: 6 x 3
## # Rowwise:
## mpg qsec average
## <dbl> <dbl> <dbl>
## 1 21 16.5 18.7
## 2 21 17.0 19.0
## 3 22.8 18.6 20.7
## 4 21.4 19.4 20.4
## 5 18.7 17.0 17.9
## 6 18.1 20.2 19.2
102 / 139

between

As the name suggests, this allows you to filter for values that fall within the specified range (here, mpg from 21 to 24). The boundary values are included in the filtered results

mtcars %>%
filter(between(mpg, 21, 24)
) -> between.df
between.df
## mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Six
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Six
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Four
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Six
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Four
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Four
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 Four
## model
## Mazda RX4 Mazda RX4
## Mazda RX4 Wag Mazda RX4 Wag
## Datsun 710 Datsun 710
## Hornet 4 Drive Hornet 4 Drive
## Merc 230 Merc 230
## Toyota Corona Toyota Corona
## Volvo 142E Volvo 142E
103 / 139

near

This allows you to filter within a range of values, the range specified via tol. Here, for example, we are asking for any case where mpg is inside the \(\pm 0.5\) range of \(mpg = 21\), i.e., \(20.5 < x < 21.5\)

mtcars %>%
filter(near(mpg, 21, tol = 0.5)
) -> near.df
near.df
## mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Six
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Six
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Six
## Volvo 142E 21.4 4 121 109 4.11 2.780 18.60 1 1 4 2 Four
## model
## Mazda RX4 Mazda RX4
## Mazda RX4 Wag Mazda RX4 Wag
## Hornet 4 Drive Hornet 4 Drive
## Volvo 142E Volvo 142E
mtcars %>%
filter(near(mpg, 21, tol = 0.5 * sd(mpg))
) -> near.df
head(near.df)
## mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Six
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Six
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Four
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Six
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Eight
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 Six
## model
## Mazda RX4 Mazda RX4
## Mazda RX4 Wag Mazda RX4 Wag
## Datsun 710 Datsun 710
## Hornet 4 Drive Hornet 4 Drive
## Hornet Sportabout Hornet Sportabout
## Valiant Valiant
104 / 139

xor

xor() keeps a row when exactly one of the two conditions is true. It is not quite the same as filtering on mpg < 22 | cyl < 6, which also keeps rows where both conditions hold (for example Toyota Corona, with mpg 21.5 and cyl 4); the first six rows of the two results just happen to match

mtcars %>%
filter(xor(mpg < 22, cyl < 6)) -> xor.df
head(xor.df)
## mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Six
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Six
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Four
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Six
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Eight
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 Six
## model
## Mazda RX4 Mazda RX4
## Mazda RX4 Wag Mazda RX4 Wag
## Datsun 710 Datsun 710
## Hornet 4 Drive Hornet 4 Drive
## Hornet Sportabout Hornet Sportabout
## Valiant Valiant
mtcars %>%
filter(mpg < 22 | cyl < 6) -> nonxor.df
head(nonxor.df)
## mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Six
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Six
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Four
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Six
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Eight
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 Six
## model
## Mazda RX4 Mazda RX4
## Mazda RX4 Wag Mazda RX4 Wag
## Datsun 710 Datsun 710
## Hornet 4 Drive Hornet 4 Drive
## Hornet Sportabout Hornet Sportabout
## Valiant Valiant
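
A quick way to see the difference is to look at the rows where both conditions hold -- these are kept by | but dropped by xor():

mtcars %>%
filter(mpg < 22 & cyl < 6) %>%
select(mpg, cyl, model)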
105 / 139

filter_all

This filters rows by applying the condition to every column -- for example, keeping any row where some column contains the string "nnie", which in effect picks out baby names such as Minnie, Annie, or Jennie

library(babynames)
babynames %>%
filter_all(any_vars(str_detect(., pattern = "nnie"))
) -> filt.df
head(filt.df)
## # A tibble: 6 x 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1880 F Minnie 1746 0.0179
## 2 1880 F Annie 1258 0.0129
## 3 1880 F Jennie 793 0.00812
## 4 1880 F Fannie 560 0.00574
## 5 1880 F Nannie 248 0.00254
## 6 1880 F Winnie 49 0.000502
106 / 139
babynames %>%
select_if(is.numeric) %>%
filter_all(any_vars(. == 1880)
) -> filt.df
head(filt.df)
## # A tibble: 6 x 3
## year n prop
## <dbl> <int> <dbl>
## 1 1880 7065 0.0724
## 2 1880 2604 0.0267
## 3 1880 2003 0.0205
## 4 1880 1939 0.0199
## 5 1880 1746 0.0179
## 6 1880 1578 0.0162

This has retained only those rows where at least one column/variable has a value of 1880

107 / 139

filter_if

This allows you to filter using only the columns that meet a predicate -- here, we keep rows where any character column has a missing value

starwars %>%
filter_if(is.character, any_vars(is.na(.))
) -> filt.df
head(filt.df)
## # A tibble: 6 x 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 C-3PO 167 75 <NA> gold yellow 112 none mascu…
## 2 R2-D2 96 32 <NA> white, blue red 33 none mascu…
## 3 R5-D4 97 32 <NA> white, red red NA none mascu…
## 4 Greedo 173 74 <NA> green black 44 male mascu…
## 5 Jabba … 175 1358 <NA> green-tan,… orange 600 herma… mascu…
## 6 Yoda 66 17 white green brown 896 male mascu…
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
108 / 139

filter_at

This will come in very handy when you want to apply a filtering condition to specific variables only

mtcars %>%
filter_at(vars(mpg, qsec), all_vars(. > 19)
) -> filt.df
head(filt.df)
## mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Six
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Four
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Four
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Four
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Four
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Four
## model
## Hornet 4 Drive Hornet 4 Drive
## Merc 240D Merc 240D
## Merc 230 Merc 230
## Fiat 128 Fiat 128
## Toyota Corolla Toyota Corolla
## Toyota Corona Toyota Corona

This looked for mpg and qsec and then retained rows where both variables had values > 19; all_vars() requires every listed variable to meet the condition

109 / 139
mtcars %>%
filter_at(vars(contains("t")), any_vars(. > 3)
) -> filt.df
head(filt.df)
## mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Six
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Six
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Four
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Six
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Eight
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 Six
## model
## Mazda RX4 Mazda RX4
## Mazda RX4 Wag Mazda RX4 Wag
## Datsun 710 Datsun 710
## Hornet 4 Drive Hornet 4 Drive
## Hornet Sportabout Hornet Sportabout
## Valiant Valiant

Here we asked for columns with the letter t in their name, and retained rows where any of these selected columns has a value > 3

110 / 139
mtcars %>%
filter_at(vars(contains("t")), all_vars(. > 3)
) -> filt.df
head(filt.df)
## mpg cyl disp hp drat wt qsec vs am gear carb f.cyl
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Six
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Eight
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Eight
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Four
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Four
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Six
## model
## Hornet 4 Drive Hornet 4 Drive
## Hornet Sportabout Hornet Sportabout
## Duster 360 Duster 360
## Merc 240D Merc 240D
## Merc 230 Merc 230
## Merc 280 Merc 280

Here we asked for columns with the letter t in their name, and retained rows where all of these selected columns have values > 3

111 / 139

summarise_all

Here we are asking that means be calculated for all selected variables -- mpg:qsec -- each on its own, of course

mtcars %>%
select(mpg:qsec) %>%
summarise_all(mean, na.rm = TRUE) -> summ.df
summ.df
## mpg cyl disp hp drat wt qsec
## 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875
112 / 139

summarise_if

Here we are asking for the means of specific variables -- if they are numeric -- to be calculated

mtcars %>%
summarise_if(is.numeric, mean, na.rm = TRUE) %>%
rename_if(is.numeric, ~paste0("mean_", .)) -> summ.df
summ.df
## mean_mpg mean_cyl mean_disp mean_hp mean_drat mean_wt mean_qsec mean_vs
## 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375
## mean_am mean_gear mean_carb
## 1 0.40625 3.6875 2.8125

rename_if() is being used to append recognizable labels to the calculated values

113 / 139

summarise_at

Here we are asking for the means of specific variables -- those with the word "Delay" in their name -- to be calculated

cmhflights2017 %>%
summarise_at(vars(contains("Delay")), mean, na.rm = TRUE) %>%
rename_at(vars(contains("Delay")), ~paste0("mean_", .)) -> summ.df
summ.df
## mean_DepDelay mean_DepDelayMinutes mean_DepartureDelayGroups mean_ArrDelay
## 1 8.756439 11.54685 -0.03739762 2.601107
## mean_ArrDelayMinutes mean_ArrivalDelayGroups mean_CarrierDelay
## 1 11.07277 -0.3285672 18.72072
## mean_WeatherDelay mean_NASDelay mean_SecurityDelay mean_LateAircraftDelay
## 1 2.676069 12.33165 0.04338477 26.01961
## mean_DivArrDelay
## 1 178.8784

rename_at() is being used to append recognizable labels to the calculated values
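
summarise_at() is not limited to a single function; you can pass a named list to get, say, both the mean and the standard deviation (a sketch -- the resulting columns should come out suffixed with _mean and _sd)

cmhflights2017 %>%
summarise_at(vars(contains("Delay")), list(mean = mean, sd = sd), na.rm = TRUE) -> summ.df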

114 / 139

top_n

This will give you the top or bottom n values (here n has been set to 7)

cmhflights2017 %>%
group_by(Reporting_Airline) %>%
filter(Dest == "CMH") %>%
summarise(average.delay = mean(ArrDelay,
na.rm = TRUE)) %>%
arrange(-average.delay) %>%
top_n(7)
## # A tibble: 7 x 2
## Reporting_Airline average.delay
## <chr> <dbl>
## 1 F9 7.54
## 2 WN 5.20
## 3 AA 4.92
## 4 OO 4.66
## 5 EV 0.966
## 6 DL -1.01
## 7 UA -2.27
cmhflights2017 %>%
group_by(Reporting_Airline) %>%
filter(Dest == "CMH") %>%
summarise(average.delay = mean(ArrDelay,
na.rm = TRUE)) %>%
arrange(average.delay) %>%
top_n(-7)
## # A tibble: 7 x 2
## Reporting_Airline average.delay
## <chr> <dbl>
## 1 UA -2.27
## 2 DL -1.01
## 3 EV 0.966
## 4 OO 4.66
## 5 AA 4.92
## 6 WN 5.20
## 7 F9 7.54
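
Note that top_n() defaulted to ranking by the last column of the summary (average.delay). You can make this explicit by naming the weighting variable (a sketch):

cmhflights2017 %>%
group_by(Reporting_Airline) %>%
filter(Dest == "CMH") %>%
summarise(average.delay = mean(ArrDelay, na.rm = TRUE)) %>%
top_n(7, wt = average.delay) # same result as top_n(7), but the ranking variable is explicit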
115 / 139

If you need to split values that have been stored together in a single column into their own rows, use separate_rows()

library(tibble)
tibble(
x = 1:3,
y = c("a", "d,e,f", "g,h"),
z = c("1", "2,3,4", "5,6")
) -> df
df
## # A tibble: 3 x 3
## x y z
## <int> <chr> <chr>
## 1 1 a 1
## 2 2 d,e,f 2,3,4
## 3 3 g,h 5,6
separate_rows(df, y, z, convert = TRUE)
## # A tibble: 6 x 3
## x y z
## <int> <chr> <int>
## 1 1 a 1
## 2 2 d 2
## 3 2 e 3
## 4 2 f 4
## 5 3 g 5
## 6 3 h 6
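
separate_rows() also accepts a sep = argument if you would rather state the separator explicitly than rely on the default splitting rule (a sketch, with a hypothetical pipe-delimited column):

# split y on the pipe character; x is repeated for each resulting row
tibble(x = 1:2, y = c("a|b|c", "d")) %>%
separate_rows(y, sep = "\\|")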
116 / 139

dplyr's join

117 / 139

dplyr's join alternative to merge

dplyr has its own commands that execute what merge() did for us in the previous module. Let us see how this alternative route works, but first we recreate the data we used with merge().

data.frame(
Score = c(10, 21, 33, 12, 35, 67, 43, 99),
ID = c("A12", "A23", "Z14", "WX1", "Y31", "D66", "C31", "Q22")
) -> data1
data.frame(
Sex = c("Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"),
ID = c("A12", "A23", "Z14", "WX1", "Y31", "D66", "E52", "H71")
) -> data2
118 / 139

inner_join

This merges based on columns that appear in both data-sets, and will only merge rows that have the same value per column. In brief, only those observations common to both data-sets are retained.

inner_join(data1, data2, by = "ID")
## Score ID Sex
## 1 10 A12 Male
## 2 21 A23 Female
## 3 33 Z14 Male
## 4 12 WX1 Female
## 5 35 Y31 Male
## 6 67 D66 Female

If you do not specify the by = column, dplyr defaults to joining on all columns that share the same name in both data-sets ...

inner_join(data1, data2)
## Score ID Sex
## 1 10 A12 Male
## 2 21 A23 Female
## 3 33 Z14 Male
## 4 12 WX1 Female
## 5 35 Y31 Male
## 6 67 D66 Female
119 / 139

left_join

This merge will result in all rows being retained from the first data-set listed in the command, but will include only those rows from the second data-set that have the same value(s) for the by variable(s) ...

data1 %>%
left_join(data2, by = "ID")
## Score ID Sex
## 1 10 A12 Male
## 2 21 A23 Female
## 3 33 Z14 Male
## 4 12 WX1 Female
## 5 35 Y31 Male
## 6 67 D66 Female
## 7 43 C31 <NA>
## 8 99 Q22 <NA>
data2 %>%
left_join(data1, by = "ID")
## Sex ID Score
## 1 Male A12 10
## 2 Female A23 21
## 3 Male Z14 33
## 4 Female WX1 12
## 5 Male Y31 35
## 6 Female D66 67
## 7 Male E52 NA
## 8 Female H71 NA
120 / 139

right_join

This merge will result in all rows being retained from the second data-set listed in the command, but will include only those rows from the first data-set that have the same value(s) for the by variable(s) ...

data1 %>%
right_join(data2, by = "ID")
## Score ID Sex
## 1 10 A12 Male
## 2 21 A23 Female
## 3 33 Z14 Male
## 4 12 WX1 Female
## 5 35 Y31 Male
## 6 67 D66 Female
## 7 NA E52 Male
## 8 NA H71 Female
data2 %>%
right_join(data1, by = "ID")
## Sex ID Score
## 1 Male A12 10
## 2 Female A23 21
## 3 Male Z14 33
## 4 Female WX1 12
## 5 Male Y31 35
## 6 Female D66 67
## 7 <NA> C31 43
## 8 <NA> Q22 99
121 / 139

full_join

This merge keeps all rows from each data-set, with rows found in only one data-set given NA values as necessary.

data1 %>%
full_join(data2, by = "ID")
## Score ID Sex
## 1 10 A12 Male
## 2 21 A23 Female
## 3 33 Z14 Male
## 4 12 WX1 Female
## 5 35 Y31 Male
## 6 67 D66 Female
## 7 43 C31 <NA>
## 8 99 Q22 <NA>
## 9 NA E52 Male
## 10 NA H71 Female
122 / 139

semi_join

What if you only want the rows of one data-set that have a match in the other, without pulling in any columns from that other data-set? Use a semi-join. In the examples below, notice that the column Sex is absent from the first result since it is only found in data2, but is present in the second result since data2 is the data-set being filtered by semi_join().

data.frame(
Score = c(10, 21, 33, 12, 35, 67, 43, 99),
ID = c("A12", "A23", "Z14", "WX1", "Y31", "D66", "C31", "Q22")
) -> data1
data.frame(
Score = c(10, 21, 33, 12, 35, 67, NA, NA),
Sex = c("Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"),
ID = c("A12", "A23", "Z14", "WX1", "Y31", "D66", "E52", "H71")
) -> data2
semi_join(data1, data2)
## Score ID
## 1 10 A12
## 2 21 A23
## 3 33 Z14
## 4 12 WX1
## 5 35 Y31
## 6 67 D66
semi_join(data2, data1, by = "ID")
## Score Sex ID
## 1 10 Male A12
## 2 21 Female A23
## 3 33 Male Z14
## 4 12 Female WX1
## 5 35 Male Y31
## 6 67 Female D66
123 / 139

anti_join

This will only retain records not found in the second data-set listed.

anti_join(data1, data2)
## Score ID
## 1 43 C31
## 2 99 Q22
anti_join(data2, data1)
## Score Sex ID
## 1 NA Male E52
## 2 NA Female H71
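
One last wrinkle: if the key column goes by a different name in each data-set, map the two names inside by = (a sketch, with a hypothetical data3 whose identifier is called StudentID rather than ID):

data.frame(
Sex = c("Male", "Female"),
StudentID = c("A12", "A23")
) -> data3
left_join(data1, data3, by = c("ID" = "StudentID")) # match data1$ID against data3$StudentID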
124 / 139

vectorized functions in dplyr

125 / 139

Vectorized functions within mutate/transmute

You can create new columns by executing vectorized functions -- functions that take vectors of some length as input and return vectors of the same length as output. These functions are shown below, using a small example data frame.

tibble::tibble(
id = c(
rep("A", 3),
rep("B", 4),
rep("C", 1),
rep("D", 4)
),
x = c(seq(1, 10, by = 1), -1, -2),
y = c(seq(11, 20, by = 1), -11, -12)
) -> a.df
a.df
## # A tibble: 12 x 3
## id x y
## <chr> <dbl> <dbl>
## 1 A 1 11
## 2 A 2 12
## 3 A 3 13
## 4 B 4 14
## 5 B 5 15
## 6 B 6 16
## 7 B 7 17
## 8 C 8 18
## 9 D 9 19
## 10 D 10 20
## 11 D -1 -11
## 12 D -2 -12
126 / 139

lag() versus lead()

To lag a value means to assign to the current observation (i.e., row) the value from a preceding row. Similarly, to lead means to assign to the current observation the value from a subsequent row. If I lead by 1, I am assigning the value from the next row; if I lead by 2, the value from two rows ahead; and so on. Say I want to lag the values of x by 1, first regardless of id and then within each id.

a.df %>%
mutate(x.lag = lag(x, 1))
## # A tibble: 12 x 4
## id x y x.lag
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 NA
## 2 A 2 12 1
## 3 A 3 13 2
## 4 B 4 14 3
## 5 B 5 15 4
## 6 B 6 16 5
## 7 B 7 17 6
## 8 C 8 18 7
## 9 D 9 19 8
## 10 D 10 20 9
## 11 D -1 -11 10
## 12 D -2 -12 -1
a.df %>%
group_by(id) %>%
mutate(x.lag = lag(x, 1))
## # A tibble: 12 x 4
## # Groups: id [4]
## id x y x.lag
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 NA
## 2 A 2 12 1
## 3 A 3 13 2
## 4 B 4 14 NA
## 5 B 5 15 4
## 6 B 6 16 5
## 7 B 7 17 6
## 8 C 8 18 NA
## 9 D 9 19 NA
## 10 D 10 20 9
## 11 D -1 -11 10
## 12 D -2 -12 -1
127 / 139

Similarly, I can also lead by a specific "period"

a.df %>%
mutate(x.lead = lead(x, 2))
## # A tibble: 12 x 4
## id x y x.lead
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 3
## 2 A 2 12 4
## 3 A 3 13 5
## 4 B 4 14 6
## 5 B 5 15 7
## 6 B 6 16 8
## 7 B 7 17 9
## 8 C 8 18 10
## 9 D 9 19 -1
## 10 D 10 20 -2
## 11 D -1 -11 NA
## 12 D -2 -12 NA
a.df %>%
group_by(id) %>%
mutate(y.lead = lead(y, 1))
## # A tibble: 12 x 4
## # Groups: id [4]
## id x y y.lead
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 12
## 2 A 2 12 13
## 3 A 3 13 NA
## 4 B 4 14 15
## 5 B 5 15 16
## 6 B 6 16 17
## 7 B 7 17 NA
## 8 C 8 18 NA
## 9 D 9 19 20
## 10 D 10 20 -11
## 11 D -1 -11 -12
## 12 D -2 -12 NA
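
A common use of lag() (a sketch): compute the change from the preceding row within each id, i.e., x minus its lagged value.

a.df %>%
group_by(id) %>%
mutate(x.change = x - lag(x, 1)) # NA for the first row of each id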
128 / 139

cumulative aggregates

We can also calculate various aggregates by cumulating over/across columns.

c in any row is the smallest value observed thus far

a.df %>%
mutate(c = cummin(x))
## # A tibble: 12 x 4
## id x y c
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 1
## 2 A 2 12 1
## 3 A 3 13 1
## 4 B 4 14 1
## 5 B 5 15 1
## 6 B 6 16 1
## 7 B 7 17 1
## 8 C 8 18 1
## 9 D 9 19 1
## 10 D 10 20 1
## 11 D -1 -11 -1
## 12 D -2 -12 -2

c in any row is the largest value observed thus far

a.df %>%
mutate(c = cummax(x))
## # A tibble: 12 x 4
## id x y c
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 1
## 2 A 2 12 2
## 3 A 3 13 3
## 4 B 4 14 4
## 5 B 5 15 5
## 6 B 6 16 6
## 7 B 7 17 7
## 8 C 8 18 8
## 9 D 9 19 9
## 10 D 10 20 10
## 11 D -1 -11 10
## 12 D -2 -12 10
129 / 139

c is the average of the current + preceding value(s)

a.df %>%
mutate(c = cummean(x))
## # A tibble: 12 x 4
## id x y c
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 1
## 2 A 2 12 1.5
## 3 A 3 13 2
## 4 B 4 14 2.5
## 5 B 5 15 3
## 6 B 6 16 3.5
## 7 B 7 17 4
## 8 C 8 18 4.5
## 9 D 9 19 5
## 10 D 10 20 5.5
## 11 D -1 -11 4.91
## 12 D -2 -12 4.33

c is the cumulative sum of the current + preceding value(s)

a.df %>%
mutate(c = cumsum(x))
## # A tibble: 12 x 4
## id x y c
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 1
## 2 A 2 12 3
## 3 A 3 13 6
## 4 B 4 14 10
## 5 B 5 15 15
## 6 B 6 16 21
## 7 B 7 17 28
## 8 C 8 18 36
## 9 D 9 19 45
## 10 D 10 20 55
## 11 D -1 -11 54
## 12 D -2 -12 52
130 / 139

c is the cumulative product of the current value of x and the preceding value of c

a.df %>%
mutate(c = cumprod(x))
## # A tibble: 12 x 4
## id x y c
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 1
## 2 A 2 12 2
## 3 A 3 13 6
## 4 B 4 14 24
## 5 B 5 15 120
## 6 B 6 16 720
## 7 B 7 17 5040
## 8 C 8 18 40320
## 9 D 9 19 362880
## 10 D 10 20 3628800
## 11 D -1 -11 -3628800
## 12 D -2 -12 7257600
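
These cumulative functions also respect group_by(), so a running total within each id is simply (a sketch):

a.df %>%
group_by(id) %>%
mutate(c = cumsum(x)) # the running sum restarts with each new id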
131 / 139

Say we have information on the bank balance we hold by date. If the balance is negative, then we have dipped into an overdraft. We can use cumany() and cumall() to extract rows once a condition is (or is not yet) met.

df <- data.frame(
date = as.Date("2020-01-01") + 0:6,
balance = c(100, 50, 25, -25, -50, 30, 120)
)
df
## date balance
## 1 2020-01-01 100
## 2 2020-01-02 50
## 3 2020-01-03 25
## 4 2020-01-04 -25
## 5 2020-01-05 -50
## 6 2020-01-06 30
## 7 2020-01-07 120
Show me all rows from the first overdraft onward
df %>%
filter(
cumany(balance < 0)
)
## date balance
## 1 2020-01-04 -25
## 2 2020-01-05 -50
## 3 2020-01-06 30
## 4 2020-01-07 120
Show me all rows preceding the first overdraft
df %>%
filter(
cumall(!(balance < 0))
)
## date balance
## 1 2020-01-01 100
## 2 2020-01-02 50
## 3 2020-01-03 25
132 / 139

ranking functions

133 / 139

cume_dist()

This function gives, for each row, the proportion of all values of x that are less than or equal to the current value.

a.df %>%
mutate(d = cume_dist(x))
## # A tibble: 12 x 4
## id x y d
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 0.25
## 2 A 2 12 0.333
## 3 A 3 13 0.417
## 4 B 4 14 0.5
## 5 B 5 15 0.583
## 6 B 6 16 0.667
## 7 B 7 17 0.75
## 8 C 8 18 0.833
## 9 D 9 19 0.917
## 10 D 10 20 1
## 11 D -1 -11 0.167
## 12 D -2 -12 0.0833

Note that -2 is the smallest value so it is ranked first, and the proportion becomes d = (1/12) = 0.0833.

-1 is bigger than -2, hence ranked second. Now d = (2/12) = 0.167 because two observations have a value of x equal to -1 or less.

And so on ...
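
If it helps to see the arithmetic, d can be reproduced by hand (a sketch): the number of values less than or equal to the current value, divided by the number of rows.

# d.by.hand: count of values <= the current value, divided by the number of rows
a.df %>%
mutate(d = cume_dist(x),
d.by.hand = rank(x, ties.method = "max") / length(x))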

134 / 139

row_number()

Say we have multiple observations for some group and want to flag whether a specific observation is the first, second, third, etc. within the group.

a.df %>%
group_by(id) %>%
mutate(flag_row = row_number())
## # A tibble: 12 x 4
## # Groups: id [4]
## id x y flag_row
## <chr> <dbl> <dbl> <int>
## 1 A 1 11 1
## 2 A 2 12 2
## 3 A 3 13 3
## 4 B 4 14 1
## 5 B 5 15 2
## 6 B 6 16 3
## 7 B 7 17 4
## 8 C 8 18 1
## 9 D 9 19 1
## 10 D 10 20 2
## 11 D -1 -11 3
## 12 D -2 -12 4
135 / 139

If you want to flag in ascending or descending order of some variable, then you can run the following (a descending version follows the output). When you do, pay attention to what happens with D: the smallest value of x is flagged as the first observation, NOT the order in which x appears in the data-set.

a.df %>%
group_by(id) %>%
mutate(flag_row = row_number(x))
## # A tibble: 12 x 4
## # Groups: id [4]
## id x y flag_row
## <chr> <dbl> <dbl> <int>
## 1 A 1 11 1
## 2 A 2 12 2
## 3 A 3 13 3
## 4 B 4 14 1
## 5 B 5 15 2
## 6 B 6 16 3
## 7 B 7 17 4
## 8 C 8 18 1
## 9 D 9 19 3
## 10 D 10 20 4
## 11 D -1 -11 2
## 12 D -2 -12 1
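
To flag in descending order of x instead, wrap the variable in desc() (a sketch):

a.df %>%
group_by(id) %>%
mutate(flag_row = row_number(desc(x))) # the largest x within each id is flagged 1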
136 / 139

min_rank() and percent_rank()

min_rank() ranks observations from smallest to largest (ties share the lowest applicable rank), and percent_rank() expresses the same ranking as a proportion between 0 and 1

a.df %>%
mutate(min.rank = min_rank(x))
## # A tibble: 12 x 4
## id x y min.rank
## <chr> <dbl> <dbl> <int>
## 1 A 1 11 3
## 2 A 2 12 4
## 3 A 3 13 5
## 4 B 4 14 6
## 5 B 5 15 7
## 6 B 6 16 8
## 7 B 7 17 9
## 8 C 8 18 10
## 9 D 9 19 11
## 10 D 10 20 12
## 11 D -1 -11 2
## 12 D -2 -12 1
a.df %>%
group_by(id) %>%
mutate(min.rank = min_rank(x))
## # A tibble: 12 x 4
## # Groups: id [4]
## id x y min.rank
## <chr> <dbl> <dbl> <int>
## 1 A 1 11 1
## 2 A 2 12 2
## 3 A 3 13 3
## 4 B 4 14 1
## 5 B 5 15 2
## 6 B 6 16 3
## 7 B 7 17 4
## 8 C 8 18 1
## 9 D 9 19 3
## 10 D 10 20 4
## 11 D -1 -11 2
## 12 D -2 -12 1
137 / 139
a.df %>%
mutate(pct.rank = percent_rank(x))
## # A tibble: 12 x 4
## id x y pct.rank
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 0.182
## 2 A 2 12 0.273
## 3 A 3 13 0.364
## 4 B 4 14 0.455
## 5 B 5 15 0.545
## 6 B 6 16 0.636
## 7 B 7 17 0.727
## 8 C 8 18 0.818
## 9 D 9 19 0.909
## 10 D 10 20 1
## 11 D -1 -11 0.0909
## 12 D -2 -12 0
a.df %>%
group_by(id) %>%
mutate(pct.rank = percent_rank(x))
## # A tibble: 12 x 4
## # Groups: id [4]
## id x y pct.rank
## <chr> <dbl> <dbl> <dbl>
## 1 A 1 11 0
## 2 A 2 12 0.5
## 3 A 3 13 1
## 4 B 4 14 0
## 5 B 5 15 0.333
## 6 B 6 16 0.667
## 7 B 7 17 1
## 8 C 8 18 NaN
## 9 D 9 19 0.667
## 10 D 10 20 1
## 11 D -1 -11 0.333
## 12 D -2 -12 0
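
percent_rank() is just min_rank() rescaled to run from 0 to 1, which also explains the NaN for group C above: with a single row the denominator is 0. A sketch of the arithmetic:

# by.hand reproduces percent_rank: (min_rank - 1) / (number of non-missing values - 1)
a.df %>%
mutate(pct.rank = percent_rank(x),
by.hand = (min_rank(x) - 1) / (sum(!is.na(x)) - 1))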
138 / 139