class: title-slide, center, middle
background-image: url(images/ouaerial.jpeg)
background-size: cover

# .heat[.fancy[Working with dates and times]]

## .heat[.fancy[Ani Ruhil]]

---

name: agenda

## .fancy[ .heat[ Agenda ]]

Understanding how to work with dates and times using

- `base R`
- `lubridate`

---

class: inverse, middle, center

# <br> .fat[.fancy[base R]]

---

Start by creating some dates

```r
as.Date("1970-1-1") -> basedate
Sys.Date() -> today
as.Date("2017-12-18") -> tomorrow
"12/16/2017" -> yesterday1
"12-16-17" -> yesterday2
"12 16 17" -> yesterday3
```

How did R read these?

```r
str(list(basedate, today, tomorrow, yesterday1, yesterday2, yesterday3))
```

```
## List of 6
## $ : Date[1:1], format: "1970-01-01"
## $ : Date[1:1], format: "2020-01-01"
## $ : Date[1:1], format: "2017-12-18"
## $ : chr "12/16/2017"
## $ : chr "12-16-17"
## $ : chr "12 16 17"
```

Some are read in as `dates` but others are read in as characters (`chr`).

---

We can flip the characters into dates so long as we specify the format in which the month, day, and year appear in each character string.

```r
as.Date(yesterday1, format = "%m/%d/%Y") -> yesterday1d
as.Date(yesterday2, format = "%m-%d-%y") -> yesterday2d
as.Date(yesterday3, format = "%m %d %y") -> yesterday3d
yesterday1d; yesterday2d; yesterday3d
```

```
## [1] "2017-12-16"
```

```
## [1] "2017-12-16"
```

```
## [1] "2017-12-16"
```

---

There are special codes that indicate date components when you specify `format = ""`

<table>
 <thead>
  <tr>
   <th style="text-align:center;"> Code </th>
   <th style="text-align:left;"> Code represents ... </th>
   <th style="text-align:center;"> For example ... </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> %a </td>
   <td style="text-align:left;"> Day spelled out (abbreviated) </td>
   <td style="text-align:center;"> Mon </td>
  </tr>
  <tr>
   <td style="text-align:center;"> %A </td>
   <td style="text-align:left;"> Day spelled out </td>
   <td style="text-align:center;"> Monday </td>
  </tr>
  <tr>
   <td style="text-align:center;"> %d </td>
   <td style="text-align:left;"> Day of the month </td>
   <td style="text-align:center;"> 16 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> %m </td>
   <td style="text-align:left;"> Month </td>
   <td style="text-align:center;"> 12 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> %b </td>
   <td style="text-align:left;"> Month (abbreviated) </td>
   <td style="text-align:center;"> Dec </td>
  </tr>
  <tr>
   <td style="text-align:center;"> %B </td>
   <td style="text-align:left;"> Month (fully spelled out) </td>
   <td style="text-align:center;"> December </td>
  </tr>
  <tr>
   <td style="text-align:center;"> %y </td>
   <td style="text-align:left;"> Year (2-digits) </td>
   <td style="text-align:center;"> 17 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> %Y </td>
   <td style="text-align:left;"> Year (4-digits) </td>
   <td style="text-align:center;"> 2017 </td>
  </tr>
</tbody>
</table>

---

So if you ever run into a date field that needs formatting, what we have covered thus far should help you convert it into a properly formatted date variable. Of course, there will always be special cases that need more work.

Once you convert them to the date format, you can extract other quantities.

```r
weekdays(yesterday1d)
```

```
## [1] "Saturday"
```

```r
months(today)
```

```
## [1] "January"
```

```r
quarters(today)
```

```
## [1] "Q1"
```
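
You can also go the other way: the same codes work with `format()` to print a date in whatever layout you need. A quick sketch, reusing `yesterday1d` from above (the commented results assume an English locale):

```r
# Turn a Date back into formatted text with the same %-codes
format(yesterday1d, "%A, %B %d, %Y") # "Saturday, December 16, 2017"
format(yesterday1d, "%d-%b-%y")      # "16-Dec-17"
```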
---

Sometimes you will have a number representing the date. This number represents the number of days since/before some date of origin.
If the number you see is positive, then you are seeing the number of days since the date of origin. If the number is negative, then you are seeing the number of days before the date of origin.

R uses `1970-01-01` as the origin, but other software packages may (and some do) use different origin dates. Thus, for example, if the dates are listed as

```r
c(17651, -2345, 19760) -> x
```

Then these numbers represent the following dates:

```r
as.Date(x, origin = "1970-01-01") -> x.dates
x.dates
```

```
## [1] "2018-04-30" "1963-08-01" "2024-02-07"
```

You can also convert dates into the numbers representing dates.

```r
julian(x.dates, origin = as.POSIXct("1970-01-01", tz = "UTC")) -> x.julian
x.julian
```

```
## [1] 17651 -2345 19760
## attr(,"origin")
## [1] "1970-01-01 UTC"
```

[Julian?](http://aa.usno.navy.mil/data/docs/JulianDate.php)

---

What if I wanted to create a sequence of dates, starting with Feb 8, 2018 but increasing by 1 day at a time or by every 5th day?

```r
as.Date(as.character("2018-02-08"), format = "%Y-%m-%d") -> start.date
seq(start.date, by = 1, length.out = 7) -> date.seq1
date.seq1
```

```
## [1] "2018-02-08" "2018-02-09" "2018-02-10" "2018-02-11" "2018-02-12" "2018-02-13"
## [7] "2018-02-14"
```

```r
seq(start.date, by = 5, length.out = 3) -> date.seq5
date.seq5
```

```
## [1] "2018-02-08" "2018-02-13" "2018-02-18"
```

Note that `length.out =` specifies how many dates you want to create.

---

Now, what if I want to know the date 30 days from today? 19 days ago?

```r
today + 30 -> date.30
today - 19 -> date.19
```

How many days have elapsed between two dates?

```r
as.Date("2017-04-28") -> date1
Sys.Date() -> date2
date2 - date1 -> lapsed.time; lapsed.time
```

```
## Time difference of 978 days
```

Say I want to create a vector of dates that starts and ends on specific dates, and the step size is 1 day, a week, 4 months, and so on. The step is indicated with the `by = ""` argument.

```r
seq(from = as.Date("2017-12-17"), to = as.Date("2018-12-16"), by = "day") -> my.dates1
seq(from = as.Date("2017-12-17"), to = as.Date("2018-12-16"), by = "week") -> my.dates2
seq(from = as.Date("2017-12-17"), to = as.Date("2018-12-16"), by = "month") -> my.dates3
seq(from = as.Date("2017-12-17"), to = as.Date("2018-12-16"), by = "3 days") -> my.dates4
seq(from = as.Date("2017-12-17"), to = as.Date("2018-12-16"), by = "2 weeks") -> my.dates5
seq(from = as.Date("2017-12-17"), to = as.Date("2018-12-16"), by = "4 months") -> my.dates6
seq(from = as.Date("2017-12-17"), to = as.Date("2019-12-16"), by = "year") -> my.dates7
seq(from = as.Date("2017-12-17"), to = as.Date("2022-12-16"), by = "2 years") -> my.dates8
```

---

class: inverse, middle, center

# <br> .fat[.fancy[lubridate]]

<center><img src = "images/hex-lubridate.png"></center>

---

This package makes working with dates and times quite easy because it does what base R does but in more intuitive ways and perhaps more flexibly.

Let us start with some date fields.

```r
today1 = "20171217"
today2 = "2017-12-17"
today3 = "2017 December 17"
today4 = "20171217143241"
today5 = "2017 December 17 14:32:41"
today6 = "December 17 2017 14:32:41"
today7 = "17-Dec, 2017 14:32:41"
```

The formats are quite varied but `lubridate` deals with them quite seamlessly so long as you pay attention to the order -- is the year first or last? What about the month? The day? Is time given in hours, minutes, and seconds?
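
If a single column mixes several of these layouts, `parse_date_time()` can try a list of candidate orders until one fits. A small sketch with made-up values (the `orders` shown are assumptions about what such a column might contain):

```r
library(lubridate)
# parse_date_time() tries each candidate order in turn, so a vector that
# mixes layouts can still be parsed in one pass (it returns a date-time)
mixed <- c("20171217", "December 17 2017", "17-12-2017")
parse_date_time(mixed, orders = c("ymd", "mdy", "dmy"))
```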
---

.pull-left[

```r
library(lubridate)
ymd(today1)
```

```
## [1] "2017-12-17"
```

```r
ymd(today2)
```

```
## [1] "2017-12-17"
```

```r
ymd(today3)
```

```
## [1] "2017-12-17"
```

]

.pull-right[

```r
ymd_hms(today4)
```

```
## [1] "2017-12-17 14:32:41 UTC"
```

```r
ymd_hms(today5)
```

```
## [1] "2017-12-17 14:32:41 UTC"
```

```r
mdy_hms(today6)
```

```
## [1] "2017-12-17 14:32:41 UTC"
```

```r
dmy_hms(today7)
```

```
## [1] "2017-12-17 14:32:41 UTC"
```

]

---

If I need to extract date/time elements, I can do that as well.

```r
dmy_hms(today7) -> today; today # a date and time set in today7
```

```
## [1] "2017-12-17 14:32:41 UTC"
```

```r
year(today) -> today.y; today.y # The year
```

```
## [1] 2017
```

```r
month(today) -> today.m1; today.m1 # the month, as a number
```

```
## [1] 12
```

```r
month(today, label = TRUE, abbr = TRUE) -> today.m2; today.m2 # labeling the month but with an abbreviation
```

```
## [1] Dec
## Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec
```

```r
month(today, label = TRUE, abbr = FALSE) -> today.m3; today.m3 # fully labelling the month
```

```
## [1] December
## 12 Levels: January < February < March < April < May < June < July < ... < December
```

---

.pull-left[

```r
week(today) -> today.w
today.w # what week of the year is it?
```

```
## [1] 51
```

```r
yday(today) -> today.doy
today.doy # what day of the year is it?
```

```
## [1] 351
```

```r
mday(today) -> today.dom
today.dom # what day of the month is it?
```

```
## [1] 17
```

]

.pull-right[

```r
wday(today) -> today.dow1
today.dow1 # what day of the week is it, as a number?
```

```
## [1] 1
```

```r
wday(today, label = TRUE, abbr = TRUE) -> today.dow2
today.dow2 # day of the week, abbreviated label
```

```
## [1] Sun
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
```

```r
wday(today, label = TRUE, abbr = FALSE) -> today.dow3
today.dow3 # day of the week fully labelled
```

```
## [1] Sunday
## Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < Friday < Saturday
```

]

---

```r
hour(today) -> today.h
today.h # what hour is it?
```

```
## [1] 14
```

```r
minute(today) -> today.m
today.m # what minute is it?
```

```
## [1] 32
```

```r
second(today) -> today.s
today.s # what second is it?
```

```
## [1] 41
```

```r
tz(today) -> today.tz
today.tz # what time zone is it?
```

```
## [1] "UTC"
```
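
Two related helpers are worth knowing here: `with_tz()` shows the same instant in another time zone, while `force_tz()` keeps the clock time but reinterprets it as belonging to another zone. A quick sketch, reusing `today` from above (`US/Eastern` is just an example zone):

```r
with_tz(today, tzone = "US/Eastern")  # same instant, shown as 2017-12-17 09:32:41 EST
force_tz(today, tzone = "US/Eastern") # same clock time, now read as 2017-12-17 14:32:41 EST
```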
---

class: inverse, middle, center

# .heat[.fancy[ Measuring intervals, durations, and periods of time ]]

---

Calculating the time elapsed between dates is tricky because you have to take into account daylight saving time<sup>1</sup>, leap years<sup>2</sup>, and leap seconds<sup>3</sup>. These make months, weeks, days, hours, and minutes `relative units of time`, whereas seconds are an `exact unit of time`. As a result, and by design, `lubridate` differentiates between `intervals`, `durations`, and `periods`, each measuring time spans in different ways.

.footnote[
[1] `Daylight Saving Time (DST)` is the practice of setting the clocks forward 1 hour from standard time during the summer months, and back again in the fall, in order to make better use of natural daylight. [Source](https://www.timeanddate.com/time/dst/)

[2] `Leap years` are needed to keep our modern day Gregorian calendar in alignment with the Earth's revolutions around the sun. It takes the Earth approximately 365.242189 days – or 365 days, 5 hours, 48 minutes, and 45 seconds – to circle once around the Sun. This is called a tropical year, and is measured from the March equinox. However, the Gregorian calendar has only 365 days in a year, so if we didn't add a leap day on February 29 nearly every four years, we would lose almost six hours off our calendar every year. After only 100 years, our calendar would be off by around 24 days! [Source](https://www.timeanddate.com/date/leapyear.html)

[3] Two components are used to determine `UTC (Coordinated Universal Time)`: (a) International Atomic Time (TAI), a time scale that combines the output of some 200 highly precise atomic clocks worldwide and provides the exact speed for our clocks to tick, and (b) Universal Time (UT1), also known as Astronomical Time, which refers to the Earth's rotation around its own axis and determines the length of a day. Before the difference between UTC and UT1 reaches 0.9 seconds, a leap second is added to UTC and to clocks worldwide. By adding an additional second to the time count, our clocks are effectively stopped for that second to give Earth the opportunity to catch up. [Source](https://www.timeanddate.com/time/leapseconds.html)
]

---

Let us start with a `duration`, the simplest measure of elapsed time since it measures the passing of time in seconds. Say I pick two dates and times ... `05:00am on March 9, 2019` and `05:00am on March 10, 2019`.

```r
ymd_hms("2019-03-09 05:00:00", tz = "US/Eastern") -> date1
ymd_hms("2019-03-10 05:00:00", tz = "US/Eastern") -> date2
interval(date1, date2) -> timint
timint
```

```
## [1] 2019-03-09 05:00:00 EST--2019-03-10 05:00:00 EDT
```

How much time has elapsed between date1 and date2? This we calculate with `as.duration()`.

```r
as.duration(timint) -> timelapsed.d
timelapsed.d
```

```
## [1] "82800s (~23 hours)"
```

Why is it 23 hours and not 24 hours? Because daylight saving time kicked in at 2:00 AM on March 10, and since the clocks moved forward by one hour, only 23 hours have passed and not 24 as we might naively expect. Go back and look at the output from `timint`; see the EST versus EDT?

In contrast to the duration, if I ask for the `period of time`, what do I get?

```r
as.period(timint) -> timelapsed.p
timelapsed.p
```

```
## [1] "1d 0H 0M 0S"
```

Aha! `as.period()` is imprecise because it tells me 1 day has passed.

---

Now, we can report the elapsed time in whatever units we want, but if we want accuracy, we had better work with durations, as shown below.

.pull-left[

```r
time_length(timelapsed.d, unit = "second")
```

```
## [1] 82800
```

```r
time_length(timelapsed.d, unit = "minute")
```

```
## [1] 1380
```

```r
time_length(timelapsed.d, unit = "hour")
```

```
## [1] 23
```

]

.pull-right[

```r
time_length(timelapsed.d, unit = "day")
```

```
## [1] 0.9583333
```

```r
time_length(timelapsed.d, unit = "week")
```

```
## [1] 0.1369048
```

```r
time_length(timelapsed.d, unit = "month")
```

```
## [1] 0.03150685
```

```r
time_length(timelapsed.d, unit = "year")
```

```
## [1] 0.002625571
```

]

A `duration` is always stored in seconds, but you can change the reporting unit.
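
The distinction also matters for arithmetic. Adding a *duration* of one day moves you forward exactly 86,400 seconds, while adding a *period* of one day keeps the same wall-clock time. A sketch, reusing `date1` from the previous slide (the commented results are what you should see):

```r
date1 + ddays(1) # "2019-03-10 06:00:00 EDT" -- exactly 86400 seconds later
date1 + days(1)  # "2019-03-10 05:00:00 EDT" -- same clock time, one calendar day later
```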
---

If, unfortunately, we rely on periods, well then we have inherited a problem!

.pull-left[

```r
time_length(timelapsed.p, unit = "second")
```

```
## [1] 86400
```

```r
time_length(timelapsed.p, unit = "minute")
```

```
## [1] 1440
```

```r
time_length(timelapsed.p, unit = "hour")
```

```
## [1] 24
```

]

.pull-right[

```r
time_length(timelapsed.p, unit = "day")
```

```
## [1] 1
```

```r
time_length(timelapsed.p, unit = "week")
```

```
## [1] 0.1428571
```

```r
time_length(timelapsed.p, unit = "month")
```

```
## [1] 0.03287671
```

```r
time_length(timelapsed.p, unit = "year")
```

```
## [1] 0.002739726
```

]

None of these match the actual elapsed time: the period records one calendar day, so every conversion here assumes a full 86,400-second day rather than the 82,800 seconds that actually passed.

---

class: inverse, middle, center

# <br> .fat[.fancy[Calculating & working with time in the flights data]]

---

Let us load the CMH flights data we used earlier to learn dplyr.

```r
load("data/cmhflights2017.RData")

library(janitor)
cmhflights2017 %>%
  clean_names() -> cmh.df
```

Say we had to construct a date field from `year`, `month`, and `day_of_week`. This could be done with `unite()`, as shown below.

```r
library(tidyverse)
cmh.df %>%
  unite("fltdate", c("year", "month", "day_of_week"), sep = "-", remove = FALSE) -> cmh.df

cmh.df %>%
  select(fltdate) %>%
  glimpse()
```

```
## Observations: 47,787
## Variables: 1
## $ fltdate <chr> "2017-1-1", "2017-1-1", "2017-1-2", "2017-1-2", "2017-1-2", "2017-1-2",…
```

---

What if I wanted to see which month, and then which day of the week, has the most flights?

```r
cmh.df %>%
  mutate(
    monthnamed = month(fltdate, label = TRUE, abbr = TRUE),
    weekday = wday(fltdate, label = TRUE, abbr = TRUE)
    ) -> cmh.df

table(cmh.df$monthnamed)
```

```
## 
##  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec 
## 3757 3413 4101 4123 4098 4138 4295 4279 3789 4027 3951 3816
```

```r
table(cmh.df$weekday)
```

```
## 
##  Sun  Mon  Tue  Wed  Thu  Fri  Sat 
## 6920 6747 6805 6795 6756 6808 6956
```

---

### Visualizing the same thing ...

.pull-left[

```r
library(hrbrthemes)
library(viridis)

cmh.df %>%
  group_by(monthnamed) %>%
  summarise(nflights = n()) %>%
  ggplot(aes(x = monthnamed, y = nflights, fill = nflights)) +
  geom_bar(stat = "identity") +
  labs(x = "Flight Month", y = "Number of flights", fill = "") +
  theme_ipsum_rc() +
  scale_fill_viridis(option = "viridis", direction = -1) +
  theme(legend.position = "bottom")
```

]

.pull-right[

<img src="Module05_files/figure-html/coder1-1.png" width="100%" style="display: block; margin: auto;" />

]

---

### What about days of the week instead of months?

.pull-left[

```r
cmh.df %>%
  group_by(weekday) %>%
  summarise(nflights = n()) %>%
  ggplot(aes(x = weekday, y = nflights, fill = nflights)) +
  geom_bar(stat = "identity") +
  labs(x = "Flight Day of the Week", y = "Number of flights", fill = "") +
  theme_ipsum_rc() +
  scale_fill_viridis(option = "viridis", direction = -1) +
  theme(legend.position = "bottom")
```

]

.pull-right[

<img src="Module05_files/figure-html/coder2-1.png" width="100%" style="display: block; margin: auto;" />

]

---

Hmm, I'd like the week to start on Monday so I can better show the obvious -- most flights occur over the weekend.
.pull-left[

```r
cmh.df %>%
  mutate(weekday.f = ordered(weekday,
                             levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
         ) %>%
  group_by(weekday.f) %>%
  summarise(nflights = n()) %>%
  ggplot(aes(x = weekday.f, y = nflights, fill = nflights)) +
  geom_bar(stat = "identity") +
  labs(x = "Flight Day of the Week", y = "Number of flights", fill = "") +
  theme_ipsum_rc() +
  scale_fill_viridis(option = "viridis", direction = -1) +
  theme(legend.position = "bottom")
```

]

.pull-right[

<img src="Module05_files/figure-html/Module05_forClass-11r-1.png" width="100%" style="display: block; margin: auto;" />

]

---

What time of the day do most/least flights depart? Let us start by creating a date-time field. The code below first splits `crs_dep_time` (the scheduled departure time) into hours and minutes, then combines them with a `:` separating the two elements, then unites `fltdate` with `hourmin` before flipping the result into a proper date-time, `deptime`, via `ymd_hm()`.

```r
cmh.df %>%
  separate(crs_dep_time, c("hour", "minute"), sep = 2, remove = FALSE) %>%
  unite(hourmin, c("hour", "minute"), sep = ":", remove = FALSE) %>%
  unite(depdatetime, c("fltdate", "hourmin"), sep = " ", remove = FALSE) %>%
  mutate(deptime = ymd_hm(depdatetime)) -> cmh.df

cmh.df %>%
  select(deptime) %>%
  glimpse()
```

```
## Observations: 47,787
## Variables: 1
## $ deptime <dttm> 2017-01-01 08:00:00, 2017-01-01 07:00:00, 2017-01-02 17:50:00, 2017-01…
```

---

Now I want the number of flights by the minute of the hour when the flight departed.

.pull-left[

```r
cmh.df %>%
  mutate(fltminute = minute(deptime)) %>%
  group_by(fltminute) %>%
  summarise(nflights = n()) %>%
  ggplot(aes(x = fltminute, y = nflights, fill = nflights)) +
  geom_bar(stat = "identity") +
  labs(x = "Flight Minute", y = "Number of flights", fill = "") +
  theme_ipsum_rc() +
  scale_fill_viridis(option = "viridis", direction = -1) +
  theme(legend.position = "bottom")
```

]

.pull-right[

<img src="Module05_files/figure-html/Module05_forClass-13r-1.png" width="100%" style="display: block; margin: auto;" />

]

---

Quite clearly, `most flights depart 25 minutes after the hour`, followed by `55 minutes after the hour`, then `at 30 minutes after the hour`, and then `at the hour`. Fair enough, but does this differ by the day of the week? Not all days are the same!

.pull-left[

```r
cmh.df %>%
  mutate(weekday.f = ordered(weekday,
                             levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
         ) %>%
  mutate(fltminute = minute(deptime)) %>%
  group_by(weekday.f, fltminute) %>%
  summarise(nflights = n()) %>%
  ggplot(aes(x = fltminute, y = nflights, fill = nflights)) +
  geom_bar(stat = "identity") +
  labs(x = "Flight Minute", y = "Number of flights", fill = "") +
  theme_ipsum_rc() +
  scale_fill_gradient(low = "white", high = "red") +
  theme(legend.position = "bottom") +
  facet_wrap(~ weekday.f, ncol = 2)
```

]

.pull-right[

<img src="Module05_files/figure-html/Module05_forClass-14r-1.png" width="100%" style="display: block; margin: auto;" />

]
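
If you wanted to bin departures by the hour of the day rather than by the minute, the same pattern works with `hour()`; `floor_date(deptime, unit = "hour")` would instead keep the full calendar hour. A sketch along the lines of the chunks above (output not shown):

```r
cmh.df %>%
  mutate(flthour = hour(deptime)) %>% # hour of the day, 0-23
  count(flthour) %>%
  arrange(desc(n))
```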
---

But maybe what we really want to know is which departure minute is associated with higher/lower average arrival delays (in minutes). Let us explore that.

.pull-left[

```r
cmh.df %>%
  mutate(fltminute = minute(deptime)) %>%
  group_by(fltminute) %>%
  summarise(delay = mean(arr_delay, na.rm = TRUE),
            nflights = n()) %>%
  ggplot(aes(x = fltminute, y = delay, color = nflights)) +
  geom_line() +
  labs(x = "Flight Minute", y = "Mean Arrival Delay",
       color = "Number of Flights in the Mean") +
  theme_ft_rc() +
  scale_color_gradient(low = "white", high = "red") +
  theme(legend.position = "bottom")
```

]

.pull-right[

<img src="Module05_files/figure-html/Module05_forClass-15r-1.png" width="100%" style="display: block; margin: auto;" />

]

---

Does this vary by airline?

.pull-left[

```r
cmh.df %>%
  separate(crs_dep_time, c("hour", "minute"), sep = 2, remove = FALSE) %>%
  unite(hourmin, c("hour", "minute"), sep = ":", remove = FALSE) %>%
  unite(depdatetime, c("fltdate", "hourmin"), sep = " ", remove = FALSE) %>%
  mutate(deptime = ymd_hm(depdatetime)) %>%
  mutate(fltminute = minute(deptime)) %>%
  filter(dest == "CMH") %>%
  group_by(fltminute, reporting_airline) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE),
            nflights = n()) %>%
  filter(nflights >= 30) %>%
  ggplot(aes(x = fltminute, y = delay, color = nflights)) +
  geom_line() +
  labs(x = "Scheduled Flight Minute", y = "Mean Departure Delay",
       color = "Number of Flights in the Mean") +
  theme_ft_rc() +
  scale_color_gradient(low = "white", high = "red") +
  theme(legend.position = "bottom") +
  facet_wrap(~ reporting_airline)
```

]

.pull-right[

<img src="Module05_files/figure-html/Module05_forClass-17r-1.png" width="100%" style="display: block; margin: auto;" />

]

---

And then here are arrival delays by airlines flying to Columbus.

.pull-left[

```r
cmh.df %>%
  separate(crs_dep_time, c("hour", "minute"), sep = 2, remove = FALSE) %>%
  unite(hourmin, c("hour", "minute"), sep = ":", remove = FALSE) %>%
  unite(depdatetime, c("fltdate", "hourmin"), sep = " ", remove = FALSE) %>%
  mutate(deptime = ymd_hm(depdatetime)) %>%
  mutate(fltminute = minute(deptime)) %>%
  filter(dest == "CMH") %>%
  group_by(fltminute, reporting_airline) %>%
  summarise(delay = mean(arr_delay, na.rm = TRUE),
            nflights = n()) %>%
  filter(nflights >= 30) %>%
  ggplot(aes(x = fltminute, y = delay, color = nflights)) +
  geom_line() +
  labs(x = "Scheduled Flight Minute", y = "Mean Arrival Delay",
       color = "Number of Flights in the Mean") +
  theme_ft_rc() +
  scale_color_gradient(low = "white", high = "red") +
  theme(legend.position = "bottom") +
  facet_wrap(~ reporting_airline)
```

]

.pull-right[

<img src="Module05_files/figure-html/Module05_forClass-18r-1.png" width="100%" style="display: block; margin: auto;" />

]

---

.pull-left[

One final query. Say we are curious about `how often a particular aircraft flies`. How might we calculate this time span?

```r
cmh.df %>%
  filter(!is.na(tail_number)) %>%
  group_by(tail_number) %>%
  arrange(deptime) %>%
  mutate(nflew = row_number()) %>%
  select(deptime, tail_number, nflew) %>%
  arrange(tail_number, nflew) -> cmh.df2
```

]

.pull-right[

The preceding code flags each aircraft's flight sequence (when it was first seen, when it was seen for the second time, and so on). Let us now calculate the time elapsed between each aircraft's successive flights to/from CMH.
```r
cmh.df2 %>%
  ungroup() %>%
  arrange(tail_number) %>%
  group_by(tail_number) %>%
  mutate(tspan = interval(lag(deptime), deptime),
         tspan.minutes = as.duration(tspan)/dminutes(1),
         tspan.hours = as.duration(tspan)/dhours(1),
         tspan.days = as.duration(tspan)/ddays(1),
         tspan.weeks = as.duration(tspan)/dweeks(1)
         ) -> cmh.df2
```

]

---

```r
mean(cmh.df2$tspan.minutes, na.rm = TRUE)
```

```
## [1] 15834.02
```

```r
median(cmh.df2$tspan.minutes, na.rm = TRUE)
```

```
## [1] 612
```

```r
mean(cmh.df2$tspan.days, na.rm = TRUE)
```

```
## [1] 10.99585
```

```r
median(cmh.df2$tspan.days, na.rm = TRUE)
```

```
## [1] 0.425
```

---

If you look at the data, you realize that for the typical aircraft the gap between successive flights to/from CMH is a median of 612 minutes (a shade over 10 hours). This might differ by airline, or does it? Sure it does!

```r
cmh.df %>%
  arrange(tail_number, deptime) %>%
  group_by(tail_number, reporting_airline) %>%
  select(deptime, tail_number, reporting_airline) %>%
  mutate(tspan = interval(lag(deptime), deptime),
         tspan.minutes = as.duration(tspan)/dminutes(1),
         tspan.days = as.duration(tspan)/ddays(1)
         ) %>%
  group_by(reporting_airline) %>%
  summarise(md.tspan = median(tspan.minutes, na.rm = TRUE)) %>%
  arrange(md.tspan) -> cmh.ts
```

---

.pull-left[

```r
knitr::kable(cmh.ts, booktabs = TRUE, "html",
             caption = "Median number of minutes between an aircraft's flights in the database",
             col.names = c("Airline", "Median Minutes"),
             digits = c(0, 0)) %>%
  kableExtra::kable_styling("striped", full_width = FALSE)
```

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
<caption>Median number of minutes between an aircraft's flights in the database</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Airline </th>
   <th style="text-align:right;"> Median Minutes </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> OO </td>
   <td style="text-align:right;"> 243 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> EV </td>
   <td style="text-align:right;"> 263 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> WN </td>
   <td style="text-align:right;"> 585 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> DL </td>
   <td style="text-align:right;"> 686 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> F9 </td>
   <td style="text-align:right;"> 735 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> UA </td>
   <td style="text-align:right;"> 738 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> AA </td>
   <td style="text-align:right;"> 765 </td>
  </tr>
</tbody>
</table>

]

.pull-right[

```r
ggplot(cmh.ts, aes(y = md.tspan, x = fct_inorder(reporting_airline), fill = md.tspan)) +
  geom_col() +
  coord_flip() +
  scale_fill_viridis(option = "inferno", direction = 1) +
  theme_ft_rc() +
  theme(legend.position = "bottom") +
  labs(x = "Airline", y = "Median Gap between Aircraft's Flight (in minutes)",
       fill = "")
```

<img src="Module05_files/figure-html/right-1.png" width="100%" style="display: block; margin: auto;" />

]

---

## <br> .salt[.fancy[Some other learning resources]]

See the book chapter for more advanced work with dates and times and some practice exercises.
See also the [rds chapter for more nuanced operations](https://r4ds.had.co.nz/dates-and-times.html) * [Bonnie Dixon's lecture piped by Noam Ross](http://www.noamross.net/blog/2014/2/10/using-times-and-dates-in-r---presentation-code.html) * [Paul Hiemstra's tutorial on dealing with time in R](http://stcorp.nl/R_course/tutorial_time_in_r.html) * [Earth Data Lab's tutorial on visualizing traffic crime in Denver, CO](https://www.earthdatascience.org/tutorials/visualize-denver-colorado-traffic-crime/) * [Emily Zabor's tutorial on flexing time for survival analysis](https://www.emilyzabor.com/tutorials/survival_analysis_in_r_tutorial.html) * [flutterby's tutorial](http://www.flutterbys.com.au/stats/tut/tut3.4.html) * [A free Coursera course on dates and times](https://www.coursera.org/lecture/data-cleaning/working-with-dates-0rohY) --- class: right, middle <img class="circle" src="https://github.com/aniruhil.png" width="175px"/> # Find me at... [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> @aruhil](http://twitter.com/aruhil) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"/></svg> aniruhil.org](https://aniruhil.org) [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M476 3.2L12.5 270.6c-18.1 10.4-15.8 35.6 2.2 43.2L121 358.4l287.3-253.2c5.5-4.9 13.3 2.6 8.6 8.3L176 407v80.5c0 23.6 28.5 32.9 42.5 15.8L282 426l124.6 52.2c14.2 6 30.4-2.9 33-18.2l72-432C515 7.8 493.3-6.8 476 
3.2z"/></svg> ruhil@ohio.edu](mailto:ruhil@ohio.edu)