Background

Data are often associated with a point in time, such as a particular day or a particular instant within a day.

Some issues with points in time:

R has data types to represent dates and date-times.

R stores dates as the number of days since January 1, 1970, and date-times as the number of seconds since midnight on that day in Coordinated Universal Time (UTC).
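The storage convention can be checked directly by stripping the class from a value close to the origin. This is a minimal sketch; the variable names d and dt are only for illustration.

```r
library(lubridate)

# A Date is stored as the number of days since 1970-01-01.
d <- ymd("1970-01-10")
unclass(d)       # 9: nine days after the origin

# A POSIXct date-time is stored as the number of seconds
# since 1970-01-01 00:00:00 UTC.
dt <- ymd_hms("1970-01-01 00:01:00", tz = "UTC")
as.numeric(dt)   # 60: one minute after the origin
```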

Date objects are less complicated than date-time objects, so if you only need dates you should stick with date objects.

Base R provides many facilities for dealing with dates and date-times.

The lubridate package provides a useful interface.

library(lubridate)

The Dates and Times chapter of R for Data Science provides more details.

Creating Dates and Times

Today and Now

The lubridate function today() returns today’s date as a Date object:

today()
## [1] "2024-04-01"
class(today())
## [1] "Date"

The lubridate function now() returns the current date-time as a POSIXct object:

now()
## [1] "2024-04-01 07:52:38 CDT"
class(now())
## [1] "POSIXct" "POSIXt"

The printed representation follows the international standard for the representation of dates and times (ISO 8601), with components ordered from largest to smallest.

Date and date-time objects can be used with addition and subtraction:

now() + 3600  ## one hour from now
## [1] "2024-04-01 08:52:38 CDT"
today() - 7   ## one week ago
## [1] "2024-03-25"
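The unit of the number being added depends on the class: days for Date objects and seconds for POSIXct objects. A small sketch with fixed values, so the results do not depend on when the code is run:

```r
library(lubridate)

d  <- ymd("2024-04-01")
dt <- ymd_hms("2024-04-01 07:52:38", tz = "America/Chicago")

# Plain numbers added to a Date count days.
d - 7        # "2024-03-25"

# Plain numbers added to a POSIXct count seconds.
dt + 3600    # "2024-04-01 08:52:38 CDT"
dt + 7       # seven seconds later, not seven days
```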

Parsing Dates and Times From Strings

Some common date formats:

d1 <- "2024-04-15"
d2 <- "April 15, 2024"
d3 <- "15 April 2024"
d4 <- "15 April 24"

These can be decoded by the functions ymd(), mdy(), and dmy():

ymd(d1)
## [1] "2024-04-15"
mdy(d2)
## [1] "2024-04-15"
dmy(d3)
## [1] "2024-04-15"
dmy(d4)
## [1] "2024-04-15"
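The function name states the order of the components, which matters for ambiguous strings. A short sketch; the string s and the invalid date are made up for illustration:

```r
library(lubridate)

s <- "04/05/2024"
mdy(s)   # "2024-04-05": month first
dmy(s)   # "2024-05-04": day first

# A string that is not a valid date parses to NA with a warning;
# the warning is suppressed here to keep the output short.
bad <- suppressWarnings(ymd("2024-02-30"))
is.na(bad)   # TRUE
```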

By default, these functions use the current locale settings for interpreting month names or abbreviations.

Sys.getlocale("LC_TIME")
## [1] "en_US.UTF-8"

If you need to parse a French date you might use

dmy("15 Avril, 2024", locale = "fr_FR.UTF-8")
## [1] "2024-04-15"

Date-times can be decoded with functions like mdy_hm():

mdy_hm("April 15, 2024, 6:15 PM")
## [1] "2024-04-15 18:15:00 UTC"

or

mdy_hms("April 15, 2024, 6:15:08 PM")
## [1] "2024-04-15 18:15:08 UTC"

By default these assume the time is specified in the UTC time zone.
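If the string records a local clock time, the tz argument of these functions can be used to say so. A sketch assuming the time was recorded in Iowa City and that month names are parsed in an English locale:

```r
library(lubridate)

# Same string, interpreted in two different time zones.
utc <- mdy_hm("April 15, 2024, 6:15 PM")
chi <- mdy_hm("April 15, 2024, 6:15 PM", tz = "America/Chicago")

utc   # "2024-04-15 18:15:00 UTC"
chi   # "2024-04-15 18:15:00 CDT"

# The two results are different instants, five hours apart.
chi - utc
```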

Creating Dates and Times from Components

Dates can be created from year, month, and day by make_date():

make_date(2024, 4, 15)
## [1] "2024-04-15"

Creating a date variable from the year, month, and day variables in the New York City flights table:

library(dplyr)
library(ggplot2)
library(nycflights13)
fl <- mutate(flights,
             date = make_date(year, month, day))

ggplot and other graphics systems know how to make useful axis labels for dates:

ggplot(count(fl, date)) +
    geom_line(aes(x = date, y = n))

Weekday/weekend differences are clearly visible.

Date-times can be created from year, month, day, hour, minute, and second using make_datetime():

make_datetime(2024, 4, 15, 18, 15)
## [1] "2024-04-15 18:15:00 UTC"

An attempt to recreate the time_hour variable in the flights table:

fl <- mutate(fl,
             th = make_datetime(year, month, day,
                                hour))

This does not quite re-create the time_hour variable:

identical(fl$th, fl$time_hour)
## [1] FALSE
fl$th[1]
## [1] "2013-01-01 05:00:00 UTC"
fl$time_hour[1]
## [1] "2013-01-01 05:00:00 EST"

By default, make_datetime() assumes the time points it is given are in UTC.

The time_hour variable is using local (eastern US) time.

We will look at time zones in more detail later.

Date and Time Components

Components of dates and date-times can be extracted with accessor functions such as year(), month(), mday(), yday(), wday(), hour(), minute(), and second().
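A small sketch of the lubridate accessor functions applied to a fixed date-time; the variable x is only for illustration:

```r
library(lubridate)

x <- ymd_hms("2024-04-15 18:15:08", tz = "UTC")

year(x)     # 2024
month(x)    # 4
mday(x)     # 15, day of the month
yday(x)     # 106, day of the year
hour(x)     # 18
minute(x)   # 15
second(x)   # 8
```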

By default, wday() returns an integer:

wday(today())
## [1] 2

But it can also return a label:

wday(today(), label = TRUE)
## [1] Mon
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
wday(today(), label = TRUE, abbr = FALSE)
## [1] Monday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

Weekday names and abbreviations are obviously locale-specific, and you can specify an alternative to the default current locale:

wday(today(), label = TRUE, abbr = FALSE, locale = "de_DE.UTF-8")
## [1] Montag
## 7 Levels: Sonntag < Montag < Dienstag < Mittwoch < Donnerstag < ... < Samstag
wday(today(), label = TRUE, locale = "de_DE.UTF-8")
## [1] Mo
## Levels: So < Mo < Di < Mi < Do < Fr < Sa

Even the integer value can be tricky, because conventions differ on which day starts the week.

wday() can be asked to use a different first day, and a global default can be set.
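As a sketch, the first day can be chosen per call with the week_start argument or globally with the lubridate.week.start option. The comments assume the option has not already been set in the session.

```r
library(lubridate)

x <- ymd("2024-04-01")    # a Monday

wday(x)                   # 2: weeks start on Sunday by default
wday(x, week_start = 1)   # 1: weeks start on Monday

# Set a global default, then restore the previous setting.
old <- options(lubridate.week.start = 1)
wday(x)                   # 1
options(old)
```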

Using wday() and the date variable we can look at the distribution of the number of flights by day of the week:

ggplot(fl, aes(x = wday(date, label = TRUE))) +
    geom_bar(fill = "deepskyblue3")

There were substantially fewer flights on Saturdays but only slightly fewer flights on Sundays.

Rounding

floor_date(), round_date(), and ceiling_date() can be used to round to a particular unit; the most useful are week and quarter.
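A minimal sketch of the three rounding functions on a fixed date, assuming the default Sunday week start:

```r
library(lubridate)

d <- ymd("2024-04-17")       # a Wednesday

floor_date(d, "week")        # "2024-04-14", the preceding Sunday
ceiling_date(d, "week")      # "2024-04-21", the following Sunday
round_date(d, "week")        # whichever week boundary is nearer

floor_date(d, "quarter")     # "2024-04-01"
ceiling_date(d, "quarter")   # "2024-07-01"
```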

Flights by week:
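One way such a weekly summary might be computed is sketched here. It assumes dplyr, ggplot2, and nycflights13 are installed; the names weekly and week are hypothetical, and this is not necessarily the code behind the original plot.

```r
library(dplyr)
library(ggplot2)
library(lubridate)
library(nycflights13)

# Assign each flight to the Sunday that starts its week, then count.
weekly <- flights |>
    mutate(date = make_date(year, month, day),
           week = floor_date(date, "week")) |>
    count(week)

ggplot(weekly, aes(x = week, y = n)) +
    geom_line()
```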

The first and last weeks were incomplete:

as.character(wday(ymd("2013-01-01"),
                  label = TRUE, abbr = FALSE))
## [1] "Tuesday"
as.character(wday(ymd("2013-12-31"),
                  label = TRUE, abbr = FALSE))
## [1] "Tuesday"

Time Spans

Subtracting dates or date-times produces difftime objects:

now() - as_datetime(today())
## Time difference of 12.87806 hours
today() - ymd("2024-01-01")
## Time difference of 91 days

Working with different units can be awkward; lubridate provides durations, which always work in seconds:

as.duration(now() - as_datetime(today()))
## [1] "46361.0141499043s (~12.88 hours)"
as.duration(today() - ymd("2024-01-01"))
## [1] "7862400s (~13 weeks)"
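The unit problem can be seen with fixed values: the same subtraction yields days in one case and hours in another, while the duration is always in seconds. A small sketch; d and h are illustrative names.

```r
library(lubridate)

d <- ymd("2024-04-01") - ymd("2024-01-01")
h <- ymd_hms("2024-04-01 12:00:00") - ymd_hms("2024-04-01 00:00:00")

units(d)   # "days"
units(h)   # "hours"

# A difftime can be converted to a chosen unit explicitly ...
as.numeric(d, units = "weeks")   # 13

# ... while durations always measure seconds.
as.numeric(as.duration(d))       # 7862400
as.numeric(as.duration(h))       # 43200
```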

Durations can be created with dyears(), ddays(), dweeks(), etc.:

dyears(1)
## [1] "31557600s (~1 years)"
ddays(1)
## [1] "86400s (~1 days)"

Durations can be added to a date or date-time object and can be multiplied by a number:

today()
## [1] "2024-04-01"
today() + ddays(2)
## [1] "2024-04-03"
today() + 2 * ddays(1)
## [1] "2024-04-03"
(n1 <- now())
## [1] "2024-04-01 07:52:41 CDT"
n1 + dminutes(3)
## [1] "2024-04-01 07:55:41 CDT"

Durations represent an exact number of seconds, which can lead to surprises when daylight saving time (DST) is involved.

In 2024 the switch to DST happened in the US on March 10:

ymd_hm("2024-03-09 23:02", tz = "America/Chicago") + ddays(1)
## [1] "2024-03-11 00:02:00 CDT"

Periods are an alternative that may work more intuitively.

Periods are constructed with years(), months(), days(), etc.:

ymd_hm("2024-03-09 23:02", tz = "America/Chicago") + days(1)
## [1] "2024-03-10 23:02:00 CDT"
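The difference between the two results can be made explicit by measuring the elapsed time. In this sketch the duration advances exactly 24 hours, while the period advances the clock by one calendar day, which is only 23 hours across the spring DST change.

```r
library(lubridate)

start <- ymd_hm("2024-03-09 23:02", tz = "America/Chicago")

# Duration: a fixed number of seconds.
difftime(start + ddays(1), start, units = "hours")   # 24 hours

# Period: same clock time on the next calendar day.
difftime(start + days(1), start, units = "hours")    # 23 hours
```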

Time Zones

Date-time objects specify a point in time relative to second zero, minute zero, hour zero, on January 1, 1970 in Coordinated Universal Time (UTC).

Date-time objects can have a time zone associated with them that affects how they are printed.

now() returns a date-time object with the time zone set as the local time zone of the computer.

now()
## [1] "2024-04-01 07:52:41 CDT"

Time zones are complex: they can change on a regular basis (DST) or as a result of political decisions.

When a date-time object is created from components, by default it is given the UTC time zone.

To create a point in time based on local time information, such as 10 AM on April 15, 2024, in Iowa City, a time zone for interpreting the local time needs to be specified.

Short abbreviations like CDT are not adequate for this: both the US and Australia use the abbreviation EST, for quite different time zones.

R uses the Internet Assigned Numbers Authority (IANA) time zone naming convention and database.

The local time zone is:

Sys.timezone()
## [1] "America/Chicago"

The time point 10:00:00 AM on April 15, 2024 in Iowa City can be specified as

(tm <- make_datetime(2024, 4, 15, 10, tz = "America/Chicago"))
## [1] "2024-04-15 10:00:00 CDT"

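Time zones of date-time objects can be changed in two ways: with_tz() keeps the same instant and changes how it is displayed, while force_tz() keeps the clock time and changes the instant.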
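A minimal sketch contrasting lubridate's with_tz() and force_tz() on the Iowa City time point; the names shown and forced are only for illustration.

```r
library(lubridate)

tm <- make_datetime(2024, 4, 15, 10, tz = "America/Chicago")

# with_tz() keeps the instant and changes only how it is displayed.
shown <- with_tz(tm, "America/New_York")
shown         # "2024-04-15 11:00:00 EDT"
shown == tm   # TRUE: still the same point in time

# force_tz() keeps the clock reading and changes the instant.
forced <- force_tz(tm, "America/New_York")
forced        # "2024-04-15 10:00:00 EDT"
tm - forced   # one hour: 10 AM Eastern comes before 10 AM Central
```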

The available time zone specifications are returned by OlsonNames():

head(OlsonNames())
## [1] "Africa/Abidjan"     "Africa/Accra"       "Africa/Addis_Ababa"
## [4] "Africa/Algiers"     "Africa/Asmara"      "Africa/Asmera"

The instant tm in some other time zones:

with_tz(tm, tz = "UTC")
## [1] "2024-04-15 15:00:00 UTC"
with_tz(tm, tz = "America/New_York")
## [1] "2024-04-15 11:00:00 EDT"
with_tz(tm, tz = "Asia/Shanghai")
## [1] "2024-04-15 23:00:00 CST"
with_tz(tm, tz = "Pacific/Auckland")
## [1] "2024-04-16 03:00:00 NZST"
with_tz(tm, tz = "Asia/Kolkata")
## [1] "2024-04-15 20:30:00 IST"
with_tz(tm, tz = "Canada/Newfoundland")
## [1] "2024-04-15 12:30:00 NDT"
with_tz(tm, tz = "Asia/Katmandu")
## [1] "2024-04-15 20:45:00 +0545"

Some more examples:

## All offsets that are not a full hour:
get_offset <- function(z)
    abs(minute(with_tz(tm, tz = z)) - minute(tm))
offsets <- data.frame(zone = OlsonNames()) |>
    mutate(offset = sapply(zone, get_offset)) |>
    arrange(offset)
filter(offsets, offset != 0)

## Offsets for Australia:
filter(offsets, grepl("Australia", zone))

If we create the th variable for the flights data as

fl <- mutate(flights, th = make_datetime(year, month, day, hour,
                                         tz = "America/New_York"))

then the result matches the time_hour variable:

identical(fl$th, fl$time_hour)
## [1] TRUE

The time_hour variable in the weather table reflects actual points in time and, together with origin, can serve as a primary key:

filter(count(weather, origin, time_hour), n > 1)
## # A tibble: 0 × 3
## # ℹ 3 variables: origin <chr>, time_hour <dttm>, n <int>

The month, day, and hour variables record local clock time and are confused by the time change.

In November there is a repeat:

count(weather, origin, month, day, hour) |>
    filter(n > 1)
## # A tibble: 3 × 5
##   origin month   day  hour     n
##   <chr>  <int> <int> <int> <int>
## 1 EWR       11     3     1     2
## 2 JFK       11     3     1     2
## 3 LGA       11     3     1     2

and there is a missing hour in March:

select(weather, origin, month, day, hour) |>
    filter(origin == "EWR", month == 3,
           day == 10, hour <= 3)
## # A tibble: 3 × 4
##   origin month   day  hour
##   <chr>  <int> <int> <int>
## 1 EWR        3    10     0
## 2 EWR        3    10     1
## 3 EWR        3    10     3
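Both effects can be reproduced from instants specified in UTC. In this sketch, two instants one hour apart share the same local hour in November, and two instants one hour apart skip a local hour in March; the names fall and spring are illustrative.

```r
library(lubridate)

# Fall back: DST ended in New York at 06:00 UTC on 2013-11-03.
fall <- ymd_hms(c("2013-11-03 05:30:00", "2013-11-03 06:30:00"),
                tz = "UTC")
hour(with_tz(fall, "America/New_York"))     # 1 1: the hour repeats

# Spring forward: DST began in New York at 07:00 UTC on 2013-03-10.
spring <- ymd_hms(c("2013-03-10 06:30:00", "2013-03-10 07:30:00"),
                  tz = "UTC")
hour(with_tz(spring, "America/New_York"))   # 1 3: hour 2 is skipped
```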

Things to Look Out For

For dates:

For date-times:

Reading

Chapter Dates and Times in R for Data Science.

Exercises

  1. Using the NYC flights data, how many flights were there on Saturdays from Newark (EWR) to Chicago O’Hare (ORD) in 2013?

    1. 413
    2. 522
    3. 601
    4. 733
  2. What day of the week will July 4, 2030, fall on?

    1. Monday
    2. Wednesday
    3. Thursday
    4. Saturday
