class: center, middle, title-slide .title[ # Time Series Plots ] .author[ ### Luke Tierney ] .institute[ ### University of Iowa ] .date[ ### 2025-04-09 ] --- layout: true <link rel="stylesheet" href="stat4580.css" type="text/css" /> <style type="text/css"> .remark-code { font-size: 85%; } </style> ## Time Series Data --- A lot of data is collected over time: -- - economic performace measures; -- - weather, climate data; -- - medical information. -- The objectives can be -- - understanding the past – how we got where we are; -- - understanding patterns in variation; -- - forecasting what will happen next. --- Even if time is not of primary interest, data are often collected with a time stamp. -- Examining the behavior over time can reveal interesting features that may hint at confounding variables: -- - temperature varying with time of day; -- - humidity varying over the year. --- Time series data can be collected at regularly spaced intervals: -- - yearly, e.g. the `gapminder` data -- - monthly, e.g. the `river` data -- - daily, e.g. high temperatures. -- Data might be missing -- - irregularly, e.g. faulty instrument, holidays; -- - regularly, e.g. weekends. -- Time series can also be recorded at irregular times: -- - blood pressure at doctor visits; -- Variables recorded over time can be numerical or categorical. --- layout: false ## Time Series Objects Time series data can be represented as regular data frames with a time variable. -- There are also a number of specialized object classes for dealing with time series. -- A number of packages provide `plot` or `autoplot` methods and other utilities for these objects. -- Some of the most useful: | Class | Package | Features | | ---------- | ---------- | -------------------------------------------------- | |`ts`, `mts`| `base` | regular series; methods for `plot` and `lines`. | | | `forecast` | methods for `autoplot` and `autolayer`. | |`zoo` | `zoo` | irregular and regular series; `autoplot` support. | | | `broom` |`tidy` for converting to tidy data frames. | --- ## `autoplot` and `autolayer` Many data structures provide methods for `autoplot` and `autolayer`. -- From the `autoplot` help page: > `autoplot` uses `ggplot2` to draw a particular plot for an object of > a particular class in a single command. This defines the S3 generic > that other classes and packages can extend. -- From the `autolayer` help page: > `autolayer` uses `ggplot2` to draw a particular layer for an object > of a particular class in a single command. This defines the S3 > generic that other classes and packages can extend. --- layout: true ## Example: River Data --- Creating a `ts` object for the `river` data: ``` r river <- scan("https://www.stat.uiowa.edu/~luke/data/river.dat") (riverTS <- ts(river)) ## Time Series: ## Start = 1 ## End = 128 ## Frequency = 1 ## [1] 8.95361 9.49409 10.19430 10.95660 11.07770 10.98170 10.48970 10.27540 ## [9] 10.16620 9.23484 8.56537 8.51064 ## [ reached getOption("max.print") -- omitted 116 entries ] ``` -- Or, since the data were collected monthly: ``` r (riverTS <- ts(river, frequency = 12)) ## Jan Feb Mar Apr May Jun Jul Aug ## 1 8.95361 9.49409 10.19430 10.95660 11.07770 10.98170 10.48970 10.27540 ## Sep Oct Nov Dec ## 1 10.16620 9.23484 8.56537 8.51064 ## [ reached getOption("max.print") -- omitted 10 rows ] ``` --- Using `tidy` from package `broom` to create a data frame from the `ts` object: ``` r riverDF <- broom::tidy(riverTS) head(riverDF, 3) ## # A tibble: 3 × 2 ## index value ## <dbl> <dbl> ## 1 1 8.95 ## 2 1.08 9.49 ## 3 1.17 10.2 ``` -- Change to more convenient names: ``` r riverDF <- rename(riverDF, time = index, flow = value) head(riverDF, 3) ## # A tibble: 3 × 2 ## time flow ## <dbl> <dbl> ## 1 1 8.95 ## 2 1.08 9.49 ## 3 1.17 10.2 ``` --- layout: true ## Basic Plots for a Single Numeric Time Series --- Numeric time series are usually plotted as a _line chart_. -- This is what the `autoplot` method for `ts` objects provided by package `forecast` does. -- For the river flow data: .pull-left[ <img src="timeseries_files/figure-html/river-ts-autoplot-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r library(forecast) p <- autoplot(riverTS) p ``` ] --- Using symbols in addition to lines is sometimes useful: -- .pull-left[ <img src="timeseries_files/figure-html/river-ts-autoplot-syms-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r p + geom_point() ``` ] --- layout: true ## Aspect Ratio --- Aspect ratio has a significant effect on how well we perceive slope differences. -- - Banking to 45 degrees: choose an aspect ratio so the magnitude of the important slopes is close to 45 degrees. -- - This usually requires the horizontal axis to be longer than the vertical axis. --- Controlling the aspect ratio: -- - For `base` graphics use the `asp` argument to `plot`. -- - `lattice` uses the `aspect` argument. -- - For `ggplot` use the `ratio` argument to `coord_fixed`. -- ``` r p + coord_fixed(ratio = 1 / 3) ``` <img src="timeseries_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> --- Lattice supports several aspect ratio strategies: -- - `"fill"` to avoid unused white space; -- - `"xy"` to use a 45 degree banking rule; -- - `"iso"` for isometric scales, e.g. distances. -- Some work may be needed to avoid excess white space around figures. -- Using multiple panels can help. -- In an interactive setting, zoom and pan support can help. --- Multiple panels: -- .pull-left[ <img src="timeseries_files/figure-html/river-ts-panels-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r mutate(riverDF, period = cut(time, breaks = 3)) |> ggplot(aes(time, flow)) + geom_line() + facet_wrap(~ period, scales = "free_x", ncol = 1) ``` ] --- Adding zoom and pan support using [`plotly`](https://plotly.com/r/): -- ``` r library(plotly) pp <- p + geom_line() + coord_fixed(ratio = 1 / 3) ggplotly(pp) ```
--- There is not always a single best aspect ratio. -- The `co2` data set in the `datasets` package contains monthly concentrations of CO2 for the years 1959 to 1997 recorded at the Mauna Loa observatory. -- The `co2` data is stored as an object of class `ts`: ``` r str(co2) ## Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ... ``` -- `plot` and `xyplot` have methods for `ts` objects that simplify time series plotting. --- Two aspect ratios: .pull-left[ ``` r xyplot(co2) ``` <img src="timeseries_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ``` r xyplot(co2, aspect = "xy") ``` <img src="timeseries_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> Is there a slowdown in the increase? A graph with more recent data is [available](https://gml.noaa.gov/ccgg/trends/). The full data is [available](https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_annmean_mlo.txt) as well. ] --- `autoplot` does not yet provide an easy way to bank to 45 degrees; you need to experiment or calculate an appropriate aspect ratio yourself. The default `autoplot`: .pull-left[ <img src="timeseries_files/figure-html/co2-autoplot-default-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ``` r library(forecast) autoplot(co2) ``` ] --- Aspect ratio adjusted with `coord_fixed`: ``` r r45 <- median(diff(time(co2))) / median(abs(diff(co2))) autoplot(co2) + coord_fixed(ratio = r45) ``` <img src="timeseries_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> --- layout: true ## Updated Mauna Loa CO2 Data --- Downloading and reading in the data: ``` r library(lubridate) url <- "https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt" if (! file.exists("co2new.dat") || now() > file.mtime("co2new.dat") + weeks(4)) download.file(url, "co2new.dat") co2new <- read.table("co2new.dat") ``` -- Clean up the variable names: ``` r head(co2new) ## V1 V2 V3 V4 V5 V6 V7 V8 ## 1 1958 3 1958.203 315.71 314.44 -1 -9.99 -0.99 ## 2 1958 4 1958.288 317.45 315.16 -1 -9.99 -0.99 ## 3 1958 5 1958.370 317.51 314.69 -1 -9.99 -0.99 ## 4 1958 6 1958.455 317.27 315.15 -1 -9.99 -0.99 ## 5 1958 7 1958.537 315.87 315.20 -1 -9.99 -0.99 ## 6 1958 8 1958.622 314.93 316.21 -1 -9.99 -0.99 names(co2new) <- c("year", "month", "decimal_date", "average", "deseason", "ndays", "stdd", "umm") ``` --- An initial look: ``` r ggplot(co2new, aes(decimal_date, average)) + geom_line() ``` <img src="timeseries_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" /> --- Missing values are encoded as `-9.99` or `-99.99`. If there are any such values they need to be replaced with R's missing value representation: ``` r filter(co2new, average < -9) ## [1] year month decimal_date average deseason ## [6] ndays stdd umm ## <0 rows> (or 0-length row.names) co2new <- mutate(co2new, average = ifelse(average < -9, NA, average)) ``` --- .pull-left[ This produces a reasonable plot: .hide-code[ ``` r ggplot(co2new, aes(decimal_date, average)) + geom_line() ``` <img src="timeseries_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ The plot would show gaps for the missing values. {{content}} ] -- Missing values could also be interpolated with `na.interp` from package `forecast`. --- Converting the new data to a `ts` object makes adding the original data a little easier: ``` r co2newTS <- ts(co2new$average, start = c(1958, 3), frequency = 12) autoplot(co2newTS) + autolayer(co2) ``` <img src="timeseries_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" /> --- layout: true ## Seasonal Decomposition of Time Series by Loess (STL) --- For time series with a strong seasonal component it can be useful to look at a _Seasonal Decomposition of Time Series by Loess_, or (_STL_). -- The methodology was suggested by Clevaland and coworkers. -- The `stl` function in the `base` package computes such a decomposition. -- It requires a series without missing values. -- ``` r co2newTS_no_NA <- ts(co2new$average, start = c(1958, 3), frequency = 12) co2stl <- stl(co2newTS_no_NA, s.window = 21) head(co2stl$time.series) ## seasonal trend remainder ## Mar 1958 1.1159035 314.9426 -0.3484682 ## Apr 1958 2.2797196 315.0143 0.1559367 ## May 1958 2.7672696 315.0861 -0.3433923 ## Jun 1958 2.2272621 315.1579 -0.1151637 ## Jul 1958 0.8611739 315.2241 -0.2153025 ## Aug 1958 -1.0848309 315.2904 0.7244755 ``` --- .pull-left[ Package `forecast` provides an `autoplot` method for `stl` objects that shows the original data and the three components in separate panels: ``` r autoplot(co2stl) ``` ] .pull-right[ <img src="timeseries_files/figure-html/co2-stl-plot-1.png" style="display: block; margin: auto;" /> ] --- layout: false ## Some Alternative Visualizations Bar charts are sometimes used when comparison to a base line is important; for example for [illustrating climate change](../img/CaG_GlobalTempAnom_1.jpg). <!-- original at https://www.climate.gov/sites/default/files/CaG_GlobalTempAnom_1.jpg Data is available from https://www.climate.gov/maps-data/dataset/global-temperature-anomalies-graphing-tool --> Calendar plots, or calendar heatmaps, are sometimes useful for displaying daily data where day of the week/weekend effects might be important. One example is available [here](../img/calendarHeatmap.png). A [blog post](https://dominikkoch.github.io/Calendar-Heatmap/) shows how to build a calendar plot with `ggplot`. --- layout: true ## GDP Growth Data --- Bar charts are sometimes used for economic time series, in particular growth rate data. -- Some GDP growth rate data is available on this [web page](https://www.multpl.com/us-real-gdp-growth-rate/table/by-quarter): -- Read the data from the web page: ``` r library(rvest) gdpurl <- "http://www.multpl.com/us-real-gdp-growth-rate/table/by-quarter" gdppage <- read_html(gdpurl) gdptbl <- html_table(gdppage)[[1]] head(gdptbl) ## # A tibble: 6 × 2 ## Date Value ## <chr> <chr> ## 1 Dec 31, 2024 2.53% ## 2 Sep 30, 2024 2.72% ## 3 Jun 30, 2024 3.04% ## 4 Mar 31, 2024 2.90% ## 5 Dec 31, 2023 3.20% ## 6 Sep 30, 2023 3.24% ``` --- Clean the data: ``` r library(dplyr) library(lubridate) names(gdptbl) <- c("Date", "Value") gdptbl <- mutate(gdptbl, Value = as.numeric(sub("%", "", Value)), Date = mdy(Date)) head(gdptbl) ## # A tibble: 6 × 2 ## Date Value ## <date> <dbl> ## 1 2024-12-31 2.53 ## 2 2024-09-30 2.72 ## 3 2024-06-30 3.04 ## 4 2024-03-31 2.9 ## 5 2023-12-31 3.2 ## 6 2023-09-30 3.24 ``` --- .pull-left[ Data prior to 1947 is annual, so drop earlier data: ``` r gdptbl <- filter(gdptbl, Date >= make_date(1948, 1, 1)) ``` {{content}} ] -- Rearrange to put `Date` in increasing order: ``` r gdptbl <- arrange(gdptbl, Date) ``` {{content}} -- Create and plot a time series: ``` r library(ggplot2) library(forecast) gdpts <- ts(gdptbl$Value, start = 1948, frequency = 4) autoplot(gdpts) + geom_hline(yintercept = 0, linetype = 2) ``` -- .pull-right[ <img src="timeseries_files/figure-html/gdp-plot-1.png" style="display: block; margin: auto;" /> ] --- .pull-left[ A bar chart with color used to distinguish positive and negative growth can be created from the original table `gdptbl`: ``` r library(ggplot2) mutate(gdptbl, growth = ifelse(Value >= 0, "pos", "neg")) |> ggplot(aes(x = Date, y = Value, fill = growth)) + geom_col() + theme(legend.position = "top") + scale_fill_manual( values = c(pos = "blue", neg = "red")) ``` ] .pull-right[ <img src="timeseries_files/figure-html/gdp-bar-1.png" style="display: block; margin: auto;" /> ] --- .pull-left[ The bar chart can also be created from the time series object: ``` r data.frame(rate = as.numeric(gdpts), time = as.numeric(time(gdpts))) |> mutate(growth = ifelse(rate >= 0, "pos", "neg")) |> ggplot(aes(time, rate, fill = growth)) + geom_col() + theme(legend.position = "top") + scale_fill_manual( values = c(pos = "blue", neg = "red")) ``` ] .pull-right[ <img src="timeseries_files/figure-html/gdp-bar-ts-1.png" style="display: block; margin: auto;" /> ] --- layout: true ## Airline Flights From NYC --- The `nycflights13` data set contains data on all flight leaving New York City in 2013. Similar data is available for other cities and years, e.g. from the [`anyflights` package](https://simonpcouch.github.io/anyflights/). -- We can look at the number of departures for each day: ``` r library(nycflights13) library(ggplot2) nf <- count(flights, year, month, day) ``` -- It will be useful to have a combined `date` field in our data frame. -- The `lubridate` function `make_date` can be used for this: ``` r library(lubridate) nf <- mutate(nf, date = make_date(year, month, day)) ``` --- A line plot of the number of flights: .pull-left[ <img src="timeseries_files/figure-html/flights-1-1.png" style="display: block; margin: auto;" /> ] -- .pull-right-width-40[ ``` r ggplot(nf, aes(date, n)) + geom_line() ``` ] -- .pull-right-width-40[ Weekday variation is clearly visible. ] --- Breaking out the days of the week into separate facets may help: .pull-left[ <img src="timeseries_files/figure-html/flights-2-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r nnf <- * mutate(nf, wd = wday(date, label = TRUE)) nnf_thm <- theme(axis.text.x = element_text(angle = -45, hjust = 0), plot.margin = margin(r = 30)) ggplot(nnf, aes(date, n)) + geom_line() + * facet_wrap(~ wd) + nnf_thm ``` {{content}} ] -- July 4 in 2013: ``` r as.character(wday(ymd("2013-07-04"), label = TRUE)) ## [1] "Thu" ``` --- Identifying when some major holidays occur may also be useful: .pull-left[ <img src="timeseries_files/figure-html/flights-3-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ``` r holidays <- data.frame(Holiday = c("4th of July", "Thanksgiving"), date = ymd(c("2013-07-04", "2013-11-28"))) hlayer <- geom_vline(aes(xintercept = date, color = Holiday), linetype = 2, linewidth = 1, data = holidays) ggplot(nf, aes(date, n)) + hlayer + geom_line() ``` ] --- .pull-left[ Holidays in the faceted plot: <img src="timeseries_files/figure-html/flights-holiday-facet-1-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ``` r ggplot(nnf, aes(date, n)) + hlayer + geom_line() + facet_wrap(~ wd) + nnf_thm ``` {{content}} ] -- This marks the holiday weeks. --- .pull-left[ Holidays in the faceted plot: <img src="timeseries_files/figure-html/flights-holiday-facet-2-1.png" style="display: block; margin: auto;" /> -- ] .pull-right[ This shows only the holiday days. ``` r hlayer2 <- geom_vline(aes(xintercept = date, color = Holiday), linewidth = 1, data = mutate(holidays, wd = wday(date, TRUE))) ggplot(nnf, aes(date, n)) + hlayer2 + geom_line() + facet_wrap(~ wd) + nnf_thm ``` ] --- .pull-left[ Holidays in the faceted plot: <img src="timeseries_files/figure-html/flights-holiday-facet-3-1.png" style="display: block; margin: auto;" /> ] .pull-right[ Show holidays as solid lines, days in holiday weeks as dashed lines. ``` r ggplot(nnf, aes(date, n)) + hlayer + hlayer2 + geom_line() + facet_wrap(~ wd) + nnf_thm ``` ] --- A simple calendar plot: .pull-left[ <img src="timeseries_files/figure-html/flights-cal-0-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ This uses `geom_tile` and faceting. ``` r monthweek <- function(d, w) ceiling((d - w) / 7) + 1 nf <- mutate(nf, wd = wday(date, label = TRUE), wd = fct_rev(wd), mw = monthweek(day, wday(date))) p <- ggplot(nf, aes(x = as.character(mw), y = wd, fill = n)) + * geom_tile(color = "white") + * facet_wrap(~ month(date, label = TRUE)) p ``` ] --- Some theme and color adjustments: .pull-left[ <img src="timeseries_files/figure-html/flights-cal-1-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ``` r p <- p + scale_fill_gradient(low = "white") + ylab("") + xlab("Week of Month") + theme(panel.grid.major = element_blank(), panel.border = element_blank()) p ``` ] --- This layout with days on the vertical axis works well with a single row, which could be extended to a multi-year plot: ``` r p + facet_wrap(~ month(date, TRUE), nrow = 1) ``` <img src="timeseries_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" /> --- A more standard layout: .pull-left[ <img src="timeseries_files/figure-html/flights-cal-3-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ``` r nf2 <- mutate(nf, wd = wday(date, label = TRUE), mw = factor(monthweek(day, wday(date))), mw = fct_rev(mw)) wdlabs <-substr(levels(nf2$wd), 1, 1) ggplot(nf2, aes(x = wd, y = mw, fill = n)) + geom_tile(color = "white") + facet_wrap(~ month(date, TRUE)) + scale_fill_gradient(low = "white") + xlab("") + ylab("Week of Month") + scale_x_discrete(labels = wdlabs) + theme(panel.grid.major = element_blank(), panel.border = element_blank()) ``` ] --- layout: false ## Reading Chapters [_Visualizing time series and other functions of an independent variable_](https://clauswilke.com/dataviz/time-series.html) and [_Visualizing trends_](https://clauswilke.com/dataviz/visualizing-trends.html) in [_Fundamentals of Data Visualization_](https://clauswilke.com/dataviz/).
//adapted from Emi Tanaka's gist at //https://gist.github.com/emitanaka/eaa258bb8471c041797ff377704c8505