class: center, middle, title-slide .title[ # Plots for Multiple Time Series ] .author[ ### Luke Tierney ] .institute[ ### University of Iowa ] .date[ ### 2025-04-13 ] --- layout: true <link rel="stylesheet" href="stat4580.css" type="text/css" /> <style type="text/css"> .remark-code { font-size: 85%; } </style> ## Multiple Time Series --- Some questions for multiple time series: -- - How do the series behave individually? -- - How are the series related? -- - Does one series lead or lag the others? -- Some options for plotting multiple series: -- - separate panels in a trellis display; -- - multiple series in a single plot - this will require standardizing if the scales vary substantially; -- - a multivatiate plot with time represented by connecting line segments or animation. -- Another option for two time series: -- - use a separate `y` axis for each. -- This is usually a bad idea. --- Some data structures for multiple time series: * `mts` objects created by `ts()`. -- - All series have to be numeric. - `autoplot` methods may be sufficient. -- * Regular data frames. -- - The series will have to be plotted in separate layers, - or the data frames will need converting into long form. --- layout: true ## Crimean War Casualties --- The `HistData` package contains the data used by Florence Nightingale to call attention to terrible sanitary conditions for the Britich army in Crimea. ``` r data(Nightingale, package = "HistData") head(select(Nightingale, -Date)) ## Month Year Army Disease Wounds Other Disease.rate Wounds.rate Other.rate ## 1 Apr 1854 8571 1 0 5 1.4 0.0 7.0 ## 2 May 1854 23333 12 0 9 6.2 0.0 4.6 ## 3 Jun 1854 28333 11 0 6 4.7 0.0 2.5 ## 4 Jul 1854 28722 359 0 23 150.0 0.0 9.6 ## 5 Aug 1854 30246 828 1 30 328.5 0.4 11.9 ## 6 Sep 1854 30290 788 81 70 312.2 32.1 27.7 ``` --- A plot of the `Disease`, `Wounds`, and `Other`: .pull-left[ <img src="timeseries2_files/figure-html/night-ts-auto-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r nDWO <- select(Nightingale, Disease, Wounds, Other) |> ts(start = c(1854, 4), frequency = 12) autoplot(nDWO) ``` ] --- Using the rate variables produces a very similar pattern: .pull-left[ <img src="timeseries2_files/figure-html/night-rate-ts-auto-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r nDWO.rate <- select(Nightingale, ends_with("rate")) |> ts(start = c(1854, 4), frequency = 12) autoplot(nDWO.rate) ``` ] --- The army size does not vary enough over the time where the casualties are high to affect the pattern much: .pull-left[ <img src="timeseries2_files/figure-html/night-army-ts-auto-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r nArmy <- select(Nightingale, Army) |> ts(start = c(1854, 4), frequency = 12) autoplot(nArmy) ``` ] --- Instead of showing three series in one plot you can also use faceting. .pull-left[ <img src="timeseries2_files/figure-html/night-auto-facet-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r autoplot(nDWO, facets = TRUE) ``` ] --- To create the plot of `Disease`, `Wounds`, and `Other` using basic `ggplot` it is best to first convert the data into long form. -- You can create a long version of the data, using either `pivot_longer` or `broom::tidy`. -- Using `broom:tidy` on the `mts` object: .pull-left[ <img src="timeseries2_files/figure-html/night-ggplot-broom-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r broom::tidy(nDWO) |> rename(Date = index, Deaths = value, Cause = series) |> ggplot(aes(x = Date, y = Deaths, color = Cause)) + geom_line() ``` ] --- Using `pivot_longer` on the original data frame: .pull-left[ <img src="timeseries2_files/figure-html/night-ggplot-pivot-longer-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r select(Nightingale, Date, Disease, Wounds, Other) |> pivot_longer(-Date, names_to = "Cause", values_to = "Deaths") |> ggplot(aes(x = Date, y = Deaths, color = Cause)) + geom_line() ``` ] --- Using faceting instead of a single plot: .pull-left[ <img src="timeseries2_files/figure-html/night-ggplot-facet-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r select(Nightingale, Date, Disease, Wounds, Other) |> pivot_longer(-Date, names_to = "Cause", values_to = "Deaths") |> ggplot(aes(x = Date, y = Deaths)) + geom_line() + facet_wrap(~ Cause, ncol = 1) ``` ] --- To focus on the shape of the curves it is often useful to allow the vertical axis ranges to adjust to the data: .pull-left[ <img src="timeseries2_files/figure-html/night-ggplot-facet-free-y-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r select(Nightingale, Date, Disease, Wounds, Other) |> pivot_longer(-Date, names_to = "Cause", values_to = "Deaths") |> ggplot(aes(x = Date, y = Deaths)) + geom_line() + facet_wrap(~ Cause, ncol = 1, * scales = "free_y") ``` ] --- This is particularly useful when the ranges are very different, for example if the `Army` series is included: .pull-left[ <img src="timeseries2_files/figure-html/night-ggplot-facet-free-y-army-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r select(Nightingale, Army, Date, Disease, Wounds, Other) |> pivot_longer(-Date, names_to = "series", values_to = "Deaths") |> ggplot(aes(Date, Deaths)) + geom_line() + facet_wrap(~ series, ncol = 1, scales = "free_y") ``` ] --- layout: false ## Some Notes -- Using `autoplot` is often a quick way to get useful plots. -- Using basic `ggplot` graphics is a bit more work but provides more flexibility. --- layout: true ## Many Time Series --- Annual flow for dams in Italy: .pull-left[ <img src="timeseries2_files/figure-html/italy-dams-single-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ Clearly showing all series in a single plot is not helpful! ``` r data(hydroSIMN, package = "nsRFA") annualflows <- mutate(annualflows, cod = factor(cod)) ggplot(annualflows, aes(x = anno, y = dato, group = cod, color = factor(cod))) + geom_line() ``` ] --- With small multiples some patterns can be seen: .pull-left[ <img src="timeseries2_files/figure-html/italy-dams-facet-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r ggplot(annualflows, aes(x = anno, y = dato, group = cod)) + geom_line() + facet_wrap(~ cod) + theme(axis.text.x = element_text(angle = 45, hjust = 1)) ``` ] --- Small multiples can also be arranged in ways that convey spatial information: .pull-left[ <img src="timeseries2_files/figure-html/unemp-geofacet-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r library(geofacet) ggplot(state_unemp, aes(year, rate)) + geom_line() + facet_geo(~ state, grid = "us_state_grid2", label = "name") + scale_x_continuous(labels = function(x) paste0("'", substr(x, 3, 4))) + labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016", caption = "Data Source: bls.gov", x = "Year", y = "Unemployment Rate (%)") + theme(strip.text.x = element_text(size = 6), axis.text.y = element_text(size = 6), axis.text.x = element_blank()) ``` ] --- layout: true ## The Fall and Rise of Income Inequality --- A [2015 post](https://www.npr.org/sections/money/2015/02/11/384988128/the-fall-and-rise-of-u-s-inequality-in-2-graphs) on the NPR Planet Money site uses two graphs to show transitions in income inequality. -- The page includes a link to the <!-- this times out in the link check --> [data source](http://topincomes.parisschoolofeconomics.eu/). -- This has been unavailable for some time. -- I tried to obtain the same data but may not have done it quite right. -- The closest source I could find is <https://wid.world/data/>. -- The data is available locally: ``` r income <- read.table("https://www.stat.uiowa.edu/~luke/data/nprincome.dat") ``` --- Plots over time: .pull-left[ <img src="timeseries2_files/figure-html/income-single-1.png" style="display: block; margin: auto;" /> ] -- .pull-right-width-40[ ``` r select(income, -year) |> ts(start = 1913) |> autoplot() ``` ] -- .pull-right-width-40[ The ranges are too different for this to be effective. ] --- One alternative: standardize the series to one, or 100%, at some point. -- Here the initial value in 1913 is used: .pull-left[ <img src="timeseries2_files/figure-html/income-single-rel-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r select(income, -year) |> transmute(L90rel = L90 / L90[1], U01rel = U01 / U01[1]) |> ts(start = 1913) |> autoplot() ``` ] --- Faceted plots, with free `\(y\)` axes, are another option: .pull-left[ <img src="timeseries2_files/figure-html/income-facet-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r select(income, -year) |> ts(start = 1913) |> autoplot(facets = TRUE) ``` ] --- With two series another option is a _connected scatter plot_. .pull-left[ <img src="timeseries2_files/figure-html/income-connected-1-1.png" style="display: block; margin: auto;" /> ] -- .pull-right-width-40[ ``` r ggplot(income, aes(U01, L90)) + * geom_path(linewidth = 2) + labs(x = "Average income for the top 1%", y = "Average income for the bottom 90%") ``` ] -- .pull-right-width-40[ This simple form does not show the direction or scale of time. ] --- Color or annotation can be used to show direction and scale of time: .pull-left[ <img src="timeseries2_files/figure-html/income-connected-2-1.png" style="display: block; margin: auto;" /> ] -- .pull-right-width-40[ ``` r library(ggrepel) ggplot(income, aes(U01, L90, color = year)) + geom_path(size = 2) + geom_text_repel( aes(label = ifelse(year %% 5 == 0 | year == max(year), year, "")), color = "grey50") + labs(x = "Average income for the top 1%", y = "Average income for the bottom 90%") ``` ] --- Animated is another option: <!-- **** could look into getting closer to the original NPR version --> .pull-left[ <img src="timeseries2_files/figure-html/income-gganim-1.gif" style="display: block; margin: auto;" /> ] -- .pull-right.small-code[ ``` r p <- ggplot(income, aes(x = U01, y = L90, color = year)) + geom_path(size = 2, color = "lightgrey", data = mutate(income, year = NULL)) + geom_path(size = 2) + labs(title = "Year: {frame_along}", x = "Average income for the top 1%", y = "Average income for the bottom 90%") *library(gganimate) *animate(p + transition_reveal(year), * fps = 5, end_pause = 10) ``` ] --- layout: true ## Stack Loss Data --- Operational data obtained from 21 days of operation of a plant for the oxidation of ammonia (NH3) to nitric acid (HNO3). -- The dependent variable, `stack.loss` is 10 times the percentage of the ingoing ammonia to the plant that escapes from the absorption column unabsorbed. -- .pull-left[ <img src="timeseries2_files/figure-html/stackloss-splom-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r library(lattice) splom(stackloss) ``` ] --- The data are collected over time: .pull-left[ <img src="timeseries2_files/figure-html/stackloss-splom-line-1.png" style="display: block; margin: auto;" /> ] -- .pull-right.width-50[ ``` r library(dplyr) sl <- mutate(stackloss, intex = seq_len(nrow(stackloss))) cpal <- colorRampPalette(c("#132B43", "#56B1F7")) cols <- cpal(nrow(stackloss) - 1) pfun <- function(x, y, ...) { n <- length(x) lsegments(x[-n], y[-n], x[-1], y[-1], col = cols, lwd = 3) } splom(stackloss, panel = pfun) ``` ] --- Randomly reordering the indices can help calibrate whether there is a time effect: .pull-left[ <img src="timeseries2_files/figure-html/stackloss-splom-line-rnd-1.png" style="display: block; margin: auto;" /> ] -- .pull-right-width-40[ ``` r splom(sample_n(stackloss, nrow(stackloss)), panel = pfun) ``` ] -- .pull-right-width-40[ This is sometimes called _graphical inference_. ] --- > Hadley Wickham, Dianne Cook, Heike Hofmann, and Andreas Buja, > "Graphical Inference for Infovis," _IEEE Transactions on Visualization > and Computer Graphics_, Vol. 16, No. 6, November/December 2010. -- Picking one plot and embedding the true order among 19 random orderings: .pull-left[ <img src="timeseries2_files/figure-html/stack-loss-embed-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ True order: plot 20. ] --- The code: ``` r N <- 20 k <- sample(N, 1) ord <- seq_len(nrow(stackloss)) ss <- function(i) { if (i == k) sl <- stackloss else sl <- sample_n(stackloss, nrow(stackloss)) mutate(sl, order = ord, sample = i) } d <- lapply(1 : N, ss) d <- do.call(rbind, d) library(ggplot2) ggplot(d, aes(x = Acid.Conc., y = stack.loss, color = order)) + geom_path(size = 0.8) + facet_wrap(~ sample) + guides(color = "none") ``` --- layout: true ## Lynx-Hare Data --- Trapping data on Canadian lynx and snowshoe hare: <!-- http://people.whitman.edu/~hundledr/courses/M250F03/M250.html http://people.whitman.edu/~hundledr/courses/M250F03/LynxHare.txt --> .pull-left[ <img src="timeseries2_files/figure-html/lynxhare-facets-1.png" style="display: block; margin: auto;" /> ] -- .pull-right-width-40[ Cycle patterns are similar. ] -- .pull-right-width-40[ ``` r lynxhare <- read.table("https://www.stat.uiowa.edu/~luke/data/LynxHare.txt") names(lynxhare) <- c("year", "hare", "lynx") lhts <- ts(select(lynxhare, -year), start = 1845) autoplot(lhts, facets = TRUE) ``` ] --- Lynx cycles often lag behind: .pull-left[ <img src="timeseries2_files/figure-html/lynxhare-single-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r autoplot(lhts) ``` ] --- layout: false ## Things to Look For -- Changes in data definition. -- * Administrative definitions, such as employment/unemployment can change over time. -- * Administrative entities, such as countries and states, can change over time. -- * Instruments can change (satelite imagery replacing surface measurements). -- Outliers need not be globally outlying: departures from local patterns may also be important. -- Level shifts and changes in volatility. -- Shocks affecting level or volatility. --- ## Australian Airline Data [Weekly passenger loads](https://otexts.com/fpp2/graphics.html), in thousands, on Ansett Airlines between Australia’s two largest cities (available as `melsyd` in package `fpp`): .pull-left[ <img src="timeseries2_files/figure-html/australian-airlines-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ Some features: {{content}} ] -- * Strike in 1989. {{content}} -- * Increase in passenger load starting in second half of 1991. {{content}} -- * Dip in economy, rise in business in 1992 corresponding to an experiment with fewer economy/more business seats. {{content}} -- ``` r data(melsyd, package = "fpp") autoplot(melsyd, facets = TRUE) ``` --- ## Weekly SP 500 Closing Returns Weekly closing returns of the SP 500 from 2003 to September, 2012: .pull-left[ <img src="timeseries2_files/figure-html/SP500-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ Some features: {{content}} ] -- * Increased volatility at the start of the financial crisis. {{content}} -- * Volatility seems to be decreasing, but not very fast. {{content}} -- ``` r library(xts) library(astsa) autoplot(sp500w) ``` --- ## Forecasting -- Studying the past may help predict the future. -- Forecasting a time series usually involves choosing a model and running the model forward. -- Accuracy of forecasts decreases rapidly the farther ahead the forecast is made. -- .pull-left[ An example from the `forecast` package: `WWWusage` is a time series of the numbers of users connected to the Internet. ``` r library(forecast) fit <- ets(WWWusage) fc <- forecast(WWWusage, model = fit) autoplot(fc) ``` ] .pull-right[ <img src="timeseries2_files/figure-html/WWWusage-1.png" style="display: block; margin: auto;" /> ] --- layout: true ## Labeling Trend Lines --- For trend lines using a standard guide can be a little awkward. -- Average life expectancy against year for the continents in the `gapminder` data: -- .pull-left[ <img src="timeseries2_files/figure-html/trend-lines-guide-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r library(gapminder) d <- group_by(gapminder, continent, year) |> summarize(avgLifeExp = mean(lifeExp)) |> ungroup() p <- ggplot(d, aes(x = year, y = avgLifeExp, color = continent)) + geom_line(size = 1) p ``` ] --- With the legend on the right, a small improvement is to order the levels to match the order of the right hand curve values: -- .pull-left[ <img src="timeseries2_files/figure-html/trend-lines-guide-reorder-1.png" style="display: block; margin: auto;" /> ] -- .pull-right.small-code[ ``` r d_last <- filter(d, year == max(year)) |> mutate(continent = reorder(continent, -avgLifeExp)) p %+% mutate(d, continent = factor(continent, levels(d_last$continent))) ``` ] --- _Direct labeling_ is often a better choice. -- One option is to use `geom_text` at the ends of the lines. -- This may need some additional adjustments to margins and clipping behavior: -- .pull-left[ <img src="timeseries2_files/figure-html/trend-lines-direct-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r p <- p + guides(color = "none") d_last <- filter(d, year == max(year)) p + geom_text(aes(label = continent), data = d_last, size = 5, hjust = -0.1) + coord_cartesian(clip = "off", expand = FALSE) + theme(plot.margin = margin(30, 60, 10, 10)) ``` ] --- Another option is to use a second axis: <!-- Adapted from https://wilkelab.org/SDS375/slides/redundant-coding.html#16 --> -- .pull-left[ <img src="timeseries2_files/figure-html/trend-lines-sec-axis-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r p + scale_y_continuous( sec.axis = dup_axis( breaks = d_last$avgLifeExp, labels = d_last$continent, name = NULL)) + scale_x_continuous(expand = c(0, 0)) ``` ] --- It is also possible to place the labels along the lines. The [`geomtextpath` package](https://allancameron.github.io/geomtextpath/) provides support for this. -- .pull-left[ <img src="timeseries2_files/figure-html/trend-lines-text-path-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ``` r library(geomtextpath) ggplot(d, aes(x = year, y = avgLifeExp, color = continent)) + geom_textline(aes(label = continent), linewidth = 1, vjust = -0.5, hjust = 0.95) + guides(color = "none") ``` ] --- layout: true ## Bump Charts --- .pull-left[ Top 10 countries in terms of GDP per capita in 2007: .hide-code[ ``` r top_gdp <- filter(gapminder, continent == "Europe", year == max(year)) |> slice_max(gdpPercap, n = 10) |> pull(country) d <- filter(gapminder, country %in% top_gdp) |> group_by(year) |> mutate( rank = min_rank(desc(gdpPercap))) |> ungroup() d_last <- filter(d, year == max(year)) ggplot(d, aes(year, gdpPercap, color = country)) + geom_line() ``` <img src="timeseries2_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ The dominant feature is the overall trend. {{content}} ] -- One option: show deviations from the overall trend. {{content}} -- Another option: show ranks within years. {{content}} -- This is called a _bump chart_. --- .pull-left[ A simple bump chart: .hide-code[ ``` r ggplot(d, aes(year, rank, color = country)) + geom_line(size = 1) + scale_y_reverse(breaks = 1:10) + geom_text(aes(label = country), data = d_last, size = 5, hjust = -0.1) + coord_cartesian(clip = "off", expand = FALSE) + theme(plot.margin = margin(30, 90, 10, 10)) + guides(color = "none") ``` <img src="timeseries2_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ The `ggbump` package provides `geom_bump` for drawing smooth bump chart lines: .hide-code[ ``` r library(ggbump) ggplot(d, aes(year, rank, color = country)) + geom_bump(size = 1) + scale_y_reverse(breaks = 1:10) + geom_text(aes(label = country), data = d_last, size = 5, hjust = -0.1) + coord_cartesian(clip = "off", expand = FALSE) + theme(plot.margin = margin(30, 90, 10, 10)) + guides(color = "none") ``` <img src="timeseries2_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> ] ] --- layout: false ## Other Approaches A [talk from 2019](https://visualisingdata.com/2019/04/talk-slides-design-of-time/) describes a wide range of options for visualizing temporal data. -- ## Reading Chapters [_Visualizing time series and other functions of an independent variable_](https://clauswilke.com/dataviz/time-series.html) and [_Visualizing trends_](https://clauswilke.com/dataviz/visualizing-trends.html) in [_Fundamentals of Data Visualization_](https://clauswilke.com/dataviz/).
//adapted from Emi Tanaka's gist at //https://gist.github.com/emitanaka/eaa258bb8471c041797ff377704c8505