Background

The Grammar of Graphics is a language proposed by Leland Wilkinson for describing statistical graphs.

Wilkinson, L. (2005), The Grammar of Graphics, 2nd ed., Springer.

The grammar of graphics has served as the foundation for the graphics frameworks in SPSS, Vega-Lite and several other systems.

ggplot2 represents an implementation and extension of the grammar of graphics for R.

Wickham, H. (2016), ggplot2: Elegant Graphics for Data Analysis, 2nd ed., Springer. 3rd ed. in progress.

Online documentation: https://ggplot2.tidyverse.org/reference/index.html.

Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund (2023), R for Data Science (2nd Edition), O’Reilly.

Data visualization cheatsheet

Winston Chang (2018), R Graphics Cookbook, 2nd edition, O’Reilly. (Book source on GitHub)

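The idea is that any basic plot can be built out of a combination of a data set, one or more geoms, mappings of variables to aesthetics, stats, position adjustments, a coordinate system, scales, a faceting scheme, and a theme.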

ggplot2 provides tools for specifying these components and adjusting their features.

Many components and features are provided by default and do not need to be specified explicitly unless the defaults are to be changed.
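
A minimal sketch of what relying on defaults means, using the mpg data that ships with ggplot2. The components written out in the second plot are the ones ggplot2 is expected to fill in for a scatter plot when the session defaults have not been changed; the object names p_default and p_explicit are only for illustration.

```r
library(ggplot2)

## Relying on the defaults
p_default <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()

## The same plot with several of the defaults written out
p_explicit <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point(stat = "identity", position = "identity") +
    scale_x_continuous() +
    scale_y_continuous() +
    coord_cartesian() +
    facet_null() +
    theme_gray()

## Compare the data computed for the point layer in each plot
all.equal(layer_data(p_default), layer_data(p_explicit))
```

Printing either object draws the plot; writing out a default is only needed when it is to be changed.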

A Basic Template

The simplest graph needs a data set, a geom, and a mapping:

ggplot(data = <DATA>) + <GEOM>(mapping = aes(<MAPPINGS>))

The appearance of geom objects is controlled by aesthetic features.

Each geom has some required and some optional aesthetics.

For geom_point the required aesthetics are x and y.

Optional aesthetics include alpha, color, fill, shape, size, and stroke.

geom_point is used to produce a scatter plot.
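
One way to check the aesthetics of a geom from code is to look at its underlying ggproto object. This is a sketch that assumes the object behind geom_point is named GeomPoint, as it is in current ggplot2 releases; the help page ?geom_point lists the same information.

```r
library(ggplot2)

## Aesthetics that must be supplied for geom_point
GeomPoint$required_aes          # "x" "y"

## Aesthetics that have default values and are therefore optional
names(GeomPoint$default_aes)
```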

Scatter Plots Using geom_point

The mpg data set in the ggplot2 package contains EPA fuel economy data from 1999 to 2008 for 38 popular models of cars.

library(ggplot2)
mpg
## # A tibble: 234 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
##  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
##  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
##  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
##  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
##  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
##  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
## 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
## # ℹ 224 more rows

A simple scatter plot:

ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy))

Map color to vehicle class:

ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy,
                   color = class))

And map shape to number of cylinders:

ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy,
                   color = class,
                   shape = factor(cyl)))

Perception:

Aesthetics can be mapped to a variable or set to a fixed common value.

This can be used to override default settings:

ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy),
               color = "blue",
               shape = 1)

Changing the size aesthetics makes shapes easier to recognize:

ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy,
                   color = class,
                   shape = factor(cyl)),
               size = 3)

Perception: Still too many colors; still have interference.

Available point shapes are specified by number:

Shapes 0-20 have their color set by the color aesthetic and ignore the fill aesthetic.

For shapes 21-25 the color aesthetic specifies the border color and fill specifies the interior color.
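
The shape numbers can be displayed with a short sketch like the following. The data frame shapes and its columns col and row are hypothetical names used only to lay the symbols out on a grid; scale_shape_identity() tells ggplot2 to use the numbers directly as shapes.

```r
library(ggplot2)

## One row per shape number, laid out on a grid with 7 columns
shapes <- data.frame(shape = 0:25,
                     col = 0:25 %% 7,
                     row = -(0:25 %/% 7))

ggplot(shapes, aes(x = col, y = row)) +
    geom_point(aes(shape = shape),
               size = 5,
               color = "black",
               fill = "red") +
    geom_text(aes(label = shape),
              nudge_y = -0.35,
              size = 3) +
    scale_shape_identity() +
    theme_void()
```

Only the symbols numbered 21 to 25 should pick up the red fill; the others are drawn entirely in the color given by color.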

Using shape 21 with cyl mapped to the fill aesthetic:

## mutate() comes from dplyr
library(dplyr)
ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point(aes(x = displ,
                   y = hwy,
                   fill = cyl),
               shape = 21,
               size = 4)

Perception: Borders, larger symbols, fewer colors help.

Specifying a new default is very different from specifying a constant value as an aesthetic.

Constant aesthetic: Rarely what you want:

ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy,
                   color = "blue"))

Default: Probably what you want:

ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy),
               color = "blue")
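
The difference can be checked by looking at the colors ggplot2 actually computes for the points. This is a sketch using layer_data(); the names p_mapped and p_set are only for illustration.

```r
library(ggplot2)

## "blue" inside aes() is treated as data: a one-level variable
## that is passed through the default color scale
p_mapped <- ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy,
                   color = "blue"))

## "blue" outside aes() is used directly as the point color
p_set <- ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy),
               color = "blue")

unique(layer_data(p_mapped)$colour)   # a color chosen by the scale
unique(layer_data(p_set)$colour)      # "blue"
```

The mapped version also produces a legend with the single entry "blue", which is rarely useful.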

Geometric Objects

ggplot2 provides a number of geoms:

geom_abline             geom_area               geom_bar                geom_bin_2d
geom_bin2d              geom_blank              geom_boxplot            geom_col
geom_contour            geom_contour_filled     geom_count              geom_crossbar
geom_curve              geom_density            geom_density_2d         geom_density_2d_filled
geom_density2d          geom_density2d_filled   geom_dotplot            geom_errorbar
geom_errorbarh          geom_freqpoly           geom_function           geom_hex
geom_histogram          geom_hline              geom_jitter             geom_label
geom_line               geom_linerange          geom_map                geom_path
geom_point              geom_pointrange         geom_polygon            geom_qq
geom_qq_line            geom_quantile           geom_raster             geom_rect
geom_ribbon             geom_rug                geom_segment            geom_sf
geom_sf_label           geom_sf_text            geom_smooth             geom_spoke
geom_step               geom_text               geom_tile               geom_violin
geom_vline                                                              

Additional geoms are available in packages like ggforce, ggridges, and others described on the ggplot2 extensions site.

Geoms can be added as layers to a plot.

Mappings common to all, or most, geoms can be specified in the ggplot call:

ggplot(mpg,
       aes(x = displ,
           y = hwy)) +
    geom_smooth() +
    geom_point()
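
Mappings that should apply to only one layer can be given in that layer's own aes() call. A small sketch, assuming we want points colored by drive type but a single smooth for all the data; the name p_layers is only for illustration.

```r
library(ggplot2)

p_layers <- ggplot(mpg,
                   aes(x = displ,
                       y = hwy)) +
    ## inherits x and y only, so one curve is fit to all the data
    geom_smooth(color = "black") +
    ## adds a color mapping for this layer alone
    geom_point(aes(color = drv))
p_layers
```

Moving color = drv into the ggplot call would instead give every layer the color mapping, and geom_smooth would fit a separate curve for each drive type.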

Geoms can also use different data sets.

One way to highlight Europe in a plot of life expectancy against log income for 2007 is to start with a plot of the full data:

library(dplyr)
library(gapminder)
gm_2007 <- filter(gapminder, year == 2007)

(p <- ggplot(gm_2007, aes(x = gdpPercap,
                          y = lifeExp)) +
     geom_point() +
     scale_x_log10())

Then add a layer showing only Europe:

gm_2007_eu <- filter(gm_2007, continent == "Europe")

p + geom_point(data = gm_2007_eu,
               color = "red",
               size = 3)

Statistical Transformations

All geoms use a statistical transformation (stat) to convert raw data to the values to be mapped to the object’s features.

The available stats are

stat_align              stat_bin                stat_bin_2d
stat_bin_hex            stat_bin2d              stat_binhex
stat_boxplot            stat_contour            stat_contour_filled
stat_count              stat_density            stat_density_2d
stat_density_2d_filled  stat_density2d          stat_density2d_filled
stat_ecdf               stat_ellipse            stat_function
stat_identity           stat_qq                 stat_qq_line
stat_quantile           stat_sf                 stat_sf_coordinates
stat_smooth             stat_spoke              stat_sum
stat_summary            stat_summary_2d         stat_summary_bin
stat_summary_hex        stat_summary2d          stat_unique
stat_ydensity                                   

Each geom has a default stat, and each stat has a default geom.
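
As an illustration of this pairing, geom_bar uses stat_count by default and stat_count uses geom_bar by default, so the two calls in this sketch should describe the same bar chart. The object names are only for illustration.

```r
library(ggplot2)

## Start from the geom; the stat defaults to "count"
p_geom <- ggplot(mpg, aes(x = class)) +
    geom_bar()

## Start from the stat; the geom defaults to "bar"
p_stat <- ggplot(mpg, aes(x = class)) +
    stat_count()

## The computed bar heights agree
identical(layer_data(p_geom)$count, layer_data(p_stat)$count)
```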

Stats can provide computed variables that can be mapped to aesthetic features.

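For stat_bin some of the computed variables are count (the number of observations in each bin), density (the density of observations in each bin), and ncount and ndensity (the count and density rescaled to a maximum of one).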

The density variable can be accessed as after_stat(density).

Older approaches that also work but are now discouraged are stat(density) and ..density...

By default, geom_histogram uses y = after_stat(count).

ggplot(faithful) +
    geom_histogram(aes(x = eruptions),
                   binwidth = 0.25,
                   fill = "grey",
                   color = "black")

Explicitly specifying y = after_stat(count) produces the same plot:

ggplot(faithful) +
    geom_histogram(aes(x = eruptions,
                       y = after_stat(count)),
                   binwidth = 0.25,
                   fill = "grey",
                   color = "black")

Using y = after_stat(density) produces a density scaled axis.

(p <- ggplot(faithful) +
     geom_histogram(aes(x = eruptions,
                        y = after_stat(density)),
                    binwidth = 0.25,
                    fill = "grey",
                    color = "black"))
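
The variables computed by the stat can be inspected with layer_data(). This sketch uses the names h and d for illustration; it shows the computed columns for the histogram and checks that the density values times the bin widths add up to one.

```r
library(ggplot2)

h <- ggplot(faithful) +
    geom_histogram(aes(x = eruptions),
                   binwidth = 0.25)

## One row per bin, including the variables computed by stat_bin
d <- layer_data(h)
head(d[, c("x", "count", "density", "ncount", "ndensity")])

## Bar areas on the density scale sum to one
sum(d$density * (d$xmax - d$xmin))
```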

stat_function can be used to add a density curve specified as a mixture of two normal densities:

(ms <- mutate(faithful,
              type = ifelse(eruptions < 3,
                            "short",
                            "long")) |>
     group_by(type) |>
     summarize(mean = mean(eruptions),
               sd = sd(eruptions),
               n = n()) |>
     mutate(p = n / sum(n)))
## # A tibble: 2 × 5
##   type   mean    sd     n     p
##   <chr> <dbl> <dbl> <int> <dbl>
## 1 long   4.29 0.411   175 0.643
## 2 short  2.04 0.267    97 0.357
f <- function(x)
    ms$p[1] * dnorm(x, ms$mean[1], ms$sd[1]) +
        ms$p[2] * dnorm(x, ms$mean[2], ms$sd[2])

p + stat_function(fun = f, color = "red")

Position Adjustments

The available position adjustments:

position_dodge        position_dodge2       position_fill
position_identity     position_jitter       position_jitterdodge
position_nudge        position_stack        

A bar chart showing the counts for the different cut categories in the diamonds data:

ggplot(diamonds, aes(x = cut)) +
    geom_bar()

Mapping clarity to fill shows the breakdown by both cut and clarity in a stacked bar chart:

ggplot(diamonds, aes(x = cut,
                     fill = clarity)) +
    geom_bar()

The default position for bar charts is position_stack:

ggplot(diamonds, aes(x = cut,
                     fill = clarity)) +
    geom_bar(position = "stack")

position_dodge produces side-by-side bar charts:

ggplot(diamonds, aes(x = cut,
                     fill = clarity)) +
    geom_bar(position = "dodge")

position_fill rescales all bars to be equal height to help compare proportions within bars.

ggplot(diamonds, aes(x = cut,
                     fill = clarity)) +
    geom_bar(position = "fill")

Using the counts to scale the widths would produce a spine plot, a variant of a mosaic plot.

This is easiest to do with the ggmosaic package.
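
The rescaling done by position_fill can also be carried out by hand, which makes explicit what the bar segments represent. This is a sketch assuming dplyr is available; the name props is only for illustration.

```r
library(dplyr)
library(ggplot2)

## Proportion of each clarity level within each cut
props <- count(diamonds, cut, clarity) |>
    group_by(cut) |>
    mutate(prop = n / sum(n))

## geom_col stacks by default, so each bar has total height one
ggplot(props, aes(x = cut,
                  y = prop,
                  fill = clarity)) +
    geom_col()
```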

position_jitter can be used with geom_point to avoid overplotting or break up rounding artifacts.

Another version of the Old Faithful data, available as geyser in package MASS, has some rounding in the duration variable:

data(geyser, package = "MASS")

## Adjust for different meaning of `waiting` variable
geyser2 <- na.omit(mutate(geyser,
                          duration = lag(duration)))

p <- ggplot(geyser2, aes(x = duration, y = waiting))
p + geom_point()

Jittering can help break up the distracting heaping of values on durations of 2 and 4 minutes.

The default amount of jittering isn’t quite enough in this case:

p + geom_point(position = "jitter")

To jitter only horizontally and by a larger amount you can use

p + geom_point(position =
                   position_jitter(height = 0,
                                   width = 0.1))
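
Jittering is random, so the plot changes a little each time it is drawn. A sketch of one way to make it reproducible, using the seed argument of position_jitter; the seed value 4580 and the name pj are arbitrary choices for illustration.

```r
library(dplyr)
library(ggplot2)

data(geyser, package = "MASS")
geyser2 <- na.omit(mutate(geyser,
                          duration = lag(duration)))

## A fixed seed gives the same jittered positions on every draw
pj <- ggplot(geyser2, aes(x = duration, y = waiting)) +
    geom_point(position =
                   position_jitter(height = 0,
                                   width = 0.1,
                                   seed = 4580))
pj
```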

Coordinate Systems

Coordinate system functions include

coord_cartesian  coord_equal      coord_fixed      coord_flip
coord_map        coord_munch      coord_polar      coord_quickmap
coord_radial     coord_sf         coord_trans      

The default coordinate system is coord_cartesian.

Cartesian Coordinates

coord_cartesian can be used to zoom in on a particular region:

p + geom_point() +
    coord_cartesian(xlim = c(3, 4))
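
Zooming with coord_cartesian is not the same as setting limits on a scale. A self-contained sketch using the mpg data rather than the geyser plot; the names base, p_zoom, p_limits, and n_inside are only for illustration.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x)

## Zoom: all rows are still used to fit the line
p_zoom <- base + coord_cartesian(xlim = c(3, 4))

## Scale limits: rows with displ outside [3, 4] are dropped
## (with a warning) before the line is fit
p_limits <- base + scale_x_continuous(limits = c(3, 4))

## Number of rows that fall inside the scale limits
n_inside <- sum(mpg$displ >= 3 & mpg$displ <= 4)
n_inside
```

Comparing the two plots shows different fitted lines, because the second fit uses only the n_inside rows within the limits.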

coord_fixed and coord_equal fix the aspect ratio for a cartesian coordinate system.

The aspect ratio is the ratio of the number of physical display units per y unit to the number of physical display units per x unit.

The aspect ratio can be important for recognizing features and patterns.

river <- scan("https://www.stat.uiowa.edu/~luke/data/river.dat")
r <- data.frame(flow = river, month = seq_along(river))
ggplot(r, aes(x = month, y = flow)) +
    geom_point() +
    coord_fixed(ratio = 4)

Polar Coordinates

A filled bar chart

(p <- ggplot(diamonds) +
     geom_bar(aes(x = 1, fill = cut),
              position = "fill"))

is turned into a pie chart by changing to polar coordinates:

p + coord_polar(theta = "y")

Coordinate Systems for Maps

Coordinate systems are particularly important for maps.

Polygons for many political and geographic boundaries are available through the map_data function.

Boundaries for the lower 48 US states can be obtained as

usa <- map_data("state")

Polygon vertices are encoded by longitude and latitude.

Plotting these in the default cartesian coordinate system usually does not work well:

usa <- map_data("state")
m <- ggplot(usa, aes(x = long,
                     y = lat,
                     group = group)) +
    geom_polygon(fill = "white",
                 color = "black")
m

Using a fixed aspect ratio is better, but an aspect ratio of 1 does not work well:

m + coord_equal()

The problem is that away from the equator a one degree change in latitude corresponds to a larger distance than a one degree change in longitude.

The ratio of one degree longitude separation to one degree latitude separation for the latitude at the middle of Iowa of 41 degrees is

longlat <- cos(41 / 90 * pi / 2)
longlat
## [1] 0.7547096

A better map is obtained using the aspect ratio 1 / longlat:

m + coord_fixed(1 / longlat)
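
The calculation generalizes to any latitude: one degree of longitude spans about cos(latitude) times the distance spanned by one degree of latitude. A small sketch, with longlat_ratio as a hypothetical helper name:

```r
## Ratio of one degree of longitude to one degree of latitude,
## for a latitude given in degrees
longlat_ratio <- function(lat_deg) cos(lat_deg * pi / 180)

longlat_ratio(41)       # same value as longlat above
1 / longlat_ratio(41)   # aspect ratio to pass to coord_fixed
```

The ratio is 1 at the equator and shrinks toward 0 near the poles, which is why a single fixed aspect ratio only works well for maps covering a narrow band of latitudes.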

The best approach is to use a coordinate system designed specifically for maps.

There are many projections used in map making.

The default projection used by coord_map is the Mercator projection.

m + coord_map()

Proper map projections are non-linear; this is easier to see with an Albers projection:

m + coord_map("albers", 20, 50)

Scales

Scales are used for controlling the mapping of values to physical representations such as colors, shapes, and positions.

Scale functions are also responsible for producing guides for translating physical representations back to values, such as axes with tick marks and labels, and legends.

There are currently 131 scale functions; some examples are

scale_color_gradient      scale_shape_manual     scale_x_log10
scale_color_manual        scale_size_area        scale_y_log10
scale_fill_gradient                              scale_x_sqrt
scale_fill_manual                                scale_y_sqrt

An experimental tool to help choose scales has recently been introduced.

Start with a basic scatter plot:

(p <- ggplot(mpg, aes(x = displ,
                      y = hwy)) +
     geom_point())

Remove the x tick marks and labels (this can also be done with theme settings):

p + scale_x_continuous(labels = NULL,
                       breaks = NULL)

Change the tick locations and labels:

p + scale_x_continuous(labels =
                           paste(c(2, 4, 6), "ltr"),
                       breaks = c(2, 4, 6))
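
The labels argument can also be a function that is applied to the break values, which avoids repeating the breaks. A sketch, with ltr as a hypothetical helper name:

```r
library(ggplot2)

## Turn break values into labels such as "2 ltr"
ltr <- function(x) paste(x, "ltr")

ggplot(mpg, aes(x = displ,
                y = hwy)) +
    geom_point() +
    scale_x_continuous(labels = ltr,
                       breaks = c(2, 4, 6))
```

With a labelling function the breaks can be changed, or left at their defaults, without touching the labels.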

Use a logarithmic axis:

p + scale_x_log10(labels = paste(c(2, 4, 6), "ltr"),
                  breaks = c(2, 4, 6),
                  minor_breaks = c(3, 5, 7))

The Scales section in R for Data Science provides some more details.

Color assignment can also be controlled by scale functions.

For example, for some presidential approval ratings data

pr_appr
##         pres appr party year
## 1      Obama   79     D 2009
## 2     Carter   78     D 1977
## 3    Clinton   68     D 1993
## 4  G.W. Bush   65     R 2001
## 5     Reagan   58     R 1981
## 6 G.H.W Bush   56     R 1989
## 7      Trump   40     R 2017

the default color scale is not ideal:

ggplot(pr_appr,
       aes(x = appr, y = pres, fill = party)) +
    geom_col()
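
These notes do not show how pr_appr was created. To run the examples, a data frame matching the printed values can be built as in the sketch below; the column types are an assumption, since only the printed output is available.

```r
## Recreate the approval ratings data from the printed output
pr_appr <- data.frame(
    pres = c("Obama", "Carter", "Clinton", "G.W. Bush",
             "Reagan", "G.H.W Bush", "Trump"),
    appr = c(79, 78, 68, 65, 58, 56, 40),
    party = c("D", "D", "D", "R", "R", "R", "R"),
    year = c(2009, 1977, 1993, 2001, 1981, 1989, 2017))
pr_appr
```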

The common assignment of red for Republican and blue for Democrat can be obtained by

ggplot(pr_appr,
       aes(x = appr, y = pres, fill = party)) +
    geom_col() +
    scale_fill_manual(values
                      = c(R = "red", D = "blue"))

A better choice is to use a well-designed color palette:

ggplot(pr_appr,
       aes(x = appr, y = pres, fill = party)) +
    geom_col() +
    colorspace::scale_fill_discrete_diverging(
                    palette = "Blue-Red 2")

Facets

Faceting uses the small multiples approach to introduce additional variables.

For a single variable facet_wrap is usually used:

p <- ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy))
p + facet_wrap(~ class)

For two variables, each with a modest number of categories, facet_grid can be effective:

p + facet_grid(factor(cyl) ~ drv)

To show common data in all facets make sure the data does not contain the faceting variable.

This was used to show muted views of the full data in faceted plots.

A faceted plot of the gapminder data:

library(gapminder)

years_to_keep <- c(1977, 1987, 1997, 2007)
gd <- filter(gapminder,
             year %in% years_to_keep)

ggplot(gd,
       aes(x = gdpPercap,
           y = lifeExp,
           color = continent)) +
    geom_point(size = 2.5) +
    scale_x_log10() +
    facet_wrap(~ year)

Add a muted version of the full data in the background of each panel:

library(gapminder)

years_to_keep <- c(1977, 1987, 1997, 2007)
gd <- filter(gapminder,
             year %in% years_to_keep)
gd_no_year <- mutate(gd, year = NULL)

ggplot(gd,
       aes(x = gdpPercap,
           y = lifeExp,
           color = continent)) +
    geom_point(data = gd_no_year,
               color = "grey80") +
    geom_point(size = 2.5) +
    scale_x_log10() +
    facet_wrap(~ year)

Usually facets use common axis scales, but one or both can be allowed to vary.

A useful approach for showing time series data with a good aspect ratio can be to split the data into facets for non-overlapping portions of the time axis.

pd <- rep(paste(seq(1, by = 32, length.out = 4),
                seq(32, by = 32, length.out = 4),
                sep = " - "),
         each =  32)
rd <- data.frame(month = seq_along(river),
                 flow = river,
                 panel = pd)
ggplot(rd, aes(x = month,
               y = flow)) +
    geom_point() +
    facet_wrap(~ panel,
               scales = "free_x",
               ncol = 1)

Facet arrangement can also be used to convey other information, such as geographic location.

The geofacet package allows facets to be placed in approximate locations of different geographic regions.

An example for data from US states:

library(geofacet)
ggplot(state_unemp, aes(year, rate)) +
    geom_line() +
    facet_geo(~ state,
              grid = "us_state_grid2",
              label = "code") +
    scale_x_continuous(labels =
                           function(x) paste0("'", substr(x, 3, 4))) +
    labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016",
         caption = "Data Source: bls.gov",
         x = "Year",
         y = "Unemployment Rate (%)") +
    theme(strip.text.x = element_text(size = 6),
          axis.text = element_text(size = 5))

Arrangement according to a calendar can also be useful.

Themes

ggplot2 supports the notion of themes for adjusting non-data appearance aspects of a plot, such as fonts, background colors, grid lines, and legend placement.

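Theme elements can be customized in several ways, for example by adjusting individual elements with theme() or by switching to a complete theme such as theme_bw().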

The full documentation of the theme function lists many customizable elements.

One simple example:

ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point(aes(x = displ,
                   y = hwy,
                   fill = cyl),
               shape = 21,
               size = 3) +
    theme(legend.position = "top",
          axis.text = element_text(size = 12),
          axis.title = element_text(size = 14,
                                    face = "bold"))

Another example:

gthm <-
    theme(plot.background =
              element_rect(fill = "lightblue",
                           color = NA),
          panel.background =
              element_rect(fill = "pink"))
p + gthm

Some alternate complete themes provided by ggplot2 are

theme_bw        theme_gray      theme_minimal   theme_void
theme_classic   theme_grey      theme_dark      theme_light

Some examples:

p_bw <- p + theme_bw() + ggtitle("BW")

p_classic <- p + theme_classic() + ggtitle("Classic")

p_min <- p + theme_minimal() + ggtitle("Minimal")

p_void <- p + theme_void() + ggtitle("Void")

library(patchwork)
(p_bw + p_classic) / (p_min + p_void)
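
A complete theme can also be made the default for the rest of a session with theme_set(). A minimal sketch; the names p_theme and old are only for illustration.

```r
library(ggplot2)

p_theme <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()

## theme_set() returns the previous default so it can be restored
old <- theme_set(theme_bw())
p_theme           # now drawn with theme_bw
theme_set(old)    # restore the earlier default
```

Saving and restoring the previous theme keeps the change from leaking into later plots.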

The ggthemes package provides some additional themes.

Some examples:

library(ggthemes)

p_econ <- p + theme_economist() + ggtitle("Economist")

p_wsj <- p + theme_wsj() + ggtitle("WSJ")

p_tufte <- p + theme_tufte() + ggtitle("Tufte")

p_few <- p + theme_few() + ggtitle("Few")

(p_econ + p_wsj) / (p_tufte + p_few)

ggthemes also provides theme_map that removes unnecessary elements from maps:

m + coord_map() + theme_map()

The Themes section in R for Data Science provides some more details.

A More Complete Template

ggplot(data = <DATA>) +
    <GEOM>(mapping = aes(<MAPPINGS>),
           stat = <STAT>,
           position = <POSITION>) +
    < ... MORE GEOMS ... > +
    <COORDINATE_ADJUSTMENT> +
    <SCALE_ADJUSTMENT> +
    <FACETING> +
    <THEME_ADJUSTMENT>
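
As an illustration, here is one way the slots of the template might be filled in for the mpg data. The particular choices (jittered points, a linear smooth, the Dark2 palette, faceting by year) are arbitrary and only meant to show where each component goes.

```r
library(ggplot2)

p_full <- ggplot(data = mpg) +
    ## <GEOM> with mapping, stat, and position
    geom_point(mapping = aes(x = displ, y = hwy, color = drv),
               stat = "identity",
               position = "jitter") +
    ## one more geom
    geom_smooth(mapping = aes(x = displ, y = hwy),
                method = "lm", formula = y ~ x, se = FALSE) +
    ## coordinate adjustment
    coord_cartesian(ylim = c(10, 45)) +
    ## scale adjustment
    scale_color_brewer(palette = "Dark2") +
    ## faceting
    facet_wrap(~ year) +
    ## theme adjustment
    theme_bw()
p_full
```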

Labels and Annotations

A basic plot:

p <- ggplot(mpg, aes(x = displ,
                     y = hwy))
p1 <- p + geom_point(aes(color = factor(cyl)),
                     size = 2.5)
p1

Axis labels are based on the expressions given to aes.

This is convenient for exploration but usually not ideal for a report.

The labs() function can be used to change axis and legend labels:

p1 + labs(x = "Displacement (Liters)",
          y = "Highway Miles Per Gallon",
          color = "Cylinders")

The labs() function can also add a title, subtitle, and caption:

p2 <- p1 +
    labs(x = "Displacement (Liters)",
         y = "Highway Miles Per Gallon",
         color = "Cylinders",
         title = "Gas Mileage and Displacement",
         subtitle = paste("For models which had a new release every year",
                           "between 1999 and 2008"),
         caption = "Data Source: https://fueleconomy.gov/")
p2

Annotations can be used to provide popout that draws a viewer’s attention to particular features.

The annotate() function is one option:

p2 +
    annotate("label", x = 2.8, y = 43,
             label = "Volkswagens") +
    annotate("rect",
             xmin = 1.7, xmax = 2.1,
             ymin = 40, ymax = 45,
             fill = NA, color = "black")

Often more convenient are some geom_mark objects provided by the ggforce package:

library(ggforce)
p2 +
    geom_mark_hull(aes(filter = class == "2seater"),
                   description =
                       paste("2-Seaters have high displacement",
                             "values, but also high fuel efficiency",
                             "for their displacement.")) +
    geom_mark_rect(aes(filter = hwy > 40),
                   description =
                       "These are Volkswagens") +
    geom_mark_circle(aes(filter = hwy == 12),
                     description =
                         "Three pickups and an SUV.")

These annotations can be customized in a number of ways.

Arranging Plots

There are several tools available for assembling ensemble plots.

The patchwork package is a good choice.

A simple example:

p1 <- ggplot(mpg, aes(x = displ,
                      y = hwy)) +
    geom_point()
p2 <- ggplot(mpg, aes(x = cyl,
                      y = hwy,
                      group = cyl)) +
    geom_boxplot()
p3 <- ggplot(mpg, aes(x = cyl)) +
    geom_bar()

library(patchwork)
(p1 + p2) / p3
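
The layout can be adjusted further with plot_layout() and plot_annotation() from patchwork. A sketch that repeats the three plots so it can be run on its own; the relative heights, the title text, and the name pw are arbitrary choices for illustration.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()
p2 <- ggplot(mpg, aes(x = cyl, y = hwy, group = cyl)) +
    geom_boxplot()
p3 <- ggplot(mpg, aes(x = cyl)) +
    geom_bar()

## Make the top row twice as tall as the bottom row,
## and add an overall title and panel tags A, B, C
pw <- (p1 + p2) / p3 +
    plot_layout(heights = c(2, 1)) +
    plot_annotation(title = "Three views of the mpg data",
                    tag_levels = "A")
pw
```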

Animation

The gganimate package can be used to add animation to a ggplot graph.

Start with a plot p for all years in the gapminder data, with year in the background:

p <- gapminder |>
    arrange(desc(pop)) |>
    ggplot(aes(x = gdpPercap, y = lifeExp)) +
    geom_text(aes(x = 5000, y = 55, label = as.character(year)),
              size = 50, color = "grey",
              hjust = "center", vjust = "center") +
    geom_point(aes(size = pop, fill = continent), shape = 21) +
    scale_x_log10(labels = scales::comma) +
    ylim(c(20, 85)) +
    scale_size_area(max_size = 20,
                    labels = scales::comma,
                    breaks = c(0.25 * 10 ^ 9, 0.5 * 10 ^ 9, 10 ^ 9)) +
    scale_fill_manual(values = c(Africa = "deepskyblue",
                                 Asia = "red",
                                 Americas = "green",
                                 Europe = "gold",
                                 Oceania = "brown")) +
    labs(x = "Income", y = "Life expectancy") +
    guides(fill = guide_legend(title = "Continent",
                               override.aes = list(size = 5),
                               order = 1),
           size = guide_legend(title = "Population",
                               label.hjust = 1,
                               order = 2)) +
    theme_minimal() +
    ## Adjust theme elements after theme_minimal(); a complete theme
    ## added later would replace them.
    theme(text = element_text(size = 16),
          panel.border = element_rect(fill = NA, color = "grey20"))

A GIF animation:

library(gganimate)
animate(p +
        transition_states(
            year,
            transition_length = 2,
            state_length = 0))

A movie:

animate(p +
        transition_states(
            year,
            transition_length = 2,
            state_length = 0,
            wrap = FALSE),
        renderer = ffmpeg_renderer())

Interaction

Plotly

The ggplotly function in the plotly package can be used to add some interactive features to a plot created with ggplot2.

  • In an R session a call to ggplotly() may open a browser window with the interactive plot.

  • In an RStudio session the plot appears in the graphics panel.

  • In an R Markdown document the interactive plot is embedded in the HTML file.

Another interactive plotting approach that can be used from R is described in an Infoworld article.

A simple example using ggplotly():

library(ggplot2)
library(plotly)
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point(aes(x = displ,
                   y = hwy,
                   fill = cyl),
               shape = 21,
               size = 3)
ggplotly(p)

Adding a text aesthetic allows the tooltip display to be customized:

p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point(aes(x = displ,
                   y = hwy,
                   fill = cyl,
                   text = paste(year,
                                manufacturer,
                                model)),
               shape = 21,
               size = 3)
ggplotly(p, tooltip = "text") |>
    style(hoverlabel = list(bgcolor = "white"))

Ggiraph

The ggiraph package provides another approach.

library(ggplot2)
library(ggiraph)
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point_interactive(
        aes(x = displ,
            y = hwy,
            fill = cyl,
            tooltip = paste(year,
                            manufacturer,
                            model)),
        shape = 21,
        size = 3)
girafe(ggobj = p)

Grammar of Interactive Graphics

There have been several efforts to develop a grammar of interactive graphics, including ggvis and animint; neither seems to be under active development at this time.

A promising approach is Vega-Lite, with a Python interface Altair and an R interface altair to the Python interface.

An example using the altair package:

rub <- read.csv(here::here("rubber.csv"))

library(altair)

chartTH <- alt$Chart(rub)$
    mark_point()$
    encode(x = alt$X("H:Q", scale = alt$Scale(domain = range(rub$H))),
           y = alt$Y("T:Q", scale = alt$Scale(domain = range(rub$T))))

brush <- alt$selection_interval()

chartTH_brush <- chartTH$add_selection(brush)

chartTH_selection <-
    chartTH_brush$encode(color = alt$condition(brush,
                                               "Origin:N",
                                               alt$value("lightgray")))

chartAT <- chartTH_selection$
    encode(x = alt$X("T:Q", scale = alt$Scale(domain = range(rub$T))),
           y = alt$Y("A:Q", scale = alt$Scale(domain = range(rub$A))))

chartAT | chartTH_selection

The resulting linked plots:

Notes

Reading

Chapters Data visualization and Graphics for communication in R for Data Science, O’Reilly.

Chapter Make a plot in Data Visualization.

Chapter ggplot2 in Introduction to Data Science: Data Analysis and Prediction Algorithms with R.

Interactive Tutorial

An interactive learnr tutorial for these notes is available.

You can run the tutorial with

STAT4580::runTutorial("ggplot")

You can install the current version of the STAT4580 package with

remotes::install_gitlab("luke-tierney/STAT4580")

You may need to install the remotes package from CRAN first.

Exercises

  1. In the following expression, which value of the shape aesthetic produces a plot with points represented as triangles outlined in black and filled according to the number of cylinders?

    library(ggplot2)
    ggplot(mpg, aes(x = displ, y = hwy, fill = factor(cyl))) +
        geom_point(size = 4, shape = ---)

    1. 15
    2. 17
    3. 21
    4. 24
  2. It can sometimes be useful to plot text labels in a scatterplot instead of points. Consider the plot set up as

    library(ggplot2)
    library(dplyr)
    data(gapminder, package = "gapminder")
    p <- filter(gapminder, year == 2007) |>
        group_by(continent) |>
        summarize(gdpPercap = mean(gdpPercap), lifeExp = mean(lifeExp)) |>
        ggplot(aes(x = gdpPercap, y = lifeExp))

    Which of the following produces a plot with continent names on white rectangles?

    1. p + geom_text(aes(label = continent))
    2. p + geom_label(aes(label = continent))
    3. p + geom_label(label = continent)
    4. p + geom_text(text = continent)
  3. The following code plots a kernel density estimate for the eruptions variable in the faithful data set:

    library(ggplot2)
    ggplot(faithful, aes(x = eruptions)) + geom_density(bw = 0.1)

    Look at the help page for geom_density. Which of the following best describes what specifying a value for bw does:

    1. Changes the kernel used to construct the estimate.
    2. Changes the smoothing bandwidth to make the result more or less smooth.
    3. Changes the stat used to stat_bw.
    4. Has no effect on the result.
  4. This code creates a map of Iowa counties.

    library(ggplot2)
    p <- ggplot(map_data("county", "iowa"),
                aes(x = long, y = lat, group = group)) +
        geom_polygon(fill = "white", color = "black")

    Which of these produces a plot with an aspect ratio that best matches the map on this page?

    1. p + coord_fixed(0.5)
    2. p + coord_fixed(0.75)
    3. p + coord_fixed(1.35)
    4. p + coord_fixed(1.95)
  5. Consider the two plots created by this code (print the values of p1 and p2 to see the plots):

    library(ggplot2)
    data(gapminder, package = "gapminder")
    p1 <- ggplot(gapminder, aes(x = log(gdpPercap), y = lifeExp)) +
        geom_point() +
        scale_x_continuous(name = "")
    p2 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
        geom_point() +
        scale_x_log10(labels = scales::comma, name = "")

    Which of these statements is true?

    1. The x axis labels are identical in both plots.
    2. The x axis labels in p2 are in dollars; the labels in p1 are in log dollars.
    3. The x axis labels in p1 are in dollars; the labels in p2 are in log dollars.
    4. There are no labels on the x axis in p2.
  6. Consider the plot created by

    library(ggplot2)
    data(gapminder, package = "gapminder")
    p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
        geom_point() +
        scale_x_log10(labels = scales::comma)

    Which of these expressions produces a plot with a white background?

    1. p
    2. p + theme_grey()
    3. p + theme_classic()
    4. p + ggthemes::theme_economist()
  7. There are many different ways to change the x axis label in ggplot. Consider the plot created by

    library(ggplot2)
    p <- ggplot(mpg, aes(x = displ, y = hwy)) +
        geom_point()

    Which of the following does not change the x axis label to Displacement?

    1. p + labs(x = "Displacement")
    2. p + scale_x_continuous("Displacement")
    3. p + xlab("Displacement")
    4. p + theme(axis.title.x = "Displacement")
