Color is very effective when used well.

But using color well is not easy.

Some of the issues:

An internet “controversy” in 2015: The Dress (and a follow-up article)

Color Spaces

RGB and HSV Color Spaces

Computer monitors and projectors work in terms of red, green, and blue light.

Amounts of red, green, and blue (and the alpha level) are stored as integers between 0 and 255 (8-bit bytes).

cols <- c("red", "green", "blue", "yellow", "cyan", "magenta")
rgbcols <- col2rgb(cols); colnames(rgbcols) <- cols
rgbcols
##       red green blue yellow cyan magenta
## red   255     0    0    255    0     255
## green   0   255    0    255  255       0
## blue    0     0  255      0  255     255

Colors are often encoded in hexadecimal form (base 16).

rgb(1, 0, 0)   ## pure red
## [1] "#FF0000"
rgb(0, 0, 1)   ## pure blue
## [1] "#0000FF"
rgb(255, 0, 0, maxColorValue = 255)
## [1] "#FF0000"
rgb(0, 0, 255, maxColorValue = 255)
## [1] "#0000FF"
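To make the hexadecimal encoding concrete, the sketch below decodes a hex string by hand with base R's substring() and strtoi() and can be compared against col2rgb(). The helper name hex_to_rgb is made up for this illustration, and it assumes a plain "#RRGGBB" or "#RRGGBBAA" string.

```r
## Hypothetical helper: decode "#RRGGBB" or "#RRGGBBAA" two hex digits at a time.
hex_to_rgb <- function(hex) {
    stopifnot(nchar(hex) %in% c(7, 9))
    starts <- seq(2, nchar(hex) - 1, by = 2)      # skip the leading "#"
    pairs <- substring(hex, starts, starts + 1)   # e.g. "22" "8B" "22"
    vals <- strtoi(pairs, base = 16L)             # base 16 to integer
    names(vals) <- c("red", "green", "blue", "alpha")[seq_along(vals)]
    vals
}
hex_to_rgb("#228B22")    ## same channels as col2rgb("forestgreen")
hex_to_rgb("#FF000080")  ## red with an alpha level of 128
```

Each pair of hex digits covers exactly one 8-bit channel, which is why the range is 0 to 255.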

Hue, saturation, value (HSV) is a simple transformation of RGB.

rgb2hsv(rgbcols)
##   red     green      blue    yellow cyan   magenta
## h   0 0.3333333 0.6666667 0.1666667  0.5 0.8333333
## s   1 1.0000000 1.0000000 1.0000000  1.0 1.0000000
## v   1 1.0000000 1.0000000 1.0000000  1.0 1.0000000

HSV is a little more convenient since it allows the hue to be controlled separately.

But saturation and value attributes are not particularly useful for specifying colors that work well perceptually.
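As a rough illustration of controlling one HSV attribute at a time, the sketch below uses the base hsv() function; the variable names are only for this example.

```r
## Six evenly spaced hues at full saturation and value.
hues <- hsv(h = seq(0, 5 / 6, length.out = 6), s = 1, v = 1)
hues
## Fixed hue (red), decreasing saturation: red fades toward white.
sat <- hsv(h = 0, s = c(1, 0.5, 0), v = 1)
sat
## Fixed hue (red), decreasing value: red darkens toward black.
val <- hsv(h = 0, s = 1, v = c(1, 0.5, 0))
val
```

Equal steps in s or v are equal steps in RGB arithmetic, not necessarily equal perceived steps.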

A color wheel of fully saturated colors:

wheel <- function(col, radius = 1, ...)
    pie(rep(1, length(col)),
        col = col, radius = radius, ...)
wheel(rainbow(6))

Removing saturation:

library(colorspace)
wheel(desaturate(rainbow(6)))

Fully saturated yellow is brighter than red, which is brighter than blue.

HCL Color Space

The rainbow palette of the color wheel is often a default in visualization systems.

A blog post illustrates why this is a bad idea.

The rainbow hues are evenly spaced in the color spectrum, but chroma and luminance are not.

Luminance in particular is not monotone across the palette.

rgb2hcl <- function(col) {
    ## ignores alpha
    col <- RGB(t(col[1 : 3, ]) / 255)
    col <- as(col, "polarLUV")
    col <- t(col@coords[, 3 : 1, drop = FALSE])
    rownames(col) <- tolower(rownames(col))
    col
}
col2hcl <- function(col) rgb2hcl(col2rgb(col))
pal <- function(col, border = "light gray", ...) {
    n <- length(col)
    plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0, 1),
         axes = FALSE, xlab = "", ylab = "", ...)
    rect((0 : (n - 1)) / n, 0, (1 : n) / n, 1, col = col, border = border)
}

par(mfrow = c(1, 2))
pal(rainbow(6), main = "Saturated Rainbow")
pal(desaturate(rainbow(6)), main = "Desaturated")

specplot(rainbow(6), lwd = 4)

The hue, chroma, luminance (HCL) space allows separate control of:

  • Hue, the color.

  • Chroma, the amount of the color.

  • Luminance, or perceived brightness.

HCL makes it easier to create perceptually uniform color palettes.

A palette with constant chroma, evenly spaced hues and evenly spaced luminance values:

rain6 <- hcl(seq(0, 360 * 5 / 6, len = 6), 50, seq(60, 80, len = 6))
par(mfrow = c(1, 2))
pal(rain6, main = "Uniform Rainbow")
pal(desaturate(rain6), main = "Desaturated")

specplot(rain6, lwd = 4)
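The luminance difference between the two palettes can also be checked numerically. The sketch below is one way to do this with base R only, using grDevices::convertColor() to obtain the CIE L coordinate; the helper name lum is hypothetical, and the palette is re-created so the block runs on its own.

```r
## Hypothetical helper: CIE L (luminance) of a vector of R colors.
lum <- function(col) {
    srgb <- t(col2rgb(col)) / 255                   # one row per color, in [0, 1]
    convertColor(srgb, from = "sRGB", to = "Luv")[, 1]
}
rain6 <- hcl(seq(0, 360 * 5 / 6, length.out = 6), 50,
             seq(60, 80, length.out = 6))
round(lum(rainbow(6)))  ## goes up and down across the palette
round(lum(rain6))       ## rises steadily
```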

For a fully saturated red, varying only chroma to reduce the amount of color:

red_hcl <- list(h = 12.17395, c = 179.04076, l = 53.24059)
specplot(hcl(red_hcl$h, red_hcl$c * seq(0, 1, len = 10), red_hcl$l), lwd = 4)

For a given hue, not all combinations of chroma and luminance are possible.

In particular, for low luminance values the available chroma range is limited.
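One way to see this limit is to ask hcl() not to adjust out-of-range colors: with fixup = FALSE it returns NA for combinations that cannot be displayed. The sketch below probes a coarse chroma grid for a red hue; the grid and the helper name max_displayable are choices made for this illustration.

```r
## Coarse grid of chroma values to try.
chroma <- seq(0, 180, by = 20)
## Hypothetical helper: largest grid chroma that is displayable at luminance l.
max_displayable <- function(l, h = 12) {
    ok <- !is.na(hcl(h = h, c = chroma, l = l, fixup = FALSE))
    max(chroma[ok])
}
lums <- c(10, 30, 50, 70, 90)
setNames(sapply(lums, max_displayable), lums)
```

The result is only as fine as the grid, so it gives a rough picture of how the available chroma shrinks toward very low and very high luminance.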

The ggplot book contains this visualization of the HCL space.

  • Hue is mapped to angle.
  • Chroma is mapped to radius.
  • Luminance is mapped to facets.

The origins with zero chroma are shades of grey.

HCL is a transformation of the CIEluv color space designed for perceptual uniformity.

The definition of the luminance takes into account the light sensitivity of a standard human observer at various wavelengths.

Light sensitivity for different wavelengths in daylight conditions (photopic vision) and under dark adapted conditions (scotopic vision):

Munsell Color Space

Another color space, similar to HCL, is the Munsell system developed in the early 1900s.

This system uses a Hue, Value, Chroma encoding.

The munsell package provides an R interface and is used in ggplot.

Munsell specifications are of the form "H V/C", such as 5R 5/10.

Possible hues are

library(munsell, exclude = "desaturate")
mnsl_hues()
##  [1] "2.5R"  "5R"    "7.5R"  "10R"   "2.5YR" "5YR"   "7.5YR" "10YR"  "2.5Y" 
## [10] "5Y"    "7.5Y"  "10Y"   "2.5GY" "5GY"   "7.5GY" "10GY"  "2.5G"  "5G"   
## [19] "7.5G"  "10G"   "2.5BG" "5BG"   "7.5BG" "10BG"  "2.5B"  "5B"    "7.5B" 
## [28] "10B"   "2.5PB" "5PB"   "7.5PB" "10PB"  "2.5P"  "5P"    "7.5P"  "10P"  
## [37] "2.5RP" "5RP"   "7.5RP" "10RP"

V should be an integer between 0 and 10.

C should be an even integer less than 24, but not all combinations are possible.

Adjusting colors in the value, chroma, and hue dimensions:

my_blue <- "5PB 5/8"
plot_mnsl(c(
    lighter(my_blue, 2),              my_blue,  darker(my_blue, 2),
    munsell::desaturate(my_blue, 2),  my_blue,  saturate(my_blue, 2),
    rygbp(my_blue, 2),                my_blue,  pbgyr(my_blue, 2)))

Creating scales:

plot_mnsl(sapply(0 : 6, darker, col = "5PB 7/4")) + facet_wrap(~ num, nrow = 1)

Examining available colors:

hue_slice("5R")

value_slice(5)

Complementary colors:

complement_slice("5R")

Opponent Process Theory

The Opponent Process Model of vision says that the brain divides the visual signal among three opposing contrast pairs: black/white, red/green, and yellow/blue.

The black/white pair corresponds to luminance in HCL.

Hue and chroma in HCL span the two chromatic axes.

The luminance axis has higher resolution than the two chromatic axes.

The major form of color vision deficiency reflects an inability to distinguish differences along the red/green axis.

Impairment along the yellow/blue axis does occur as well but is much rarer.

Contrast and Comparisons

Vision reacts to differences, not absolutes.

Small differences in shading or hue can be recognized when objects are contiguous but are much harder to see when they are separated.

Simultaneous brightness contrast: a grey patch on a dark background looks lighter than the same grey patch on a light background.

plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0, 1),
     axes = FALSE, xlab = "", ylab = "")
rect(0, 0, 0.5, 1, col = "lightgrey", border = NA)
rect(0.5, 0, 1, 1, col = "darkgrey", border = NA)
rect(0.2, 0.3, 0.3, 0.7, col = "grey", border = NA)
rect(0.7, 0.3, 0.8, 0.7, col = "grey", border = NA)

An example we saw earlier:

Some more are available here, including:

Using luminance or grey scale alone does not work well for encoding categorical variables against a key.

Grey scale can be effective for showing continuous transitions in pseudo-color images.

filled.contour(volcano, color.palette = grey.colors)

Grey scale is less effective for segmented maps, or choropleth maps; only a few levels can be accurately decoded.

Interactions with Size, Background and Proximity

For small items more contrast and more saturated colors are needed:

x <- runif(6, 0.1, 0.9)
y <- runif(6, 0.1, 0.9)
cols <- c("red", "green", "blue", "yellow", "cyan", "magenta")
f <- function(size = 1, black = FALSE) {
    plot(x, y, type = "n", xlim = c(0, 1), ylim = c(0, 1))
    if (black) rect(0, 0, 1, 1, col = "black")
    text(x, y, cols, col = cols, cex = size)
}
opar <- par(mfrow = c(2, 2))
f(1)
f(4)
f(1, TRUE)
f(4, TRUE)

par(opar)

Variations in luminance are particularly helpful for seeing fine structure, such as small text or small symbols:

plot(0, type = "n", xlim = c(0, 1), ylim = c(0, 1),
     axes = FALSE, xlab = "", ylab = "")
rect(0, 0, 1, 1, col = hcl(0)) ## defaults: c = 35, l = 85
qbf <- "The quick brown fox jumps ..."
text(0.5, 0.3, label = qbf, col = hcl(180))       ## hue
text(0.5, 0.5, label = qbf, col = hcl(0, c = 70)) ## chroma
text(0.5, 0.7, label = qbf, col = hcl(0, l = 50)) ## luminance

Chrominance (hue and chroma) differences alone are not sufficient for small items.

Ware recommends a luminance contrast of at least 3:1 for small text; 10:1 is preferable.
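How Ware defines luminance contrast is not spelled out here, so the sketch below uses the WCAG 2 contrast ratio as a stand-in; that choice, and the helper names rel_lum and contrast_ratio, are assumptions of this example. It is applied to the three text colors from the plot above.

```r
## Relative luminance of sRGB colors (WCAG 2 definition; an assumption here).
rel_lum <- function(col) {
    srgb <- col2rgb(col) / 255
    lin <- ifelse(srgb <= 0.03928, srgb / 12.92, ((srgb + 0.055) / 1.055)^2.4)
    colSums(lin * c(0.2126, 0.7152, 0.0722))   # weights for red, green, blue
}
## Hypothetical helper: lighter over darker luminance, each offset by 0.05.
contrast_ratio <- function(fg, bg) {
    l <- c(rel_lum(fg), rel_lum(bg))
    (max(l) + 0.05) / (min(l) + 0.05)
}
bg <- hcl(0)                        ## background used in the plot above
contrast_ratio(hcl(180), bg)        ## hue change only
contrast_ratio(hcl(0, c = 70), bg)  ## chroma change only
contrast_ratio(hcl(0, l = 50), bg)  ## luminance change
contrast_ratio("black", "white")    ## 21, the largest possible ratio
```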

Small areas also need variation in more than hue:

Contrasting borders can help for larger areas with similar luminance:

Color Specification in R

A large number of named colors are available (currently 657).

Some examples:

col2rgb("red")
##       [,1]
## red    255
## green    0
## blue     0
col2rgb("forestgreen")
##       [,1]
## red     34
## green  139
## blue    34
col2rgb("deepskyblue")
##       [,1]
## red      0
## green  191
## blue   255
col2rgb("firebrick")
##       [,1]
## red    178
## green   34
## blue    34

These will show some details:

colors()
demo(colors)
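A few base R one-liners can help when searching the named colors. This is a small sketch; the variable names are only for illustration.

```r
length(colors())                      ## number of named colors
grep("sky", colors(), value = TRUE)   ## names containing "sky"
## Names whose red, green, and blue values all match those of firebrick.
target <- col2rgb("firebrick")[, 1]
all_rgb <- col2rgb(colors())          ## 3 x n matrix, one column per name
same_as_firebrick <- colors()[colSums(all_rgb == target) == 3]
same_as_firebrick
```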

The available named colors follow a widely used standard.

These colors include the 140 web colors supported on modern browsers.

Individual colors can also be specified using rgb() or hcl() or as hexadecimal specifications.

library(colorspace)
hex2RGB("#FF0000")
##      R G B
## [1,] 1 0 0

Using color spaces:

rgb(1, 0, 0)
## [1] "#FF0000"
rgb(255, 0, 0, max = 255)
## [1] "#FF0000"
rgb2hsv(col2rgb("red"))
##   [,1]
## h    0
## s    1
## v    1

Converting to HCL:

rgb2hcl <- function(col) {
    ## ignores alpha
    col <- RGB(t(col[1 : 3, ]) / 255)
    col <- as(col, "polarLUV")
    col <- t(col@coords[, 3 : 1, drop = FALSE])
    rownames(col) <- tolower(rownames(col))
    col
}
col2hcl <- function(col) rgb2hcl(col2rgb(col))

col2hcl("red")
##        [,1]
## h  12.17395
## c 179.04076
## l  53.24059
col2hcl("green")
##       [,1]
## h 127.7235
## c 135.7811
## l  87.7351
col2hcl("blue")
##        [,1]
## h 265.87278
## c 130.67593
## l  32.29567
col2hcl("yellow")
##        [,1]
## h  85.87351
## c 107.06462
## l  97.13951
col2hcl("cyan")
##        [,1]
## h 192.16714
## c  72.09794
## l  91.11330
col2hcl("magenta")
##        [,1]
## h 307.72618
## c 137.40166
## l  60.32351

hcl(12.17, 179.04, 53.24)
## [1] "#FF0000"

Color pickers can help:

When a set of colors is needed to encode variable values it is usually best to use a suitable palette.

Color Palettes

Color palettes are collections of colors that work well together.

It is useful to distinguish three kinds of palettes: qualitative, sequential, and diverging.

Tools for selecting palettes include:

A blog post with some further options.

Some current US government work on color palettes; more extensive notes and code.

R color palette functions:

These all take the number of colors as an argument, as well as some additional optional arguments.

The hcl.colors() function provides access to the palettes defined in the colorspace package.
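As a sketch of how this looks in practice, the block below draws one qualitative, one sequential, and one diverging palette from base R's hcl.colors(). The three palette names are taken from hcl.pals(); the swatch layout is just one simple way to display them.

```r
qual <- hcl.colors(5, palette = "Dark 3")    ## qualitative
sq   <- hcl.colors(5, palette = "Blues 3")   ## sequential
div  <- hcl.colors(5, palette = "Blue-Red")  ## diverging
## Show the three palettes as rows of swatches (bottom to top).
image(1:5, 1:3, matrix(1:15, nrow = 5), col = c(qual, sq, div),
      axes = FALSE, xlab = "", ylab = "")
axis(2, at = 1:3, labels = c("qualitative", "sequential", "diverging"),
     las = 1, tick = FALSE)
```

With an odd number of colors, the middle color of the diverging palette is the neutral one.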

colorRampPalette() can be used to create a palette function that interpolates between a set of colors:

rwb <- colorRampPalette(
    c("red", "white", "blue"))
rwb(5)
## [1] "#FF0000" "#FF7F7F" "#FFFFFF" "#7F7FFF" "#0000FF"
filled.contour(volcano,
               color.palette = rwb,
               asp = 1)

With more perceptually comparable extremes (from the Blue-Red palette of HCL Wizard):

rwb1 <- colorRampPalette(
    c("#8E063B", "white", "#023FA5"))
filled.contour(volcano,
               color.palette = rwb1,
               asp = 1)

An alternative uses the muted function from package scales:

rwb2 <- colorRampPalette(
    c(scales::muted("red"),
      "white",
      scales::muted("blue")))
filled.contour(volcano,
               color.palette = rwb2,
               asp = 1)

Most base and lattice functions allow a vector of colors to be specified.

Some, like filled.contour() and levelplot() allow a palette function to be provided.

ggplot provides a framework for specifying palette functions to use with scale_color_xyz() and scale_fill_xyz() functions.

Packages like colorspace and viridis provide additional scale_color_xyz() and scale_fill_xyz() functions.

RColorBrewer Palettes

The available palettes:

library(RColorBrewer)
display.brewer.all()

Palettes in the first group are sequential.

The second group are qualitative.

The third group are diverging.

The "Blues" palette:

display.brewer.pal(9, "Blues")

As RGB values:

brewer.pal(9, "Blues")
## [1] "#F7FBFF" "#DEEBF7" "#C6DBEF" "#9ECAE1" "#6BAED6" "#4292C6" "#2171B5"
## [8] "#08519C" "#08306B"

The palettes are limited to a maximum number of levels.

To obtain more levels you can interpolate.

brewer.pal(10, "Blues")
## Warning in brewer.pal(10, "Blues"): n too large, allowed maximum for palette Blues is 9
## Returning the palette you asked for with that many colors
## [1] "#F7FBFF" "#DEEBF7" "#C6DBEF" "#9ECAE1" "#6BAED6" "#4292C6" "#2171B5"
## [8] "#08519C" "#08306B"

pbrbl <- colorRampPalette(brewer.pal(9, "Blues"), interpolate = "spline")
pbrbl
## function (n) 
## {
##     x <- ramp(seq.int(0, 1, length.out = n))
##     if (ncol(x) == 4L) 
##         rgb(x[, 1L], x[, 2L], x[, 3L], x[, 4L], maxColorValue = 255)
##     else rgb(x[, 1L], x[, 2L], x[, 3L], maxColorValue = 255)
## }
## <bytecode: 0x5588859e1090>
## <environment: 0x5588833c0150>
pbrbl(10)
##  [1] "#F7FBFF" "#E0ECF7" "#CCDEF1" "#ADD0E5" "#81BBDA" "#57A1CF" "#3687C0"
##  [8] "#1A69B0" "#064D98" "#08306B"

Colorspace Palettes

The colorspace package provides a wide range of pre-defined palettes:

library(colorspace)
hcl_palettes(plot = TRUE)

A particular number of colors from one of these palettes can be obtained with

qualitative_hcl(4, palette = "Dark 3")
## [1] "#E16A86" "#909800" "#00AD9A" "#9183E6"

The functions sequential_hcl() and diverging_hcl() are analogous.

For use with ggplot2 the package provides scale functions like scale_fill_discrete_qualitative() and scale_color_continuous_sequential().

A package vignette provides more details and background.

Viridis Palettes

These are provided in package viridisLite.

Palette functions are viridis(), mako(), etc.

They are also available via the hcl.colors() function.
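A brief sketch of the hcl.colors() route, using base R only. The versions provided by hcl.colors() are HCL-based approximations of the viridisLite palettes, so the hex values may differ slightly from those returned by viridis(). The luminance check reuses base convertColor().

```r
vir <- hcl.colors(6, palette = "viridis")   ## "viridis" is also the default
vir
## CIE L of each color; for a sequential palette it should change steadily.
L <- convertColor(t(col2rgb(vir)) / 255, from = "sRGB", to = "Luv")[, 1]
round(L)
```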

For use in ggplot they can be specified in the viridis color scale functions.

Palettes in R Graphics

ggplot uses scale_color_xyz() or scale_fill_xyz().

For discrete scales the choices for xyz include

For continuous scales the choices for xyz include

Others are available in packages such as colorspace.

The default qualitative and sequential discrete palettes:

library(dplyr)     ## filter(), mutate(), slice_max()
library(ggplot2)
library(gapminder)
gap_2007 <- filter(gapminder, year == 2007) |> slice_max(pop, n = 20)
p <- mutate(gap_2007, country = reorder(country, pop)) |>
    ggplot(aes(x = gdpPercap, y = lifeExp, fill = continent)) +
    scale_size_area(max_size = 10) +
    scale_x_log10() +
    geom_point(size = 4, shape = 21) +
    guides(fill = guide_legend(override.aes = list(size = 4)))
p1 <- p + ggtitle("Hue")
p2 <- p + scale_fill_viridis_d() + ggtitle("Viridis")
library(patchwork)
p1 + p2

Discrete examples for brewer, colorspace and manual:

p1 <- p + scale_fill_brewer(palette = "Set1") +
    ggtitle("Brewer Set1")
p2 <- p + scale_fill_brewer(palette = "Set2") +
    ggtitle("Brewer Set2")
p3 <- p + scale_fill_discrete_qualitative("Dark 3") +
    ggtitle("Colorspace Dark 3")
p4 <- p + scale_fill_manual(values = c(Africa = "red", Asia = "blue",
                                       Americas = "green", Europe = "grey")) +
    ggtitle("Manual")
(p1 + p2) / (p3 + p4)

The default for continuous scales is a gradient from a dark blue to a light blue:

V <- data.frame(x = rep(seq_len(nrow(volcano)), ncol(volcano)),
                y = rep(seq_len(ncol(volcano)), each = nrow(volcano)),
                z = as.vector(volcano))
p <- ggplot(V, aes(x, y, fill = z)) + geom_raster() + coord_fixed()
p

Some alternatives:

p1 <- p + scale_fill_gradient2(
              low = "red", mid = "white", high = "blue",
              midpoint = median(volcano)) +
    ggtitle("Red-White-Blue Gradient")
p2 <- p + scale_fill_viridis_c() +
    ggtitle("Viridis")
p3 <- p + scale_fill_gradientn(
              colors = terrain.colors(8)) +
    ggtitle("Terrain")

library(forcats)   ## fct_rev()
vbins <- seq(80, by = 20, length.out = 7)
nc <- length(vbins) - 1
p4 <- ggplot(mutate(V, z = fct_rev(cut(z, vbins))),
             aes(x, y, fill = z)) +
    geom_raster() +
    scale_fill_manual(values = rev(terrain.colors(nc))) +
    ggtitle("Discretized Terrain")
(p1 + p2) / (p3 + p4)

Discretizing a continuous range to a modest number of levels can make decoding values from a legend easier.

Reduced Color Vision

Color vision deficiency affects about 10% of males and a smaller percentage of females.

The most common form is reduced ability to distinguish red and green.

Some web sites provide tools to simulate how a visualization would look to a color vision deficient viewer.

The R packages dichromat, colorspace, and colorblindr provide tools for simulating how colors would look to a color vision deficient viewer for three major types of color vision deficiency: deutan, protan, and tritan.

An article explaining the color vision impairment simulation is available here

Using some tools from packages colorspace and colorblindr we can simulate what a plot would look like in grey scale and to someone with some of the major types of color impairment.

A plot with the default discrete color palette:

p <- ggplot(gap_2007, aes(gdpPercap, lifeExp, color = continent)) +
    geom_point(size = 4) +
    scale_x_log10() +
    guides(color = guide_legend(override.aes = list(size = 4)))
p

library(colorblindr)
library(colorspace)
library(grid)
color_check <- function(p) {
    p1 <- edit_colors(p + ggtitle("Desaturated"), desaturate)
    p2 <- edit_colors(p + ggtitle("deutan"), deutan)
    p3 <- edit_colors(p + ggtitle("protan"), protan)
    p4 <- edit_colors(p + ggtitle("tritan"), tritan)
    gridExtra::grid.arrange(p1, p2, p3, p4, nrow = 2)
}
color_check(p)

For the Viridis palette:

pv <- p + scale_color_viridis_d()
pv

color_check(pv)

The swatchplot() function in the colorspace package can be used with the cvd = TRUE argument to simulate how specific palettes work for different color vision deficiencies:

colorspace::swatchplot(rainbow(6), cvd = TRUE)

colorspace::swatchplot(hcl.colors(6), cvd = TRUE)
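The simulation functions can also be applied directly to color vectors. The sketch below uses deutan() from colorspace on pure red and green and compares a crude distance in RGB coordinates before and after; the distance measure and the helper name rgb_dist are choices made for this illustration, not a perceptual metric.

```r
library(colorspace)
red_green <- c("#FF0000", "#00FF00")
## Hypothetical helper: Euclidean distance between two colors in RGB.
rgb_dist <- function(col) as.numeric(dist(t(col2rgb(col))))
rgb_dist(red_green)          ## as specified
rgb_dist(deutan(red_green))  ## after simulating deutan vision
```

A smaller distance after simulation suggests the two colors are harder to tell apart for a deutan viewer.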

Two Issues to Watch Out For

Missing Values

It is common for default settings to not assign a color for missing values.

In a choropleth map with (made-up) data where one state’s value is missing this might not be noticed.

m <- map_data("state")
d <- data.frame(region = unique(m$region),
                val = ordered(sample(1 : 4, 49, replace = TRUE)))
m <- left_join(m, d, "region")
pm <- ggplot(m) +
    geom_polygon(aes(long, lat, group = group, fill = val)) +
    coord_map() +
    ggthemes::theme_map()

dnm <- mutate(m, val = replace(val, region == "michigan", NA))
pm %+% dnm

Unless the viewer is very familiar with US geography.

Or is from Michigan.

In a scatterplot there are even fewer cues:

gnc <- mutate(gap_2007, continent = replace(continent, country == "China", NA))
pv +
    pv %+% gnc
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

Specifying na.value = "red", or some other color, will make sure NA values are visible:

(pm %+% dnm +
 scale_fill_viridis_d(na.value = "red") + theme(legend.position = "top")) +
    (pv %+% gnc + scale_color_viridis_d(na.value = "red"))
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

Using outlines can also help:

p1 <- pm %+% dnm +
    geom_polygon(aes(long, lat, group = group),
                 fill = NA, color = "black", linewidth = 0.1) +
    theme(legend.position = "top")
p2 <- pv %+% gnc +
    geom_point(shape = 21, fill = NA, color = "black", size = 4)
p1 + p2
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

A final plot might handle missing values differently, but for initial explorations it is a good idea to make sure they are clearly visible.

Aligning Diverging Palettes

Diverging palettes are very useful for showing deviations above or below a baseline.

par(mfrow = c(1, 2))
RColorBrewer::display.brewer.pal(7, "PRGn")
RColorBrewer::display.brewer.pal(6, "PRGn")

For a diverging palette to work properly, the palette baseline needs to be aligned with the data baseline.

How to do this will depend on the palette, but you do need to keep this in mind when using a diverging palette.

Just using scale_fill_brewer is not enough when the value range is not symmetric around the baseline:

m <- map_data("state")
d <- data.frame(region = unique(m$region),
                val = ordered(sample((1 : 6) - 3, 49, replace = TRUE)))
m <- left_join(m, d, "region")
p <- ggplot(m) +
    geom_polygon(aes(long, lat, group = group, fill = val)) +
    coord_map() +
    ggthemes::theme_map() +
    theme(legend.position = "right")
p + scale_fill_brewer(palette = "PRGn")

Setting the scale limits explicitly forces a 7-category symmetric scale that aligns the zero value with the middle color:

lims <- as.character(-3 : 3)
p + scale_fill_brewer(palette = "PRGn",
                      limits = lims)

This shows a category in the legend for -3 that does not appear in the map.

This is often what you want.

But if you want to drop the -3 category, one option is to use a manual scale:

vals <- RColorBrewer::brewer.pal(7, "PRGn")
names(vals) <- lims
p + scale_fill_manual(values = vals[-1])
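The same alignment idea carries over to numeric data: choosing limits that are symmetric around the baseline puts the neutral color at the baseline. The sketch below uses base R only; the helper name sym_limits and the toy data are hypothetical.

```r
## Hypothetical helper: limits symmetric around a baseline value.
sym_limits <- function(x, baseline = 0) {
    half_width <- max(abs(range(x, na.rm = TRUE) - baseline))
    baseline + c(-1, 1) * half_width
}
x <- c(-2, -1, 0.5, 3, 5)       ## asymmetric around zero
lims <- sym_limits(x)
lims
## Map values to a 7-color diverging palette over the symmetric range,
## so the middle (neutral) bin is centered on zero.
div7 <- hcl.colors(7, palette = "Blue-Red")
bins <- cut(x, breaks = seq(lims[1], lims[2], length.out = 8),
            include.lowest = TRUE)
div7[as.integer(bins)]
```

In ggplot the computed limits could be passed as the limits argument of a diverging continuous scale, for example scale_fill_gradient2() with midpoint set to the baseline.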

Bivariate Palettes

It is possible to encode two variables in a palette.

Some sample palettes:

Bivariate palettes are sometimes used in bivariate choropleth maps.

Some recommendations from Cynthia Brewer are available here.

A discussion of a recent example.

Unless one variable is binary, and the palette is very well chosen, it is hard to decode a visualization using a bivariate palette without constantly referring to the key.

Culture, Tradition, and Conventions

Colors can have different meanings in different cultures and at different times.

Conventions can also give colors particular meanings:

Traffic Lights

Traffic lights use red/green, even though this is a major axis of color vision deficiency.

The convention comes from railroads.

The red used generally contains some orange and the green contains blue to help with red/green color vision deficiency.

Position provides an alternate encoding. Orientations do vary.

Microarray Heatmaps

  • Microarrays are used for the analysis of gene-level changes and differences in bio-medical research.
  • Dyes are used that result in genes with a high response appearing red and genes with a low response appearing green.
  • In keeping with this physical characteristic of microarrays, a common visualization of the data is as a red/green heat map.

Red States and Blue States

It is now standard in the US to refer to Republican-leaning states as red states and Democrat-leaning states as blue states.

This is a fairly recent convention, dating back to the 2000 presidential election.

Prior to 1980 it was somewhat more traditional to use red for more left-leaning Democrats.

A map of the 1960 election results uses these more traditional colors.

In 1996 the New York Times used blue for Democrat, red for Republican, but the Washington Post used the opposite color scheme.

The long, drawn-out process of the 2000 election may have contributed to fixing the color scheme at the current convention.

Notes

Points need more saturation and luminance than areas.

False color images may benefit from discretizing.

Bivariate encodings (e.g. x = hue, y = luminance) are possible but tricky and not often a good idea. Best if at least one is binary.

Providing a second encoding, e.g. shape or position, can help for color vision deficient viewers and for photocopying.

In area plots and maps it is important to distinguish between baseline values and missing values.

If observed values only cover part of a possible range, it is sometimes appropriate to use a color coding that applies to the entire possible range.

For diverging palettes, some care may be needed to make sure the neutral color and the neutral value are properly aligned.

Using a well-designed palette is usually better than creating your own.

Choosing a palette can involve many factors, including appearance and branding.

References

Few, Stephen. “Practical rules for using color in charts.” Visual Business Intelligence Newsletter 11 (2008): 25. (PDF)

Harrower, M. A. and Brewer, C. M. (2003). ColorBrewer.org: An online tool for selecting color schemes for maps. The Cartographic Journal, 40, 27–37. ColorBrewer web site. The RColorBrewer package provides an R interface.

Ihaka, R. (2003). Colour for presentation graphics, in K. Hornik, F. Leisch, and A. Zeileis (eds.), Proceedings of the 3rd International Workshop on Distributed Statistical Computing, Vienna, Austria. PDF. See also the colorspace package and vignette.

Lumley, T. (2006). Color coding and color blindness in statistical graphics. ASA Statistical Computing & Graphics Newsletter, 17(2), 4–7. PDF.

Munzner, T. (2014), Visualization Analysis and Design, Chapter 10.

Lisa Charlotte Muth (2021). 4-part series of blog posts on choosing color scales. Part 1; Part 2; Part 3; Part 4.

Lisa Charlotte Muth (2022). A detailed guide to colors in data vis style guides. Blog post.

Treinish, Lloyd A. “Why Should Engineers and Scientists Be Worried About Color?” IBM Thomas J. Watson Research Center, Yorktown Heights, NY (2009): 46. (PDF)

Ware, C. (2012), Information Visualization: Perception for Design, 3rd ed, Chapters 3 & 4.

Zeileis, A., Murrell, P. and Hornik, K. (2009). Escaping RGBland: Selecting colors for statistical graphics, Computational Statistics & Data Analysis, 53(9), 3259–3270 (PDF).

Achim Zeileis, Paul Murrell (2019). HCL-Based Color Palettes in grDevices. R Blog post.

Achim Zeileis et al. (2020). “colorspace: A Toolbox for Manipulating and Assessing Colors and Palettes.” Journal of Statistical Software, 96(1), 1-49. doi:10.18637/jss.v096.i01.

Coloring Political Statements

POLITIFACT reviews the accuracy of statements by politicians and publishes summaries of the results.

A 2016 post on Daily Kos included a visualization of the results for a number of politicians.

Kaiser Fung posted a critique at JunkCharts and proposed an alternative.

I scraped the data as of April 11, 2017, from POLITIFACT; they are available here.

if (! file.exists("polfac.dat"))
    download.file("https://stat.uiowa.edu/~luke/data/polfac.dat",
                  "polfac.dat")
pft <- read.table("polfac.dat")
vcp <- prop.table(as.matrix(pft), 1)[, 6 : 1]
colnames(vcp) <- gsub("\\.", " ", colnames(vcp))

head(vcp)
##          Pants on Fire     False Mostly False  Half True Mostly True       True
## Trump       0.16279070 0.3281654    0.1989664 0.14470284  0.12403101 0.04134367
## Bachmann    0.26229508 0.3606557    0.1311475 0.09836066  0.06557377 0.08196721
## Cruz        0.06779661 0.2796610    0.3050847 0.12711864  0.16101695 0.05932203
## Gingrich    0.13924051 0.1898734    0.2025316 0.25316456  0.12658228 0.08860759
## Palin       0.09523810 0.3015873    0.1428571 0.14285714  0.09523810 0.22222222
## Santorum    0.08333333 0.2833333    0.2000000 0.21666667  0.11666667 0.10000000

The Daily Kos chart is ordered by the percentage of statements that are more false than true. A function to produce a bar chart with a specified color palette:

## lattice version
library(lattice)   ## barchart()
polbars <- function(col = cm.colors(6)) {
    barchart(vcp[order(rowSums(vcp[, 1 : 3])), ], auto.key = TRUE,
             par.settings = list(superpose.polygon = list(col = col)))
}

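## ggplot version
library(dplyr)
library(tidyr)     ## pivot_longer()
library(tibble)    ## rownames_to_column()
library(forcats)   ## fct_rev(), fct_inorder()
library(ggplot2)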

gvcp <- as.data.frame(vcp) |>
    rownames_to_column("Name") |>
    pivot_longer(-1, names_to = "Grade", values_to = "prop") |>
    mutate(Grade = fct_rev(fct_inorder(Grade)))
nm <- mutate(gvcp, Grade = ordered(Grade)) |>
    filter(Grade <= "Half True") |>
    group_by(Name) |>
    summarize(prop = sum(prop)) |>
    arrange(desc(prop)) |>
    pull(Name)
gvcp <- mutate(gvcp, Name = factor(Name, nm))

pvcp <- ggplot(gvcp, aes(Name, prop, fill = Grade)) +
    geom_col(position = "fill", width = 0.7) +
    coord_flip() +
    theme(legend.position = "top",
          plot.margin = margin(r = 50),
          legend.text = element_text(size = 10)) +
    scale_y_continuous(labels = scales::percent, expand = c(0, 0)) +
    guides(fill = guide_legend(title = NULL, nrow = 1, reverse = TRUE)) +
    labs(x = "", y = "")

polbars <- function(col = cm.colors(6))
    pvcp + scale_fill_manual(values = rev(col))

polbars()

The original Daily Kos chart seems to use a slightly modified version of the Color Brewer Spectral palette, a diverging palette.

polbars(brewer.pal(6, "Spectral"))

dkcols <- brewer.pal(6, "Spectral")
dkcols[4] <- "lightgrey"
polbars(dkcols)

The JunkCharts plot uses another diverging palette, close to the Blue-Red palette available in hclwizard.

rwbcols <- c("#4A6FE3", "#8595E1", "#B5BBE3", "#E2E2E2",
             "#E6AFB9", "#E07B91", "#D33F6A")
polbars(rwbcols)

polbars(rev(rwbcols))

Another diverging palette:

polbars(brewer.pal(7, "PiYG"))

A sequential palette:

polbars(rev(brewer.pal(6, "Oranges")))

Reading

Section Perception and Data Visualization in Data Visualization.

Chapter Color scales in Fundamentals of Data Visualization.

Exercises

  1. A color can be specified in hexadecimal notation. Given such a color specification you can find out what it looks like by using it in a simple plot, or by using the Google color picker. Which of the following best describes the color #B22222?

    1. a shade of green
    2. a shade of blue
    3. orange
    4. a shade of red
  2. The following shows how to view the colors in the RColorBrewer palette named Reds with 7 colors:

    library(RColorBrewer)
    display.brewer.pal(7, "Reds")

    Which of the following RColorBrewer palettes is diverging?

    1. Blues
    2. PuRd
    3. Set1
    4. RdGy
