Strip Plots

Basics

A variant of the dot plot is known as a strip plot. A strip plot for the city temperature data is

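library(lattice)    # stripplot()
library(ggplot2)    # ggplot(), geom_point()
library(gridExtra)  # grid.arrange()
p1 <- stripplot(~ temp, data = citytemps)
p2 <- ggplot(citytemps) + geom_point(aes(x = temp, y = "All"))
grid.arrange(p1, p2, nrow = 1)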

A strip plot of a single sample needs little vertical space. One way to reduce the height of the figure is to set the chunk option fig.height = 2.

The strip plot can reveal gaps and outliers.

After looking at the plot we might want to examine the high and low values:

library(dplyr)  # filter()
filter(citytemps, temp > 85)
##             city temp
## 1       Asuncion   95
## 2        Caracas   90
## 3  Dar es Salaam   86
## 4       Kinshasa   86
## 5          Lagos   91
## 6        Managua   88
## 7 Rio de Janeiro   88
## 8      São Paulo   90
filter(citytemps, temp < 10)
##       city temp
## 1   Anadyr    3
## 2  Calgary    1
## 3 Edmonton  -11
## 4     Kyiv    9
## 5    Minsk    0
## 6   Moscow    3
## 7 Winnipeg    8

Multiple Samples

The strip plot is most useful for showing subsets corresponding to a categorical variable.

A strip plot for the yields for different varieties in the barley data is

ggplot(barley) + geom_point(aes(x = yield, y = variety))

Scalability

Scalability in this form is limited due to over-plotting.

A simple strip plot of price within the different cut levels in the diamonds data is not very helpful:

ggplot(diamonds) + geom_point(aes(x = price, y = cut))

Several approaches are available to reduce the impact of over-plotting:

reducing the point size;
randomly displacing the points, called jittering;
making the points translucent, called alpha blending.

Combining all three produces

ggplot(diamonds) +
    geom_point(aes(x = price, y = cut),
               size = 0.2, position = "jitter", alpha = 0.2)
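In ggplot2, position = "jitter" displaces points in both directions by default. As a minimal sketch, the variant below jitters only along the categorical axis so that the plotted prices stay at their recorded values. The height of 0.3 and the seed are illustrative choices, not values taken from these notes.

```r
library(ggplot2)  # provides ggplot(), position_jitter() and the diamonds data

# Jitter only along the categorical (y) axis; width = 0 leaves the
# price values on the x axis unchanged. The height and seed are
# illustrative assumptions.
p <- ggplot(diamonds) +
    geom_point(aes(x = price, y = cut),
               size = 0.2, alpha = 0.2,
               position = position_jitter(width = 0, height = 0.3, seed = 123))
p
```

Fixing the seed makes the random displacement reproducible from one rendering to the next.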

Skewness of the price distributions can be seen in this plot, though other approaches will show this more clearly.

A peculiar feature revealed by this plot is the gap below 2000. Examining the subset with price < 2000 shows the gap is roughly symmetric around 1500:

ggplot(filter(diamonds, price < 2000)) +
    geom_point(aes(x = price, y = cut),
               size = 0.2, position = "jitter", alpha = 0.2)
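The location of the gap can also be checked numerically. The sketch below is one possible approach, not part of the original analysis: it sorts the distinct prices below 2000 and reports the two prices on either side of the widest spacing. The variable names are chosen here for illustration.

```r
library(ggplot2)  # provides the diamonds data

# Sorted distinct prices below 2000 and the spacing between neighbours.
prices <- sort(unique(diamonds$price[diamonds$price < 2000]))
gaps <- diff(prices)

# The widest spacing marks the gap; report the prices at its two ends.
i <- which.max(gaps)
gap_range <- c(lower = prices[i], upper = prices[i + 1])
gap_range
```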

Some Notes

With a good combination of point size, jittering, and alpha blending, the strip plot for groups of data can scale to several hundred thousand observations and ten to twenty groups.
Strip plots can reveal gaps, outliers, and data outside of the expected range.
Skewness and multi-modality can be seen, but other visualizations show these more clearly.
Storage needed for vector graphics images grows linearly with the number of observations.
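The storage note can be explored with a rough experiment. The sketch below writes jittered strip charts of simulated data to temporary PDF files and compares the file sizes. The helper pdf_size is a hypothetical name introduced here, and the sample sizes are arbitrary. The exact sizes depend on the device and its compression settings.

```r
# Hypothetical helper: draw a strip chart of n simulated values to a
# temporary PDF file and return the file size in bytes.
pdf_size <- function(n) {
    f <- tempfile(fileext = ".pdf")
    pdf(f)
    stripchart(rnorm(n), method = "jitter", pch = 1)
    dev.off()
    size <- file.size(f)
    unlink(f)
    size
}

set.seed(1)
n <- c(1000, 10000, 100000)
sizes <- sapply(n, pdf_size)

# File size and bytes per observation for each sample size.
data.frame(n = n, bytes = sizes, bytes_per_obs = sizes / n)
```

A raster format such as PNG stores pixels rather than individual points, so its size depends on the image dimensions instead of the number of observations.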

Base graphics provides stripchart:

stripchart(yield ~ variety, data = barley)

Lattice provides stripplot:

stripplot(variety ~ yield, data = barley)
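Both functions also support some of the remedies for over-plotting described earlier. As a brief sketch, stripchart accepts method = "jitter" and stripplot accepts jitter.data = TRUE. The plotting symbol and alpha level below are illustrative choices.

```r
library(lattice)  # provides stripplot() and the barley data

# Base graphics: jitter the points and use an open plotting symbol.
stripchart(yield ~ variety, data = barley, method = "jitter", pch = 1)

# Lattice: jitter.data = TRUE jitters along the categorical axis;
# alpha makes the points translucent.
p <- stripplot(variety ~ yield, data = barley,
               jitter.data = TRUE, alpha = 0.5)
print(p)
```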
