General Issues
Make sure your file names and file references use identical
spelling, including upper/lower case. Your code will fail on a
case-sensitive file system if you don’t.
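A local check can hide this problem. On a case-insensitive file system (the default on macOS and Windows), `file.exists()` also returns `TRUE` when only the case differs. The sketch below is a stricter check. It uses a hypothetical helper, `has_exact_case()`, which is not part of base R. The helper compares the last component of a path against the names actually stored in its directory. Directory components of the path are not checked.

```r
# Hypothetical helper (not part of base R): TRUE only if the last component
# of `path` is listed in its directory with exactly this upper/lower-case
# spelling. Directory components of `path` are not checked.
has_exact_case <- function(path) {
    basename(path) %in% list.files(dirname(path), all.files = TRUE, no.. = TRUE)
}

# Example: check a file name referenced in an .Rmd file
has_exact_case("murders.csv")
```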
Make sure to commit your work to your local repository and push
your commits to GitLab. We can only see what is on GitLab, not what is
on your computer. You can check what we see by going to the GitLab web
interface.
Include your name and the date in the header of your
.Rmd file using author: and date:
tags. You can use an inline chunk to have the date computed when the
document is rendered. Your header should look something like this:
---
title: "Assignment 3"
author: "Fred Frog"
date: "`r Sys.Date()`"
output: html_document
---
If you want to increase the font size of the body text in your
HTML output, one option is to add this line after the document
header:
<style type="text/css"> body{ font-size: 12pt; } </style>
Do not use markdown headers for this. Markdown headers
(lines starting with one or more # characters) should only
be used for section and subsection headers.
1. Choosing Between Faceting and Color
The faceted plot shows each of the seven groups in a sub-plot, or
facet, using the same axis scales for all plots.
library(ggplot2)
ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point() +
    facet_wrap(~ class, nrow = 2)

The plots are small and there is some over-plotting, which could be
reduced by using a smaller point size.
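As an example, the faceted plot can be redrawn with smaller points. The value `size = 1` is only an illustrative choice, since the `geom_point()` default is 1.5. A suitable size depends on the data and on the size of the output.

```r
library(ggplot2)

# Same faceted plot, with smaller points to reduce over-plotting
p_small <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point(size = 1) +
    facet_wrap(~ class, nrow = 2)
print(p_small)
```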
A single plot that maps class to color benefits from a
larger point size to improve discriminability of the colors:
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
    geom_point(size = 2.5)

The number of colors is large, which makes discrimination more
difficult, even with the increased point size. But once groups are
identified, their relative positions are easier to see in the colored
plot as all comparisons are within a common set of scales.
Faceting reduces plot size and thus increases over-plotting for
larger data sets. Reducing point size can be effective if color and
shape are not being used as channels. A significant drawback of
faceting is that some group comparisons change from comparisons along a
common scale to comparisons along unaligned scales. This can be partly
alleviated by showing a muted image of the complete data in the
background.
Overall, color may have a slight edge in this data set. But it should
be kept in mind that color is not effective on all display devices or
for all viewers.
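When you use color, one partial mitigation is a palette designed to stay distinguishable for viewers with the most common forms of color vision deficiency. One such palette is viridis, which is available in ggplot2 (version 3.0.0 or later) through `scale_color_viridis_d()`. With seven classes, adjacent colors in this palette are still hard to tell apart. This sketch therefore does not remove the discrimination problem described above.

```r
library(ggplot2)

# Colored plot using the viridis palette, which is designed to remain
# distinguishable for common forms of color vision deficiency
p_viridis <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
    geom_point(size = 2.5) +
    scale_color_viridis_d()
print(p_viridis)
```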
In larger data sets color becomes less effective as there will be a
considerable amount of over-plotting, given the point size needed to
support good color discrimination. Faceting will also suffer from more
over-plotting in larger data sets for a given point size, but there is
more flexibility to reduce point size. The shape of the data also plays
a role, so both approaches are worth considering.
2. Faceting with Muted Full Data
The full data can be added as a background layer in a muted color,
such as a light grey:
library(ggplot2)
library(dplyr)
ggplot(mpg, aes(x = displ, y = hwy)) +
    # Drop the faceting variable so the grey copy of the full data
    # is drawn in every panel
    geom_point(data = mutate(mpg, class = NULL), color = "lightgrey") +
    geom_point() +
    facet_wrap(~ class, nrow = 2)

With the full data in the background, group-to-whole comparisons are
again on aligned scales. For example, it is easy to see that the
2-seaters are quite different from the other cars. This can also be
seen in the basic faceted plot shown above, but it takes more
work.
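You can check the visual impression against simple per-class summaries. The sketch below uses dplyr. Medians are an arbitrary choice of summary here.

```r
library(ggplot2)  # provides the mpg data
library(dplyr)

# Number of cars, median displacement, and median highway mileage per class
class_summary <- mpg %>%
    group_by(class) %>%
    summarize(n = n(), displ = median(displ), hwy = median(hwy))
print(class_summary)
```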
3. Gun Murders in US States
if (!file.exists("murders.csv"))
    download.file("https://www.stat.uiowa.edu/~luke/data/murders.csv",
                  "murders.csv")
murders <- read.csv("murders.csv")
The following plot shows the total number of gun murders against the
population of each state and the District of Columbia. Both axes use
log scales because the distributions of both variables are highly
skewed. The points are colored to show the region of each
state.
ggplot(murders, aes(x = population, y = total, color = region)) +
    geom_point(size = 2.5) +
    scale_x_log10() +
    scale_y_log10()

On the log scales, the relationship between the number of murders and
the population size appears to be close to linear. The states in the
southern region are mostly toward the top of the set of points: for a
given population size, the number of murders in southern states appears
to be higher than in other states.
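You can check both observations numerically. The sketch below assumes that the `murders` data frame has the columns `population`, `total`, and `region` used in the plot above. The helper names `loglog_slope()` and `median_rate_by_region()` are hypothetical.

A slope close to 1 on the log-log scale would mean that the number of murders is roughly proportional to population. The murder rate per 100,000 people compares regions after adjusting for population size.

```r
# Hypothetical helpers for checking the two observations above.

# Slope of the least-squares line on the log10-log10 scale; a value near 1
# suggests murders grow roughly in proportion to population.
# Rows with zero murders are dropped because log10(0) is -Inf.
loglog_slope <- function(df) {
    df <- df[df$total > 0, ]
    fit <- lm(log10(total) ~ log10(population), data = df)
    unname(coef(fit)[2])
}

# Median gun murder rate per 100,000 people within each region
median_rate_by_region <- function(df) {
    df$rate <- df$total / df$population * 1e5
    aggregate(rate ~ region, data = df, FUN = median)
}
```

After `murders` has been read in as above, `loglog_slope(murders)` and `median_rate_by_region(murders)` return the two summaries.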
4. Comparing Some Visualizations
All three plots clearly show that the 5-cylinder group is the
smallest. Distinguishing the sizes of the other groups is more
challenging.
Plot B shows the counts as positions along aligned scales. It is easy
to see the ordering, even though the values for 8, 6, and 4 cylinders
are quite close.
Plot C relies on length comparisons. It seems possible to recognize
that the 8-cylinder group is the smallest among the 4-, 6-, and
8-cylinder groups, but determining which of the 4- and 6-cylinder groups
is smaller is very hard.
Plot A relies on area comparisons. The sizes of the 4-, 6-, and
8-cylinder groups are very hard to distinguish.
For comparing the group sizes in this data set Plot B is best,
followed by Plot C, and then Plot A.
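You can also compute the group sizes directly, which confirms the orderings discussed above. This sketch assumes that the three plots were made from the `cyl` variable of the `mpg` data used earlier.

```r
library(ggplot2)  # provides the mpg data

# Number of cars for each number of cylinders
cyl_counts <- table(mpg$cyl)
print(cyl_counts)

# Groups ordered from smallest to largest
print(sort(cyl_counts))
```

In `mpg`, the counts are 81, 4, 79, and 70 for 4, 5, 6, and 8 cylinders. The 4- and 6-cylinder groups therefore differ by only two cars, which is why they are so hard to order in Plots A and C.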
