You will submit your homework as an R Markdown (.Rmd)
file by committing to your git repository and pushing to
GitLab. We will knit this file to produce the
.html output file (you do not need to submit the
.html, but you should make sure that it can be produced
successfully).
We will review both your .Rmd file and the
.html file. To receive full credit:
You must submit your .Rmd file on time. It must be
named exactly as specified, and it must knit without errors to produce a
.html file.
The .html file should read as a well-written report,
with all results and graphs supported by text explaining what they are
and, when appropriate, what conclusions can be drawn. Your report should
not contain any extraneous material, such as leftovers from a
template.
The R code in your .Rmd file must be clear,
readable, and follow the coding
standards.
The text in your .Rmd file must be readable and use
R Markdown properly, as shown in the class template
file.
Create a new folder called HW5 in your repository.
Use exactly this spelling with upper case letters. You
can do this in the RStudio IDE, with R’s dir.create
function, or using a shell.
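For the R option, a minimal sketch might look like the following. It assumes your working directory is the top level of your repository; the variable name hw_dir is only for illustration.

```r
# Create the homework folder only if it does not already exist.
# Assumption: the working directory is the top level of the repository.
hw_dir <- "HW5"
if (!dir.exists(hw_dir)) {
  dir.create(hw_dir)
}

# Confirm that the folder is now present.
dir.exists(hw_dir)
```

Checking dir.exists() first avoids the warning that dir.create() gives when the folder is already there.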
In this folder, create a new R Markdown file called
hw5.Rmd. Again use exactly this spelling.
RStudio will give you a template, or you can use the one available here.
Commit your new file to your repository. (If you are using git in a
shell, you will need to use git add before git commit.)
In this file present your answers to the following problems. Your presentation should follow the pattern and guidelines in the class template file.
This problem refers to the data provided in the
nycflights13 package. Airport codes for the three New York
City airports can be computed from the origin variable in
the flights table using unique(). Use
filter() and select() on the
airports table to create a table containing the airport
codes and the airport names for these three airports and show the result
as a nicely formatted table.
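The general pattern can be sketched on a small made-up example. The tables trips and stations and all of their columns are hypothetical and are not part of nycflights13; knitr::kable() is one of several ways to format a table.

```r
library(dplyr)

# Hypothetical toy tables, made up for illustration only.
trips <- data.frame(from = c("A", "B", "A", "C"))
stations <- data.frame(
  code = c("A", "B", "C", "D"),
  name = c("Alpha", "Bravo", "Charlie", "Delta"),
  zone = c(1, 2, 2, 3)
)

# Distinct codes that appear in the trips table.
used_codes <- unique(trips$from)

# Keep only the matching rows, then only the code and name columns.
used_stations <- stations |>
  filter(code %in% used_codes) |>
  select(code, name)

# Show the result as a formatted table in the knitted report.
knitr::kable(used_stations, col.names = c("Code", "Name"))
```

The same three steps (unique codes, filter(), select()) should carry over to the flights and airports tables once you identify the corresponding variables.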
Continuing with the nycflights13 data, use the
flights table to compute the average and median departure delays
for each of the three New York City airports, omitting missing values.
Present the results as a nicely formatted table and comment on the
results.
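As a hedged illustration of grouped summaries that omit missing values, here is a sketch on made-up data. The data frame toy and its columns are hypothetical; only the group_by() and summarize() pattern with na.rm = TRUE is the point.

```r
library(dplyr)

# Hypothetical toy data with one missing delay value.
toy <- data.frame(
  origin = c("A", "A", "A", "B", "B"),
  delay = c(0, 10, NA, 5, 25)
)

# One row per group; na.rm = TRUE drops missing values before summarizing.
delay_summary <- toy |>
  group_by(origin) |>
  summarize(
    mean_delay = mean(delay, na.rm = TRUE),
    median_delay = median(delay, na.rm = TRUE)
  )

# Show the summary as a formatted table, rounded to one decimal place.
knitr::kable(delay_summary, digits = 1)
```

Without na.rm = TRUE, any group containing a missing value would get NA for both summaries.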
Use density plots to compare the distributions of the air time (as
recorded in the air_time variable) for flights originating
from each of the three New York City airports. What differences do you
see?
There are several options for displaying the densities:
superimposed densities, using color or fill and
alpha to distinguish the distributions;
faceting;
ridgeline plots.
Consider all three approaches and comment on their advantages and disadvantages.
The default bandwidth used by geom_density()
and geom_density_ridges() may be too narrow; a larger
bandwidth of, say, 50 may be better. The bw argument can be
used to specify a different bandwidth. These examples specify a narrower
bandwidth for the barley data:
library(ggplot2)
data(barley, package = "lattice")
ggplot(barley, aes(x = yield)) +
    geom_density(bw = 1) +
    facet_wrap(~site, ncol = 1)

library(ggridges)
ggplot(barley, aes(x = yield, y = site)) +
    geom_density_ridges(aes(height = after_stat(density)),
                        stat = "density", bw = 1)
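The two examples above use faceting and ridgelines. A superimposed version for the same barley data might be sketched as follows; the values alpha = 0.3 and bw = 3 are arbitrary choices for illustration, not recommendations.

```r
library(ggplot2)
data(barley, package = "lattice")

# Superimposed densities: fill distinguishes the sites, and
# alpha makes overlapping curves partly transparent.
p_overlay <- ggplot(barley, aes(x = yield, fill = site)) +
  geom_density(alpha = 0.3, bw = 3)
p_overlay
```

Mapping color instead of fill draws only the outlines of the curves, which can be easier to read when many distributions overlap.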
The data set heights in package dslabs
contains self-reported heights for a number of female and male students.
You can load the data set with
data(heights, package = "dslabs")
Construct a density plot showing the densities of the height
distributions for males and for females. Also construct an eCDF plot
showing the empirical cumulative distributions for the heights of the
two groups. (You can do this using stat_ecdf() and mapping
x to height and color to
sex.)
Comment on what features are easier to see in one plot or the other.
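To show the mechanics of stat_ecdf() without using the heights data, here is a sketch on the barley data from the earlier examples, with color mapped to year. The axis label is an illustrative choice.

```r
library(ggplot2)
data(barley, package = "lattice")

# Empirical CDF of yield, with one step curve per year.
p_ecdf <- ggplot(barley, aes(x = yield, color = year)) +
  stat_ecdf() +
  labs(y = "Cumulative proportion")
p_ecdf
```

In an eCDF plot, a curve lying further to the right corresponds to a group with generally larger values.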
You can create an HTML file in RStudio using the Knit
button in the editor window. You can also use the R command
rmarkdown::render("hw5.Rmd")
with your working directory set to HW5.
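If your working directory is instead the top level of your repository, a variant along these lines should also work. This is a sketch under that assumption; the variable names rmd_file and out are only for illustration.

```r
# Assumption: the working directory is the top level of the repository.
rmd_file <- file.path("HW5", "hw5.Rmd")

if (file.exists(rmd_file)) {
  # render() returns the path of the output file it produced;
  # by default the output is written next to the input file.
  out <- rmarkdown::render(rmd_file)
  file.exists(out)
} else {
  message("Could not find ", rmd_file, "; check your working directory.")
}
```

Rendering from a fresh R session is a useful check that the file knits without depending on objects left over in your workspace.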
Commit your changes to your hw5.Rmd file to your local
git repository. You do not need to commit your HTML file.
Submit your work by pushing your local repository changes to your remote repository on the UI GitLab site. After doing this, it is a good idea to check your repository on the UI GitLab site to make sure everything has been submitted successfully.