You will submit your homework as an R Markdown (.Rmd) file by committing to your git repository and pushing to GitLab. We will knit this file to produce the .html output file (you do not need to submit the .html, but you should make sure that it can be produced successfully).
We will review both your .Rmd file and the .html file. To receive full credit:
You must submit your .Rmd file on time. It must be named exactly as specified, and it must knit without errors to produce a .html file.
The .html file should read as a well-written report, with all results and graphs supported by text explaining what they are and, when appropriate, what conclusions can be drawn. Your report should not contain any extraneous material, such as leftovers from a template.
The R code in your .Rmd file must be clear and readable, and it must follow the coding standards.
The text in your .Rmd file must be readable and use R Markdown properly, as shown in the class template file.
Create a new folder called HW5 in your repository. Use exactly this spelling with upper case letters. You can do this in the RStudio IDE, with R’s dir.create function, or using a shell.
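As a minimal sketch of the dir.create option, assuming your R working directory is the top level of your repository (adjust the path if it is not):

```r
# Assumes the working directory is the top level of the repository.
hw_dir <- "HW5"

# Create the folder only if it does not already exist.
if (!dir.exists(hw_dir)) {
  dir.create(hw_dir)
}

# Check that the folder is listed with exactly the expected spelling.
"HW5" %in% list.files(".")
```

The final check is useful on file systems that ignore case, where a folder named hw5 would otherwise go unnoticed.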
In this folder, create a new R Markdown file called hw5.Rmd. Again, use exactly this spelling. RStudio will give you a template, or you can use the one available here. Commit your new file to your repository. (If you are using git in a shell, you will need to use git add before git commit.)
In this file present your answers to the following problems. Your presentation should follow the pattern and guidelines in the class template file.
This problem refers to the data provided in the nycflights13 package. Airport codes for the three New York City airports can be computed from the origin variable in the flights table using unique(). Use filter() and select() on the airports table to create a table containing the airport codes and the airport names for these three airports, and show the result as a nicely formatted table.
Continuing with the nycflights13 data, use the flights table to compute average and median departure delays for each of the three New York City airports, omitting missing values. Present the results as a nicely formatted table and comment on the results.
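Omitting missing values matters because mean() and median() return NA when any input value is NA. A small sketch on a made-up vector (the values are hypothetical and are not taken from the flights table):

```r
# Hypothetical delays in minutes; NA marks a missing value.
delays <- c(5, NA, -2, 30)

mean(delays)                   # NA, because of the missing value
mean(delays, na.rm = TRUE)     # 11
median(delays, na.rm = TRUE)   # 5
```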
Use density plots to compare the distributions of the air time (as recorded in the air_time variable) for flights originating from each of the three New York City airports. What differences do you see?
There are several options for displaying the densities, including using color or fill and alpha to distinguish the distributions.
Consider all three approaches and comment on their advantages and disadvantages.
The default bandwidth used by geom_density() and geom_density_ridges() may be too narrow; a larger bandwidth of, say, 50 may be better. The bw argument can be used to specify a different bandwidth. These examples specify a narrower bandwidth for the barley data:
library(ggplot2)
data(barley, package = "lattice")
ggplot(barley, aes(x = yield)) +
    geom_density(bw = 1) +
    facet_wrap(~site, ncol = 1)

library(ggridges)
ggplot(barley, aes(x = yield, y = site)) +
    geom_density_ridges(aes(height = after_stat(density)),
                        stat = "density", bw = 1)
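For comparison with the faceted and ridgeline examples above, the following is a sketch of superimposed densities for the same barley data, using fill and alpha to distinguish the sites. The alpha value of 0.3 and the bandwidth of 5 are illustrative choices only, not recommended settings.

```r
library(ggplot2)
data(barley, package = "lattice")

# Superimposed densities: one curve per site, all drawn in the same panel.
# alpha = 0.3 and bw = 5 are illustrative values only.
p <- ggplot(barley, aes(x = yield, fill = site)) +
    geom_density(alpha = 0.3, bw = 5)
p
```

Mapping site to color instead of fill draws outlines only, which can reduce clutter when many curves overlap.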
In Problem 4 of Assignment 4 you created a strip plot showing highway fuel economy values for each of the years from 2000 through 2023. Compare your result to three other options:
Comment on the advantages and disadvantages of each approach in this case.
You can create an HTML file in RStudio using the Knit tab on the editor window. You can also use the R command
rmarkdown::render("hw5.Rmd")
with your working directory set to HW5.
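If your working directory is the top level of the repository instead, one possible sketch is a small wrapper that checks for the file before rendering. The function name knit_hw and its default path are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: render the homework from the repository's top level.
# The default path assumes the HW5/hw5.Rmd layout described above.
knit_hw <- function(rmd = file.path("HW5", "hw5.Rmd")) {
  if (!file.exists(rmd)) {
    stop("Cannot find ", rmd, "; check your working directory.")
  }
  # By default the output is written next to the input file.
  rmarkdown::render(rmd)
}
```

Calling knit_hw() then either produces the HTML file in HW5 or stops with a message pointing at the path problem.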
Commit your changes to your hw5.Rmd file to your local git repository. You do not need to commit your HTML file.
Submit your work by pushing your local repository changes to your remote repository on the UI GitLab site. After doing this, it is a good idea to check your repository on the UI GitLab site to make sure everything has been submitted successfully.