In this lab your task is to improve a plot that violates many data visualization best practices. We want you to get creative and make a visualization that tells a (much!) better story than the original plot.

Learning goals

Telling a story with data.
Data visualization best practices.
Reshaping data.

Complete the following steps before you join the live workshop!

Workshop prep

You have two tasks to complete before the workshop. Doing them ahead of time is crucial for a smooth (and enjoyable!) workshop experience:

Confirm GitHub org membership: Confirm that you are a member of the course GitHub organization. If you are not, let me know as soon as possible, and in any case before the night before the workshop.
Meet your teammates: You can find your team assignment for the rest of the semester here. We’re also setting up a channel for your team on, well, Teams! Say hi to everyone there before the workshop. And don’t forget to let your tutor know your team name!

Complete the following steps during the live workshop with your team.

Warm up with your team

This is (likely) the first time you’re getting to meet your teammates “in person”. Take 5 minutes to go around and introduce yourselves: name, year, program, where you’re joining from, whatever else you like. Cap off your self-introduction by pointing out one error in the following visual.

Once the introductions are over, give each team member a number. In this lab, team members will take turns sharing their screen and working on an exercise in the common team repo. Each member commits and pushes their changes, and then the next team member takes over, pulling those changes before making any further edits to the lab. In the lab instructions you’ll see markers for

EVERYONE (for tasks everyone should do concurrently) or
TEAM MEMBER X (for tasks only team member X should do while sharing their screen, with others contributing verbally but not typing anything).

Getting started

Repository

EVERYONE: Go to the course GitHub organization and locate your Lab 02 repo, which should be named lab-02-sad-plot-YOUR_TEAMNAME. Grab the URL of the repo and clone it in RStudio Cloud. Refer to HW 00 if you would like step-by-step instructions for cloning a repo into an RStudio project.

First, open the R Markdown document lab-02.Rmd and knit it. Make sure it compiles without errors. The output will be a Markdown (.md) file with the same name.

Packages

EVERYONE: Before getting started with the Exercises, run the following code in the Console to load the tidyverse package.

library(tidyverse)

Data: Instructional staff employment trends

The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. This report compiled by the AAUP shows trends in instructional staff employment between 1975 and 2011 and contains an image very similar to the one given below.

EVERYONE: Let’s start by loading the data used to create this plot.

staff <- read_csv("data/instructional-staff.csv")

Each row in this dataset represents a faculty type, and the columns are the years for which we have data. The values are the percentage of hires of that faculty type in each year.

## # A tibble: 5 x 12
##   faculty_type `1975` `1989` `1993` `1995` `1999` `2001` `2003` `2005` `2007`
##   <chr>         <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Full-Time T…   29     27.6   25     24.8   21.8   20.3   19.3   17.8   17.2
## 2 Full-Time T…   16.1   11.4   10.2    9.6    8.9    9.2    8.8    8.2    8  
## 3 Full-Time N…   10.3   14.1   13.6   13.6   15.2   15.5   15     14.8   14.9
## 4 Part-Time F…   24     30.4   33.1   33.2   35.5   36     37     39.3   40.5
## 5 Graduate St…   20.5   16.5   18.1   18.8   18.7   19     20     19.9   19.5
## # … with 2 more variables: `2009` <dbl>, `2011` <dbl>

In order to recreate this visualization, we first need to reshape the data so that it has one variable for faculty type and one variable for year. In other words, we will convert the data from wide format to long format.

Exercises

1️⃣ TEAM MEMBER 1 should share their screen, write the answer to Exercise 1, and then commit and push their changes. Everyone else: participate, help out, but no typing in the R Markdown document and no committing/pushing!

If the long data will have a row for each year/faculty type combination, and there are 5 faculty types and 11 years of data, how many rows will the data have? Discuss as a team and write down your answer.

🧶 ✅ ⬆️ At this point TEAM MEMBER 1 should knit the Rmd, commit, and push their changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

2️⃣ TEAM MEMBER 2 should now share their screen and pull ⬇️ before doing anything else. They should then write the answers to Exercises 2 and 3, and then commit and push their changes. Everyone else: participate, help out, but no typing in the R Markdown document and no committing/pushing!

We do the wide-to-long conversion using pivot_longer(). The animation below shows how this function works, as well as its counterpart pivot_wider().

Quick reminder: the function has the following arguments:

pivot_longer(data, cols, names_to = "name", values_to = "value")

The first argument is data as usual.
The second argument, cols, specifies the columns to pivot into longer format.
The third argument, names_to, is the name of the column where the column names of the pivoted variables go (character string).
The fourth argument, values_to, is the name of the column where the data in the pivoted variables go (character string).
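To see pivot_longer() and its counterpart pivot_wider() on something smaller than the staff data, here is a minimal sketch using a made-up tibble. The names toy_wide, toy_long, the groups A and B, and the years and values are all hypothetical and do not come from the AAUP data. The sketch pivots the year columns longer and then reverses the reshape.

```r
library(tidyverse)

# Hypothetical wide data: one row per group, one column per year
toy_wide <- tibble(
  group  = c("A", "B"),
  `2000` = c(10, 20),
  `2010` = c(15, 25),
  `2020` = c(12, 30)
)

# Wide to long: pivot every column except `group` into year/value pairs
toy_long <- toy_wide %>%
  pivot_longer(
    cols = -group,        # select all columns other than group
    names_to = "year",    # the old column names (2000, 2010, 2020) go here
    values_to = "value"   # the old cell values go here
  )

# 2 groups x 3 years = 6 rows; note that `year` is stored as character
toy_long

# Long to wide: pivot_wider() reverses the reshape
toy_wide_again <- toy_long %>%
  pivot_wider(names_from = year, values_from = value)
```

Because the years start out as column names, names_to produces a character column. That detail matters later when year is mapped to the x-axis of a plot.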

Fill in the blanks in the following code chunk to pivot the staff data longer and save it as a new data frame called staff_long.

staff_long <- ___ %>%
  ___(
    cols = ___, 
    names_to = "___",
    values_to = "___"
    )

Inspect staff_long to check whether your guess for the number of rows in Exercise 1 was correct.

🧶 ✅ ⬆️ At this point TEAM MEMBER 2 should knit the Rmd, commit, and push their changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

TEAM MEMBER 3 should now share their screen and pull ⬇️ before doing anything else. They should then write the answers to Exercises 4 and 5, and then commit and push their changes. Everyone else: participate, help out, but no typing in the R Markdown document and no committing/pushing!

We’ll plot instructional staff employment trends as a line plot. A possible approach for creating a line plot where we color the lines by faculty type is the following, but it doesn’t quite look right. What seems to be the issue?

staff_long %>%
  ggplot(aes(x = year, y = value, color = faculty_type)) +
  geom_line()

Next, add a group aesthetic to the plot (grouping by faculty_type) and plot again. What does the plot reveal about instructional staff employment trends over the years?
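The sketch below, again on hypothetical toy data rather than staff_long, illustrates why the group aesthetic matters. When year is a character column, ggplot2 treats the x-axis as discrete and, by default, sets the group to the interaction of all discrete variables in the layer (here x and color). Every group then contains a single point, so geom_line() has nothing to connect. Setting group explicitly puts all points of one category on one line. The toy names and values are made up, and layer_data() is used only to inspect how many groups ggplot2 built.

```r
library(tidyverse)

# Hypothetical long data; year is character, as it is after pivot_longer()
toy_long <- tibble(
  group = rep(c("A", "B"), each = 3),
  year  = rep(c("2000", "2010", "2020"), times = 2),
  value = c(10, 15, 12, 20, 25, 30)
)

# Without `group`: each (year, group) combination becomes its own group,
# so every group has one observation and no lines are drawn
p_broken <- ggplot(toy_long, aes(x = year, y = value, color = group)) +
  geom_line()

# With `group`: all points of one category are joined into a single line
p_fixed <- ggplot(toy_long, aes(x = year, y = value, color = group, group = group)) +
  geom_line()

# Count the groups ggplot2 computed for each plot
length(unique(layer_data(p_broken)$group))  # 6 groups of one point each
length(unique(layer_data(p_fixed)$group))   # 2 groups of three points each
```

Another option is to convert year to a number first, for example with mutate(year = as.numeric(year)). With a continuous x-axis the default grouping comes from color alone, and the years are spaced according to their actual values rather than evenly.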

🧶 ✅ ⬆️ At this point TEAM MEMBER 3 should knit the Rmd, commit, and push their changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

TEAM MEMBER 4 should now share their screen and pull ⬇️ before doing anything else. They should then write the answer to Exercise 6, and then commit and push their changes. Everyone else: participate, help out, but no typing in the R Markdown document and no committing/pushing! (If your team has fewer than 4 people, just move back to the first member.)

Improve the line plot from the previous exercise by fixing up its labels (title, axis labels, and legend label) as well as any other components you think could benefit from improvement.
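As a hedged illustration of the kinds of fixes this exercise asks for, the sketch below applies labs() and a few other common tweaks to hypothetical toy data. The title, axis labels, and legend label are placeholders for the toy example only, not suggested wording for the staff plot.

```r
library(tidyverse)

# Hypothetical long data, with year already converted to a number
toy_long <- tibble(
  group = rep(c("A", "B"), each = 3),
  year  = rep(c(2000, 2010, 2020), times = 2),
  value = c(10, 15, 12, 20, 25, 30)
)

p <- ggplot(toy_long, aes(x = year, y = value, color = group)) +
  geom_line() +
  geom_point() +                                    # mark the years that have data
  scale_x_continuous(breaks = c(2000, 2010, 2020)) + # ticks only at observed years
  labs(
    title = "Hypothetical values by group, 2000 to 2020",
    x = "Year",
    y = "Value (%)",
    color = "Group"                                 # legend title
  ) +
  theme_minimal()

p
```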

🧶 ✅ ⬆️ At this point TEAM MEMBER 4 should knit the Rmd, commit, and push their changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

TEAM MEMBER 5 should now share their screen and pull ⬇️ before doing anything else. They should then write the answers to Exercises 7 and 8, and then commit and push their changes. Everyone else: participate, help out, but no typing in the R Markdown document and no committing/pushing! (If your team has fewer than 5 people, just move back to the first member.)

Suppose the objective of this plot was to show that the proportion of part-time faculty has gone up over time compared to other instructional staff types. What changes would you propose making to this plot to tell this story? Write down your idea(s). The more precise you are, the easier the next step will be. Get creative, and think about how you can modify the dataset to give you new/different variables to work with.

Implement at least one of the ideas you came up with in the previous exercise. You should produce an improved data visualization and accompany it with a brief paragraph describing the choices you made in your improvement, specifically discussing what you didn’t like in the original plot and why, and how you addressed those issues in the visualization you created.
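One way to act on the hint about modifying the dataset is to collapse the faculty types into fewer categories, so that the part-time share stands out against everything else. The sketch below shows that idea on hypothetical toy data: the category labels, years, and values are invented for illustration, and this is only one of many possible designs, not the expected answer.

```r
library(tidyverse)

# Hypothetical long data with three made-up staff categories
toy_staff_long <- tibble(
  faculty_type = rep(c("Part-time", "Full-time", "Graduate"), times = 2),
  year         = rep(c(2000, 2010), each = 3),
  value        = c(30, 50, 20, 40, 42, 18)
)

# Collapse everything that is not part-time into one category,
# then add up the percentages within each year/category pair
toy_summary <- toy_staff_long %>%
  mutate(part_time = if_else(faculty_type == "Part-time",
                             "Part-time", "All other staff")) %>%
  group_by(year, part_time) %>%
  summarise(value = sum(value), .groups = "drop")

toy_summary

# Two lines instead of one per category put the focus on the part-time trend
ggplot(toy_summary, aes(x = year, y = value, color = part_time)) +
  geom_line() +
  labs(x = "Year", y = "Percentage (%)", color = NULL)
```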

🧶 ✅ ⬆️ At this point TEAM MEMBER 5 should knit the Rmd, commit, and push their changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Aim to make it to this point during the workshop.

Lab 02 - Take a sad plot and make it better