Complete the following steps before you join the live workshop!

You have two tasks you should complete before the workshop:

- Watch this week’s videos.
- Read the study description below.
- Familiarise yourself with the dataset by reading the data dictionary and reference here.
- Clone your lab repo and complete the first two exercises.

Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. The article titled, “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity” (Hamermesh and Parker, 2005) found that instructors who are viewed to be better looking receive higher instructional ratings.1 Daniel S. Hamermesh, Amy Parker, Beauty in the classroom: instructors pulchritude and putative pedagogical productivity, Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369-376, ISSN 0272-7757, 10.1016/j.econedurev.2004.07.013. http://www.sciencedirect.com/science/article/pii/S0272775704001165.

For this assignment you will analyze the data from this study in order to learn what goes into a positive professor evaluation.

The data were gathered from end of semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors’ physical appearance.2 This is a slightly modified version of the original data set that was released as part of the replication data for Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill, 2007). The result is a data frame where each row contains a different course and columns represent variables about the courses and professors.

During the workshop we’ll use the following three packages:

**tidyverse**: for data wrangling and visualisation**tidymodels**: for modelling**openintro**: for the data

- Visualize the distribution of
`score`

. Is the distribution skewed? What does that tell you about how students rate courses? Is this what you expected to see? Why, or why not? Include any summary statistics and visualizations you use in your response. - Visualize and describe the relationship between
`score`

and`bty_avg`

using`geom_point()`

to represent the data. Then, visualise again using`geom_jitter()`

for the points. What does “jitter” mean? What was misleading about the initial scatterplot?

Complete the following steps during the live workshop with your team.

**Hint:** Linear model is in the form \(\hat{y} = b_0 + b_1 x\).

- Fit a linear model called
`score_bty_fit`

to predict average professor evaluation`score`

from average beauty rating (`bty_avg`

). Print the regression output using`tidy()`

. Based on the regression output, write the linear model. - Replot your jittered visualization from Exercise 3, and add the regression line to this plot in
`"orange"`

color. Turn off the shading for the uncertainty of the line. - Interpret the slope of the linear model in context of the data.
- Interpret the intercept of the linear model in context of the data. Comment on whether or not the intercept makes sense in this context.

**Hint:** Use `glance()`

to obtain \(R^2\) and other similar statistics for your model.

- Determine the \(R^2\) of the model and interpret it in context of the data.

Next, we’ll assess the model fit using a graphical diagnostic. To do so we need to calculate the predicted evaluation scores for each professor in the dataset as well as the residuals for each observation. We use the `augment()`

function for this:

`<- augment(score_bty_fit$fit) score_bty_aug `

Let’s take a look at what’s in this augmented dataset:

`names(score_bty_aug)`

```
## [1] "score" "bty_avg" ".fitted" ".resid" ".std.resid"
## [6] ".hat" ".sigma" ".cooksd"
```

First, we have the variables used to build the model: `score`

and `bty_avg`

. We also have the predicted values (`.fitted`

) and the residuals (`.resid`

). We’ll talk about a few of the other variables in the augmented data frame later in the course, and some others you will encounter in future courses.

**Hint:** You can use `geom_hline()`

with `linetype = “dashed”`

for this.

- Make a plot of residuals vs. predicted values for the model above. Use
`geom_jitter()`

instead of`geom_point()`

, and overlay a dashed horizontal line at`y = 0`

. Then, comment on whether the linear model is appropriate for modeling the relationship between evaluation scores and beauty scores.

🧶 ✅ ⬆️ Knit, *commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.*

- Fit a new linear model called
`score_rank_fit`

to predict average professor evaluation`score`

based on`rank`

of the professor and print out the regression output using`tidy()`

. Based on the regression output, interpret the slope and intercept in context of the data. - Fit a new linear model called
`score_gender_fit`

to predict average professor evaluation`score`

based on`gender`

of the professor. and print out the regression output using`tidy()`

. Based on the regression output, interpret the slope and intercept in context of the data. But we’ll do things a bit differently this time, using inline code for the values of the intercept and the slope! Below are some tips for extracting these values from the model output to use in your inline code.

Let’s start with the intercept. You have two options for extracting the value of the intercept from the regression output. Remember, the output looks like this:

`tidy(score_gender_fit)`

```
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 4.09 0.0387 106. 0
## 2 gendermale 0.142 0.0508 2.78 0.00558
```

So the intercept is in the `estimate`

column, and it’s the first element in there.

```
# Option 1
tidy(score_gender_fit)$estimate[1]
```

`## [1] 4.092821`

We can also extract it using a dplyr pipeline:

```
# Option 2
tidy(score_gender_fit) %>%
filter(term == "(Intercept)") %>% # filter for the intercept
select(estimate) %>% # select the estimate column
pull() # pull value out of data frame
```

`## [1] 4.092821`

Regardless of which option you use, you might consider storing the value in an object that you can easily refer to in your inline code, e.g.

You can hide the code for chunks like this where you are simply preparing objects for later use by adding `echo = FALSE`

in the code chunk options, that is, where you label your code chunk, separated by a comma, i.e.

`{r label, echo = FALSE}`

```
<- tidy(score_gender_fit) %>%
score_gender_intercept filter(term == "(Intercept)") %>%
select(estimate) %>%
pull()
```

And then, you can use the `score_gender_intercept`

in inline code, e.g.

`The intercept of the model is 'r score_gender_intercept'...`

which will render to

`The intercept of the model is 4.0928205...`

There is still one small issue here though, the number of decimal places reported. It would be better to round the value in our narrative, for which we can use the `round()`

function. This function takes two arguments: the first one is the value (or vector of values) you want to round, and the second one is the number of digits.

`The intercept of the model is 'r round(score_gender_intercept, 2)'...`

which will render to

`The intercept of the model is 4.09...`

🧶 ✅ ⬆️ Knit, *commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.*

- Next, we fit a multiple linear regression model, predicting average professor evaluation
`score`

based on average beauty rating (`bty_avg`

) and`gender`

. Name the model`score_bty_gender_fit`

. Interpret the intercept and the slopes of`bty_avg`

and`gender`

. - What percent of the variability in
`score`

is explained by the model`score_bty_gender_fit`

. Once again, use inline code in your answer. - What is the equation of the line corresponding to
*just*male professors? - For two professors who received the same beauty rating, which gender tends to have the higher course evaluation score?
- How does the relationship between beauty and evaluation score vary between male and female professors?

Aim to make it to this point during the workshop.

- How do the adjusted \(R^2\) values of
`score_bty_fit`

and`score_bty_gender_fit`

compare? What does this tell us about how useful`gender`

is in explaining the variability in evaluation scores when we already have information on the beauty score of the professor. - Compare the slopes of
`bty_avg`

under the two models (`score_bty_fit`

and`score_bty_gender_fit`

). Has the addition of`gender`

to the model changed the parameter estimate (slope) of`bty_avg`

? - Create a new model called
`score_bty_gender_rank_fit`

predicting average professor evaluation`score`

based on average beauty rating (`bty_avg`

),`gender`

, and`rank`

. How do the adjusted \(R^2\) values of`score_bty_gender_life`

and`score_bty_gender_rank_fit`

compare? What does this tell us about how useful`rank`

is in explaining the variability in evaluation scores when we already have information on the beauty score and gender of the professor.