Complete the following steps before you join the live workshop!
There are two tasks you should complete before the workshop:
Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching-related characteristics, such as the physical appearance of the instructor. The article “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity” (Hamermesh and Parker, 2005) found that instructors who are viewed to be better looking receive higher instructional ratings.

Hamermesh, D. S., and Parker, A. (2005). Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity. Economics of Education Review, 24(4), 369-376. ISSN 0272-7757. doi:10.1016/j.econedurev.2004.07.013. http://www.sciencedirect.com/science/article/pii/S0272775704001165
For this assignment you will analyze the data from this study in order to learn what goes into a positive professor evaluation.
The data were gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors’ physical appearance. (This is a slightly modified version of the original data set that was released as part of the replication data for Data Analysis Using Regression and Multilevel/Hierarchical Models, Gelman and Hill, 2007.) The result is a data frame where each row contains a different course and columns represent variables about the courses and professors.
During the workshop we’ll use the following three packages:
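Most likely these are tidyverse, tidymodels, and openintro; treat the names below as an educated guess based on the functions and data used in the rest of this document (tidy(), glance(), augment(), dplyr pipelines, and ggplot2 plots), not a confirmed list:

library(tidyverse)   # ggplot2, dplyr, and friends
library(tidymodels)  # linear_reg(), tidy(), glance(), augment()
library(openintro)   # assumed source of the evals course-evaluation data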
Visualise the distribution of score. Is the distribution skewed? What does that tell you about how students rate courses? Is this what you expected to see? Why, or why not? Include any summary statistics and visualizations you use in your response.
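For reference, a minimal sketch of one way to start (assuming the data frame is called evals; the bin width is an arbitrary choice):

ggplot(evals, aes(x = score)) +
  geom_histogram(binwidth = 0.25)   # distribution of evaluation scores

evals %>%
  summarise(mean = mean(score), median = median(score), sd = sd(score))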
Visualise the relationship between score and bty_avg using geom_point() to represent the data. Then, visualise again using geom_jitter() for the points. What does “jitter” mean? What was misleading about the initial scatterplot?
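A sketch of both versions, again assuming the evals data frame:

# geom_point(): identical (bty_avg, score) pairs are drawn on top of one another
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_point()

# geom_jitter(): adds a little random noise to each point, revealing
# observations that the first plot hides
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_jitter()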
Complete the following steps during the live workshop with your team.
Hint: A linear model has the form \(\hat{y} = b_0 + b_1 x\).
Fit a linear model called score_bty_fit to predict average professor evaluation score from average beauty rating (bty_avg). Print the regression output using tidy(). Based on the regression output, write the linear model.
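A minimal sketch of this step, assuming the tidymodels interface (consistent with the score_bty_fit$fit call later in this section) and that the data frame is called evals:

score_bty_fit <- linear_reg() %>%
  set_engine("lm") %>%                 # ordinary least squares via lm
  fit(score ~ bty_avg, data = evals)

tidy(score_bty_fit)                    # coefficient estimates as a tidy tibble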
Re-plot the scatterplot, and add the regression line to it in "orange" color. Turn off the shading for the uncertainty of the line.

Hint: Use glance() to obtain \(R^2\) and other similar statistics for your model.
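For example (a sketch, following the augment(score_bty_fit$fit) pattern used below):

glance(score_bty_fit$fit)              # one-row tibble of model-level statistics
glance(score_bty_fit$fit)$r.squared    # extract just R-squared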
Next, we’ll assess the model fit using a graphical diagnostic. To do so we need to calculate the predicted evaluation scores for each professor in the dataset as well as the residuals for each observation. We use the augment()
function for this:
score_bty_aug <- augment(score_bty_fit$fit)
Let’s take a look at what’s in this augmented dataset:
names(score_bty_aug)
## [1] "score" "bty_avg" ".fitted" ".resid" ".std.resid"
## [6] ".hat" ".sigma" ".cooksd"
First, we have the variables used to build the model: score and bty_avg. We also have the predicted values (.fitted) and the residuals (.resid). We’ll talk about a few of the other variables in the augmented data frame later in the course, and some others you will encounter in future courses.
Hint: You can use geom_hline() with linetype = "dashed" for this.
Make a plot of residuals vs. predicted values for the model above. Use geom_jitter() instead of geom_point(), and overlay a dashed horizontal line at y = 0. Then, comment on whether the linear model is appropriate for modeling the relationship between evaluation scores and beauty scores.
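A sketch of one way to build this plot from the augmented data frame created above (the axis labels are a suggestion):

ggplot(score_bty_aug, aes(x = .fitted, y = .resid)) +
  geom_jitter() +                                    # jittered points, as asked
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero
  labs(x = "Predicted score", y = "Residuals")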
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Fit a linear model called score_rank_fit to predict average professor evaluation score based on rank of the professor, and print out the regression output using tidy(). Based on the regression output, interpret the slope and intercept in context of the data.
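A sketch of this fit, in the same assumed tidymodels style as the earlier model (rank is categorical, so R expands it into indicator terms automatically):

score_rank_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(score ~ rank, data = evals)   # rank: categorical predictor

tidy(score_rank_fit)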
Fit a new linear model called score_gender_fit to predict average professor evaluation score based on gender of the professor, and print out the regression output using tidy(). Based on the regression output, interpret the slope and intercept in context of the data. But we’ll do things a bit differently this time, using inline code for the values of the intercept and the slope! Below are some tips for extracting these values from the model output to use in your inline code.

Let’s start with the intercept. You have two options for extracting the value of the intercept from the regression output. Remember, the output looks like this:
tidy(score_gender_fit)
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 4.09 0.0387 106. 0
## 2 gendermale 0.142 0.0508 2.78 0.00558
So the intercept is in the estimate column, and it’s the first element in there.
# Option 1
tidy(score_gender_fit)$estimate[1]
## [1] 4.092821
We can also extract it using a dplyr pipeline:
# Option 2
tidy(score_gender_fit) %>%
filter(term == "(Intercept)") %>% # filter for the intercept
select(estimate) %>% # select the estimate column
pull() # pull value out of data frame
## [1] 4.092821
Regardless of which option you use, you might consider storing the value in an object that you can easily refer to in your inline code, e.g.
You can hide the code for chunks like this, where you are simply preparing objects for later use, by adding echo = FALSE to the code chunk options, that is, after your chunk label, separated by a comma, i.e.
{r label, echo = FALSE}
score_gender_intercept <- tidy(score_gender_fit) %>%
  filter(term == "(Intercept)") %>%
  select(estimate) %>%
  pull()
And then, you can use score_gender_intercept in inline code, e.g.
The intercept of the model is `r score_gender_intercept`...
which will render to
The intercept of the model is 4.0928205...
There is still one small issue here, though: the number of decimal places reported. It would be better to round the value in our narrative, for which we can use the round() function. This function takes two arguments: the first is the value (or vector of values) you want to round, and the second is the number of digits.
The intercept of the model is `r round(score_gender_intercept, 2)`...
which will render to
The intercept of the model is 4.09...
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Fit a multiple linear regression model predicting average professor evaluation score based on average beauty rating (bty_avg) and gender. Name the model score_bty_gender_fit. Interpret the intercept and the slopes of bty_avg and gender.
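A sketch of this fit, following the same assumed tidymodels pattern as before:

score_bty_gender_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(score ~ bty_avg + gender, data = evals)   # two predictors

tidy(score_bty_gender_fit)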
What percent of the variability in score is explained by the model score_bty_gender_fit? Once again, use inline code in your answer.

Aim to make it to this point during the workshop.
How do the adjusted \(R^2\) values of score_bty_fit and score_bty_gender_fit compare? What does this tell us about how useful gender is in explaining the variability in evaluation scores when we already have information on the beauty score of the professor?
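One way to pull the two adjusted \(R^2\) values side by side (a sketch):

glance(score_bty_fit$fit)$adj.r.squared          # beauty only
glance(score_bty_gender_fit$fit)$adj.r.squared   # beauty + gender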
Compare the slopes of bty_avg under the two models (score_bty_fit and score_bty_gender_fit). Has the addition of gender to the model changed the parameter estimate (slope) of bty_avg?
Fit a model called score_bty_gender_rank_fit predicting average professor evaluation score based on average beauty rating (bty_avg), gender, and rank. How do the adjusted \(R^2\) values of score_bty_gender_fit and score_bty_gender_rank_fit compare? What does this tell us about how useful rank is in explaining the variability in evaluation scores when we already have information on the beauty score and gender of the professor?
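A sketch of the fit and the comparison, following the same assumed pattern:

score_bty_gender_rank_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(score ~ bty_avg + gender + rank, data = evals)

glance(score_bty_gender_fit$fit)$adj.r.squared        # without rank
glance(score_bty_gender_rank_fit$fit)$adj.r.squared   # with rank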