class: center, middle, inverse, title-slide

# Visualising data with ggplot2
## Introduction to Data Science
### introds.org
### Dr. Mine Çetinkaya-Rundel

---
layout: true

<div class="my-footer">
<span>
<a href="https://introds.org" target="_blank">introds.org</a>
</span>
</div>

---
class: middle

# ggplot2 ❤️ 🐧

---

## ggplot2 `\(\in\)` tidyverse

.pull-left[
<img src="img/ggplot2-part-of-tidyverse.png" width="80%" style="display: block; margin: auto;" />
]
.pull-right[
- **ggplot2** is tidyverse's data visualization package
- Structure of the code for plots can be summarized as

```r
ggplot(data = [dataset],
       mapping = aes(x = [x-variable], y = [y-variable])) +
   geom_xxx() +
   other options
```
]

---

## Data: Palmer Penguins

Measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.

.pull-left-narrow[
<img src="img/penguins.png" width="80%" style="display: block; margin: auto;" />
]
.pull-right-wide[
```r
library(palmerpenguins)
glimpse(penguins)
```

```
## Rows: 344
## Columns: 7
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Tor…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38…
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17…
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 19…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 36…
## $ sex               <fct> male, female, female, NA, female, ma…
```
]

---

.panelset[
.panel[.panel-name[Plot]
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-6-1.png" width="70%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       colour = "Species")
```

```
## Warning: Removed 2 rows containing missing values (geom_point).
```
]
]

---
class: middle

# Coding out loud

---

.midi[
> **Start with the `penguins` data frame**
]

.pull-left[
```r
*ggplot(data = penguins)
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `penguins` data frame,
> **map bill depth to the x-axis**
]

.pull-left[
```r
ggplot(data = penguins,
*      mapping = aes(x = bill_depth_mm))
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `penguins` data frame,
> map bill depth to the x-axis
> **and map bill length to the y-axis.**
]

.pull-left[
```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
*                    y = bill_length_mm))
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `penguins` data frame,
> map bill depth to the x-axis
> and map bill length to the y-axis.
> **Represent each observation with a point**
]

.pull-left[
```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm)) +
* geom_point()
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `penguins` data frame,
> map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> **and map species to the colour of each point.**
]

.pull-left[
```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
*                    colour = species)) +
  geom_point()
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `penguins` data frame,
> map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the colour of each point.
> **Title the plot "Bill depth and length"**
]

.pull-left[
```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
* labs(title = "Bill depth and length")
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `penguins` data frame,
> map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the colour of each point.
> Title the plot "Bill depth and length",
> **add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins"**
]

.pull-left[
```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
*      subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins")
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `penguins` data frame,
> map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the colour of each point.
> Title the plot "Bill depth and length",
> add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
> **label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively**
]

.pull-left[
```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
*      x = "Bill depth (mm)", y = "Bill length (mm)")
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `penguins` data frame,
> map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the colour of each point.
> Title the plot "Bill depth and length",
> add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
> label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively,
> **label the legend "Species"**
]

.pull-left[
```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
*      colour = "Species")
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `penguins` data frame,
> map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the colour of each point.
> Title the plot "Bill depth and length",
> add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
> label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively,
> label the legend "Species",
> **and add a caption for the data source.**
]

.pull-left[
```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       colour = "Species",
*      caption = "Source: Palmer Station LTER / palmerpenguins package")
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `penguins` data frame,
> map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the colour of each point.
> Title the plot "Bill depth and length",
> add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
> label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively,
> label the legend "Species",
> and add a caption for the data source.
> **Finally, use a discrete colour scale that is designed to be perceived by viewers with common forms of colour blindness.**
]

.pull-left[
```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       colour = "Species",
       caption = "Source: Palmer Station LTER / palmerpenguins package") +
* scale_colour_viridis_d()
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.panelset[
.panel[.panel-name[Plot]
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-18-1.png" width="70%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       colour = "Species",
       caption = "Source: Palmer Station LTER / palmerpenguins package") +
  scale_colour_viridis_d()
```

```
## Warning: Removed 2 rows containing missing values (geom_point).
```
]
.panel[.panel-name[Narrative]
.pull-left-wide[
.midi[
Start with the `penguins` data frame, map bill depth to the x-axis and map bill length to the y-axis.

Represent each observation with a point and map species to the colour of each point.

Title the plot "Bill depth and length", add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively, label the legend "Species", and add a caption for the data source.

Finally, use a discrete colour scale that is designed to be perceived by viewers with common forms of colour blindness.
]
]
]
]

---

## Argument names

.tip[
You can omit the names of the first two arguments (`data` and `mapping`) when building plots with `ggplot()`.
]

.pull-left[
```r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm,
                     y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  scale_colour_viridis_d()
```
]
.pull-right[
```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           colour = species)) +
  geom_point() +
  scale_colour_viridis_d()
```
]

---
class: middle

# Aesthetics

---

## Aesthetics options

Commonly used characteristics of plotting characters that can be **mapped to a specific variable** in the data are

- `colour`
- `shape`
- `size`
- `alpha` (transparency)

---

## Colour

.pull-left[
```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
*          colour = species)) +
  geom_point() +
  scale_colour_viridis_d()
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Shape

Mapped to a different variable than `colour`

.pull-left[
```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           colour = species,
*          shape = island)) +
  geom_point() +
  scale_colour_viridis_d()
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-20-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Shape

Mapped to the same variable as `colour`

.pull-left[
```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           colour = species,
*          shape = species)) +
  geom_point() +
  scale_colour_viridis_d()
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Size

.pull-left[
```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           colour = species,
           shape = species,
*          size = body_mass_g)) +
  geom_point() +
  scale_colour_viridis_d()
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Alpha
.pull-left[
```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           colour = species,
           shape = species,
           size = body_mass_g,
*          alpha = flipper_length_mm)) +
  geom_point() +
  scale_colour_viridis_d()
```
]
.pull-right[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-23-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.pull-left[
**Mapping**

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
*          size = body_mass_g,
*          alpha = flipper_length_mm)) +
  geom_point()
```
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
**Setting**

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm)) +
* geom_point(size = 2, alpha = 0.5)
```
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-25-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Mapping vs. setting

- **Mapping:** Determine the size, alpha, etc. of points based on the values of a variable in the data
  - goes into `aes()`

- **Setting:** Determine the size, alpha, etc. of points **not** based on the values of a variable in the data
  - goes into `geom_*()` (this was `geom_point()` in the previous example, but we'll learn about other geoms soon!)

---
class: middle

# Faceting

---

## Faceting

- Smaller plots that display different subsets of the data
- Useful for exploring conditional relationships and large datasets

---

.panelset[
.panel[.panel-name[Plot]
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-26-1.png" width="70%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
* facet_grid(species ~ island)
```

```
## Warning: Removed 2 rows containing missing values (geom_point).
```
]
]

---

## Various ways to facet

.question[
In the next few slides, describe what each plot displays. Think about how the code relates to the output.
**Note:** The plots in the next few slides do not have proper titles, axis labels, etc., because we want you to figure out what's happening in the plots. But you should always label your plots!
]

---

```r
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
* facet_grid(species ~ sex)
```

<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-27-1.png" width="60%" style="display: block; margin: auto;" />

---

```r
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
* facet_grid(sex ~ species)
```

<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-28-1.png" width="60%" style="display: block; margin: auto;" />

---

```r
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
* facet_wrap(~ species)
```

<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-29-1.png" width="60%" style="display: block; margin: auto;" />

---

```r
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
* facet_grid(. ~ species)
```

<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-30-1.png" width="60%" style="display: block; margin: auto;" />

---

```r
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
* facet_wrap(~ species, ncol = 2)
```

<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-31-1.png" width="60%" style="display: block; margin: auto;" />

---

## Faceting summary

- `facet_grid()`:
    - 2d grid
    - `rows ~ cols`
    - use `.` for no split
- `facet_wrap()`: 1d ribbon of panels, wrapped into the number of rows and columns specified (e.g., `ncol = 2`) or to fit the available plotting area

---

## Facet and color

.pull-left-narrow[
```r
ggplot(
  penguins,
  aes(x = bill_depth_mm,
      y = bill_length_mm,
*     color = species)) +
  geom_point() +
  facet_grid(species ~ sex) +
* scale_color_viridis_d()
```
]
.pull-right-wide[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-32-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Facet and color, no legend

.pull-left-narrow[
```r
ggplot(
  penguins,
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species)) +
  geom_point() +
  facet_grid(species ~ sex) +
  scale_color_viridis_d() +
* guides(color = "none")
```
]
.pull-right-wide[
<img src="w2-d03-ggplot2_files/figure-html/unnamed-chunk-33-1.png" width="100%" style="display: block; margin: auto;" />
]
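---

## Hiding legends with `theme()`

A related sketch, assuming ggplot2 and palmerpenguins are installed: instead of `guides()`, the legend can also be switched off through the theme with `theme(legend.position = "none")`. The scope differs: the theme setting hides *every* legend on the plot, while `guides(color = "none")` drops only the guide for the colour scale. (Recent ggplot2 releases deprecate `guides(color = FALSE)` in favour of `"none"`.)

```r
library(ggplot2)
library(palmerpenguins)

# Same faceted plot as on the previous slide; here the legend is
# removed through the theme, which hides all legends, not just colour's
p <- ggplot(
  penguins,
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species)) +
  geom_point() +
  facet_grid(species ~ sex) +
  scale_color_viridis_d() +
  theme(legend.position = "none")

p
```

Because the facet strips already label each species, the colour legend is redundant in this plot, so dropping it loses no information.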