class: center, middle, inverse, title-slide # Tidy data ##
Introduction to Data Science ###
introds.org
###
Dr. Mine Çetinkaya-Rundel --- layout: true <div class="my-footer"> <span> <a href="https://introds.org" target="_blank">introds.org</a> </span> </div> --- ## Tidy data >Happy families are all alike; every unhappy family is unhappy in its own way. > >Leo Tolstoy -- .pull-left[ **Characteristics of tidy data:** - Each variable forms a column. - Each observation forms a row. - Each type of observational unit forms a table. ] -- .pull-right[ **Characteristics of untidy data:** !@#$%^&*() ] --- ## .question[ What makes this data not tidy? ] <img src="img/hyperwar-airplanes-on-hand.png" width="70%" style="display: block; margin: auto;" /> .footnote[ Source: [Army Air Forces Statistical Digest, WW II](https://www.ibiblio.org/hyperwar/AAF/StatDigest/aafsd-3.html) ] --- .question[ What makes this data not tidy? ] <br> <img src="img/hiv-est-prevalence-15-49.png" width="70%" style="display: block; margin: auto;" /> .footnote[ Source: [Gapminder, Estimated HIV prevalence among 15-49 year olds](https://www.gapminder.org/data) ] --- .question[ What makes this data not tidy? ] <br> <img src="img/us-general-economic-characteristic-acs-2017.png" width="85%" style="display: block; margin: auto;" /> .footnote[ Source: [US Census Fact Finder, General Economic Characteristics, ACS 2017](https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_17_5YR_DP03&src=pt) ] --- ## Displaying vs. summarising data .panelset[ .panel[.panel-name[Output] .pull-left[ ``` ## # A tibble: 87 x 3 ## name height mass ## <chr> <int> <dbl> ## 1 Luke Skywalker 172 77 ## 2 C-3PO 167 75 ## 3 R2-D2 96 32 ## 4 Darth Vader 202 136 ## 5 Leia Organa 150 49 ## 6 Owen Lars 178 120 ## # … with 81 more rows ``` ] .pull-right[ ``` ## # A tibble: 3 x 2 ## gender avg_ht ## <chr> <dbl> ## 1 feminine 165. ## 2 masculine 177. ## 3 <NA> 181. ``` ] ] .panel[.panel-name[Code] .pull-left[ ```r starwars %>% select(name, height, mass) ``` ] .pull-right[ ```r starwars %>% group_by(gender) %>% summarize( avg_ht = mean(height, na.rm = TRUE) ) ``` ] ] ]