Syllabus

Overview

This is a half-semester graduate seminar. It’s self-contained, but goes well with its companion on Data Management, taught in the second half of the semester.

Course objectives

This class will teach you how to use modern, widely used tools to create insightful, effective, reproducible visualizations of social science data. We will also put the theory and practice of visualization into context. By that I mean that we will think about different ways of looking at social science data, about where data comes from in the first place, and about the implications of choosing to represent it in different ways.

By the end of the course you will

  • Understand the basic principles behind effective data visualization.
  • Have a practical sense for why some graphs and figures work well, while others may fail to inform or actively mislead.
  • Know how to create a wide range of plots in R using ggplot2.
  • Know how to refine plots for effective presentation.

Core Texts

I strongly recommend (but do not require) that you buy two books:

  • Kieran Healy, Data Visualization: A Practical Introduction (Princeton: Princeton University Press, 2019), http://socviz.co/. [Draft version free online; print version at Amazon and all good bookshops.]
  • Hadley Wickham and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (Sebastopol, California: O’Reilly Media, 2017), http://r4ds.had.co.nz/. [Free online; print version at Amazon and all good bookshops.]

You should also be aware of:

Other Material

We may also read other material as we go. I will make it available to you beforehand. It will include material from the following books, amongst other sources:

  • Whitney Battle-Baptiste and Britt Rusert, W. E. B. Du Bois’s Data Portraits: Visualizing Black America (New York: Princeton Architectural Press, 2018).
  • Scott Berinato, Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations (Cambridge, MA: Harvard Business Review Press, 2016).
  • Alberto Cairo, The Truthful Art: Data, Charts, and Maps for Communication (Berkeley, California: New Riders, 2016).
  • William S. Cleveland, Visualizing Data (Hobart Press, 1994).
  • Stephen Few, Now You See It: Simple Visualization Techniques for Quantitative Analysis (Oakland, CA: Analytics Press, 2009).
  • Ellen Lupton, Thinking with Type: A Critical Guide for Designers, Writers, Editors, & Students, 2nd ed. (New York: Princeton Architectural Press, 2010).
  • Tamara Munzner, Visualization Analysis and Design, AK Peters Visualization Series (Boca Raton, FL: CRC Press, 2014).
  • Edward R. Tufte, The Visual Display of Quantitative Information (Cheshire, CT: Graphics Press, 1983).
  • Colin Ware, Visual Thinking for Design (Waltham, MA: Morgan Kaufmann, 2008).
  • Nathan Yau, Visualize This: The FlowingData Guide to Design, Visualization, and Statistics (New York: Wiley, 2011).

Software

We will do all of our visualization work in this class using R. We will use RStudio to manage our code and projects.

You will need to install some software first. Here is what to do:

  1. Get the most recent version of R. R is free and available for Windows, Mac, and Linux operating systems. Download the version of R compatible with your operating system. If you are running Windows or macOS, you should choose one of the precompiled binary distributions (i.e., ready-to-run applications) linked at the top of the R Project’s webpage.

  2. Once R is installed, download and install RStudio. RStudio is an “Integrated Development Environment”, or IDE. This means it is a front-end for R that makes it much easier to work with. RStudio is also free, and available for Windows, Mac, and Linux platforms.

  3. Install the tidyverse library and several other add-on packages for R. These libraries provide useful functionality that we will take advantage of throughout the course. You can learn more about the tidyverse’s family of packages at its website.

    To install the tidyverse, make sure you have an Internet connection and then launch RStudio. Type the following lines of code at R’s command prompt, located in the window named “Console”, and hit return. In the code below, the <- arrow is made up of two keystrokes, first < and then the short dash or minus symbol, -.

my_packages <- c("tidyverse", "broom", "cowplot", "drat",
                 "gapminder", "GGally", "ggforce", "ggrepel", "ggridges", "gridExtra",
                 "here", "interplot", "margins", "maps", "mapproj",
                 "mapdata", "MASS", "quantreg", "rlang", "scales",
                 "survey", "srvyr", "devtools")

install.packages(my_packages, repos = "http://cran.rstudio.com")

RStudio should then download and install these packages for you. It may take a little while to download everything.
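If the installation is interrupted, or you are not sure which packages made it onto your machine, one way to check is sketched below. It is only an illustration: the helper name missing_packages is hypothetical (it is not part of R or of the course materials), and the package list is shortened for brevity. The helper compares a vector of package names against what R reports as installed.

```r
# Hypothetical helper (not part of the course materials): return the
# subset of `pkgs` that is not yet installed in the current R library.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# A shortened version of the course package list, for illustration only.
my_packages <- c("tidyverse", "broom", "gapminder", "ggrepel")

to_install <- missing_packages(my_packages)
print(to_install)  # an empty character vector means nothing is missing

# Uncomment the next line to install whatever is still missing:
# install.packages(to_install, repos = "http://cran.rstudio.com")
```

Running the install line only on the missing packages avoids downloading everything a second time.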

With these packages available, you can then install one last library of material that’s useful specifically for this course:

install.packages("socviz", repos = "http://cran.rstudio.com")
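Once everything is installed, a quick way to confirm that the setup works is to load a couple of the packages and draw a plot. The snippet below is a minimal sketch rather than an official course exercise; it assumes the ggplot2 (installed as part of the tidyverse) and gapminder packages from the list above installed without errors, and it uses the gapminder dataset that ships with the gapminder package.

```r
# Load two of the packages installed above.
library(ggplot2)
library(gapminder)

# Check, without raising an error, whether socviz is available.
has_socviz <- requireNamespace("socviz", quietly = TRUE)
message("socviz installed: ", has_socviz)

# A scatterplot of life expectancy against GDP per capita.
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)",
       y = "Life expectancy (years)")

print(p)
```

If a scatterplot appears in RStudio’s “Plots” pane, R, RStudio, and the packages are working together.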

Schedule

As the weeks go by, consult the Schedule Page for more information on weekly topics, problem sets, readings, and other materials. The schedule is likely to change as we go. Links to readings, assignments, and other materials from class will be posted on that page.