Introduction to the Grammar of Graphics
Reading: The Age That Women Have Babies
Get into groups of 2-3 people and discuss the following questions:


A variable is a data item that can take on different values.
A variable can be measured on one of two types of scale:
Categorical Variable
Quantitative Variable
Categorical variable: A type of variable that represents categories or groups. It takes one of a limited, usually fixed, number of possible values, assigning each unit of observation to a particular group or nominal category on the basis of some qualitative property.
Binary: Takes only two possible values, typically represented as 1 or 0.
Nominal: Categories have no clear order and are mutually exclusive.
Ordinal: Categories have a clear ordering.
Quantitative variable (also known as a numerical variable): A variable whose values can be measured or counted, allowing mathematical operations to be applied.
Discrete: Can take only countable values, typically represented as integers (whole numbers).
Continuous: Can take any measurable value within a given range, including fractions and decimals.
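The variable types above can be sketched in base R. The `survey` data frame and the `variable_scale()` helper are hypothetical, invented only for illustration. R's storage type is just a hint about the scale: a binary variable coded as 0/1, for example, would look like a discrete quantitative variable to this helper.

```r
# Hypothetical survey data, invented purely for illustration.
survey <- data.frame(
  smoker     = c(TRUE, FALSE, FALSE, TRUE),        # binary
  blood_type = factor(c("A", "O", "B", "O")),      # nominal
  education  = factor(
    c("High school", "Bachelor", "Master", "Bachelor"),
    levels  = c("High school", "Bachelor", "Master"),
    ordered = TRUE
  ),                                               # ordinal
  n_children = c(0L, 2L, 1L, 3L),                  # discrete
  height_cm  = c(162.5, 175.2, 168.0, 181.3)       # continuous
)

# Guess the scale of a column from how R stores it (a heuristic only).
variable_scale <- function(x) {
  if (is.logical(x)) {
    "categorical (binary)"
  } else if (is.ordered(x)) {
    "categorical (ordinal)"
  } else if (is.factor(x) || is.character(x)) {
    "categorical (nominal)"
  } else if (is.integer(x)) {
    "quantitative (discrete)"
  } else if (is.numeric(x)) {
    "quantitative (continuous)"
  } else {
    "unknown"
  }
}

# Apply the helper to every column; the result is a named character vector.
scales <- vapply(survey, variable_scale, character(1))
print(scales)
```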
While the grammar of graphics is an excellent tool that lets us specify and construct a visual display of data, it does not tell us which graphic to use.
| Purpose | Example Chart Types |
|---|---|
| Change over time | Timeline (line chart), area chart, slope chart |
| Showing part-to-whole | Stacked bar, donut chart, treemap |
| Comparisons | Bar chart, column chart, dot plot |
| Distributions (quantitative variable) | Histogram, boxplot, violin plot, density |
| Ranking | Bar chart (sorted), lollipop chart |
| Relationships | Scatter plot, bubble chart, connected scatter |
| Correlations | Scatter plot, matrix chart |
| Geospatial | Choropleth, symbol map |
| Flow | Arrow charts, Sankey diagram |
| Text | Word cloud |
For each graph, identify the types of variables used (i.e., quantitative or categorical) and the analytical task that can be done with such a graph.
From a conceptual perspective, making graphics involves mapping data to geometric objects and their visual properties.
We believe that the grammar of graphics (GG) is a good starting point that gives us a framework (a mental map) and a vocabulary for creating graphics.
This grammar allows the creation of graphics to be:
consistent
reusable
modular
Formalized by Leland Wilkinson (1999)
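As a minimal sketch of what consistent, reusable, and modular can look like in practice, the code below uses ggplot2 with R's built-in `mtcars` data. The dataset and the object names (`base`, `scatter`, `trend`) are assumptions chosen for illustration, not part of the original slides.

```r
library(ggplot2)

# Reusable: the data and aesthetic mappings are specified once.
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Modular: different geometry layers are added to the same base object.
scatter <- base + geom_point()
trend   <- base + geom_point() + geom_smooth(method = "lm", se = FALSE)

# Consistent: both plots are built with the same vocabulary
# (data + aesthetic mappings + geometries) and differ only in their layers.
print(length(scatter$layers))
print(length(trend$layers))
```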


An aesthetic mapping links a variable in the data to a visual channel that can encode its variation.


The geometry describes how to translate the observations into marks on the page.
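A minimal sketch of these two ideas in ggplot2, assuming R's built-in `mtcars` data (an illustrative choice, not from the slides): `aes()` declares which variable feeds which visual channel, and a `geom_*()` layer chooses the mark that is drawn for each observation.

```r
library(ggplot2)

# Aesthetic mapping: weight -> x position, fuel economy -> y position,
# number of cylinders (treated as categorical) -> colour.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()  # Geometry: draw one point per observation (row).

# Inspect which visual channels have a variable mapped to them.
mapped_channels <- names(p$mapping)
print(mapped_channels)
```

Swapping `geom_point()` for another geometry changes the marks on the page while leaving the mapping from variables to channels untouched.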
Explain why the following graphs do not implement the grammar of graphics correctly.





ggplot2
A plot can be decomposed into three primary elements:
Here’s an example of code and output using ggplot2 in R.
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
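The code that produced this output did not survive in the slides. A minimal sketch of what such an example might look like follows, assuming the tibble is `penguins` from the palmerpenguins package (its dimensions and column names match the printout). The choice of variables to plot is illustrative only.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the `penguins` tibble

# Printing the tibble gives a preview like the one shown above.
print(penguins)

# Data: penguins.
# Aesthetic mappings: flipper length -> x, body mass -> y, species -> colour.
# Geometry: points.
penguin_plot <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, colour = species)
) +
  geom_point(na.rm = TRUE)  # silently drop rows with missing measurements

print(penguin_plot)
```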
Question: In terms of the way they are constructed…
What do these plots have in common? How do they differ?
Question: What do these plots have in common? How do they differ?
Question: What are the aesthetic mappings and geometries used here?
This is the process by which data are encoded into a visual through the framework that the grammar of graphics offers.
Please make sure to turn in your worksheet, with your name on it, before leaving!
Read The Persistent Grip of Social Class on College Admissions
