1 Learning R

  • There are two ways to learn R: the easy way and the tedious way.
  • The problem is that the easy way doesn’t work.
  • You have to practice the examples and work through them manually. Type them out, even if you’re just copying at the beginning. It really will help you get used to how the language works.
  • You will benefit a lot from taking almost any R tutorial, whether from a textbook or online. The syllabus has links.
  • For example, Try R.

2 Visualization Principles Again

  • Visualizing data is not just a matter of good taste.
  • Basic perceptual processes play a very strong role.
  • These have consequences for how we will want to encode data when we visualize it—i.e., how and whether we choose to represent numbers or categories as shapes, colors, lengths, etc.

2.1 Perception

  • We more easily see edges, contrasts, and movement.
  • We judge relative differences rather than absolute values.
  • We tend to infer relationships between elements based on gestalt-like rules.

2.2 Hermann Grid Effect

Hermann Grid Effect

Hermann Grid Effect

2.3 Contrast Effects

Contrast Effects

Contrast Effects

2.4 Color makes things Complex

Color Makes things More Complex

Color Makes things More Complex

2.5 What stands out?

What stands out

What stands out

  • (Courtesy Miriah Meyer.)

2.6 Gestalt Rules

Gestalt Rules

Gestalt Rules

  • Bang Wong, Nature Methods 7 863 (2010)
Gestalt Rules

Gestalt Rules

3 Picking Out a Data Point

3.1 Pick it out by Shape?

  • Highlight by shape

3.2 Pick it out by Color?

  • Highlight by color

3.3 Pick it out by Size?

  • Highlight by size

3.4 How about all three?

  • Highlight by all three

3.5 Multiple channels of comparison become uninterpretable very fast

3.6 Unless your data has a lot of structure

The data on the graph are the reason for the existence of the graph.

Cleveland (1994, 25)

4 Writing Plots

4.1 Go get the Gapminder Data

gapminder.url <- "https://raw.githubusercontent.com/socviz/soc880/master/data/gapminder.csv"
my.data <- read.csv(url(gapminder.url))
dim(my.data)
## [1] 1704    6
head(my.data)
##   country continent year lifeExp      pop gdpPercap
## 1 Algeria    Africa 1952  43.077  9279525  2449.008
## 2 Algeria    Africa 1957  45.685 10270856  3013.976
## 3 Algeria    Africa 1962  48.303 11000948  2550.817
## 4 Algeria    Africa 1967  51.407 12760499  3246.992
## 5 Algeria    Africa 1972  54.518 14760787  4182.664
## 6 Algeria    Africa 1977  58.014 17152804  4910.417
  • Remember what we said before about everything being an object, and every object having a class.
## We'll be a bit more verbose
## to make things clearer
p <- ggplot(data=my.data,
            aes(x=gdpPercap,
                y=lifeExp)) 
  • ggplot works by building your plot piece by piece
  • We start with a clean data frame called my.data
  • Then we tell ggplot what pieces of it we are interested in right now.
  • We create an object called p containing this information
  • Here, x=gdpPercap and y=lifeExp say what will go on the x and the y axes
  • These are aesthetic mappings that connect pieces of the data to things we can actually see on a plot.

4.2 About aesthetic mappings

  • The aes() function links variables to things you will see on the plot.
  • The x and y values are the most obvious ones.
  • Other aesthetic mappings include, e.g., color, shape, and size.
  • These mappings are not directly specifying what specific, e.g., colors or shapes will be on the plot. Rather they say which variables in the data will be represented by, e.g., colors and shapes on the plot.

4.3 Adding layers to the plot

  • What happens when you type p at the console and hit return?
  • We need to add a layer to the plot.
  • This takes the p object we’ve created, and applies geom_point() to it, a function that knows how to take x and y values and plot them in a scatterplot.
p + geom_point()

5 The Plot-Making Process

    1. Start with your data in the right shape
    1. Tell ggplot what relationships you want to see
    1. Tell ggplot how you want to see them
    1. Layer these pictures as needed
    1. Fine-tune scales, labels, tick marks, etc
  • This layering process is literally additive
p <- ggplot(my.data,
            aes(x=gdpPercap, y=lifeExp))

p + geom_point()

p + geom_point() +
    geom_smooth(method="loess") 

  • Here we add a second geom. It’s a loess smoother. There are others. Try lm, for example.

  • What happens when you put geom_smooth() first instead of second?
  • Notice how both geom_point and geom_smooth() inherit the information in p about what the x and y variables are.

p + geom_point() +
    geom_smooth(method="loess") +
    scale_x_log10()

  • The next layer does not change anything in the underlying data. Instead it adjusts the x-axis scale.
p + geom_point(color="firebrick") +
    geom_smooth(method="loess") +
    scale_x_log10()

  • Here, notice we changed the color of the points by specifying the color argument in geom_point(). This is called setting an aesthetic feature.

  • Setting an aesthetic has no relationship to the data. In the previous plot, the color red is not representing or mapping any feature of the data.
  • To see the difference between setting and mapping an aesthetic, let’s go back to our p object and recreate it.
  • This time, in addition to x and y we tell ggplot to map the variable Continent to the color aesthetic.

p <- ggplot(my.data,
            aes(x=gdpPercap,
                y=lifeExp,
                color=continent))
  • Now there is a relationship or mapping between the data and the aesthetic.
  • The values of the variable continent will be represented by colors on the figure we draw.
p + geom_point() +
    scale_x_log10()

  • Like this. We do not manually specify any colors. We told ggplot() to map the values of contintent to the property, or aesthetic, of color
  • Try mapping continent to the aesthetic shape.

5.1 Colorless green ideas sleep furiously

  • ggplot implements a “grammar” of graphics, an idea developed by Leland Wilkinson (2005).
  • The grammar gives you rules for how to map pieces of data to geometric objects (like points and lines) with attributes (like color and size), together with further rules for transforming the data if needed, adjusting scales, or projecting the results onto a coordinate system.
  • A key point is that, like other rules of syntax, it limits what you can say but doesn’t make what you say sensible or meaningful.
  • It allows you to produce “sentences” (mappings of data to objects) but they can easily be garbled.

5.2 More work needed (1)

p + geom_line()

5.3 More work needed (2)

p + geom_bar(stat="identity")

5.4 Once you get used to it, this layered grammar lets you build up sophisticated plots

## "Not in" convenience operator
"%nin%" <- function(x, y) {
  return( !(x %in% y) )
}

library(scales)

p <- ggplot(subset(my.data, country %nin% "Kuwait"), aes(x=year, y=gdpPercap))

p1 <- p + geom_line(color="gray70", aes(group=country)) +
    geom_smooth(size=1.1, method="loess", se=FALSE)

p1 + facet_wrap(~ continent, ncol=1) +
    labs(x="Year", y="GDP per capita") +
    scale_y_continuous(labels=dollar)
  • To see the logic of each step of a plot, peel the layers backwards from the last one to the first, and see which parts of the plot are changed, or disappear.
  • Also examine what happens if you change some of the arguments, e.g. se=TRUE, or method='lm', or what happens when you leave them at their defaults.