1 Maps

  • Choropleth maps are visually striking, but can also be a bit misleading. Consider some ways of representing election results. The fundamental issue is that people vote, not places.

1.1 2012 US Election, Winner by State

Election Map 1

Election Map 1

Vast swathes of red and blue by state.

1.2 2012 US Election, Winner by County

Election Map 2

Election Map 2

Here’s the winner by county. A bit more more heterogeneous, but not by much.

1.3 2012 US Election, Winner by County Share

Election Map 3

Election Map 3

County share grades the color scale by the winning margin.

1.4 2012 US Election, Color-Centered

Election Map 4

Election Map 4

If we introduce purple as an intermediate color things start to look a little more hazy.

1.5 2012 US Election, Color-Centered, Scaled by Population

Election Map 5

Election Map 5

And if we distort the map polygons like a water balloon based on population, things get stranger again.

1.6 2012 US Election, Electoral College Cartogram

Election Map 6

Election Map 6

Cartograms try to preserve the sense of geographical space while building the “map” from equally-sized blocks, with the number of blocks used to construct each unit varying by some quantity of interest, like population.

2 Projections

Mercator

Mercator

2.1 Four views of the US

4 US projections

4 US projections

2.2 National Atlas

US National Atlas

US National Atlas

3 America’s Ur-Choropleths

In the U.S. case, the fact that states and counties vary widely in size and population means that they can be a bit misleading. And they make it easy to present a geographical distribution to insinuate an explanation. Together the results can be frustrating. Gabriel Rossman remarked to me a while ago that most choropleth maps of the U.S. for whatever variable in effect show population density more than anything else. (There’s an xkcd strip about this, too.) The other big variable, in the U.S. case, is Percent Black. Between the two of them, population density and percent black will do a lot to obliterate many a suggestively-patterned map of the United States. Those two variables aren’t explanations of anything in isolation, but if it turns out it’s more useful to know one or both of them instead of the thing you’re plotting, you probably want to reconsider your theory.

So as a public service, here are America’s two ur-choropleths, by county. First, Population Density.

Population Density

Population Density

And Percent Black:

Percent Black

Percent Black

Here we see a recently-tweeted map that makes the point. It shows county-level estimates of gun-related suicides.

Gun Suicides

Gun Suicides

Here’s a map I drew that just shows population density, but reverse-coded so that the darker areas 1 fewer people.

Reverse-Coded Pop Density

Reverse-Coded Pop Density

4 Often, you don’t need a ‘real’ map

A consequence of the choropleth issue is that, especially for US State-level data, you may not need a fancy map at all.

4.1 Statebins, by Bob Rudis

devtools::install_github("hrbrmstr/statebins")
library(statebins)

dat <- read.csv("http://www.washingtonpost.com/wp-srv/special/business/states-most-threatened-by-trade/states.csv?cache=1", stringsAsFactors=FALSE)
head(dat)
##   fipst stab          state workers1994 workers1995 workers1996
## 1    37   NC North Carolina        3831        2631        8716
## 2     1   AL        Alabama        1504        3527        7627
## 3    28   MS    Mississippi        1613        3344        4454
## 4     5   AR       Arkansas        1376        1107        1473
## 5    47   TN      Tennessee        3210        4149        7297
## 6    45   SC South Carolina        1991        1289        3422
##   workers1997 workers1998 workers1999 workers2000 workers2001 workers2002
## 1       12324        9428       12414        4790       12596       19564
## 2        6858        5340        8697        3919        4400        9895
## 3        1938        3108        6143        2839        3292        4796
## 4        2426        2559        2653         604        3041        3741
## 5        7544        7435        5706        5107        7104        8137
## 6         975        3922        1739        2427        6296        6585
##   workers2003 workers2004 workers2005 workers2006 workers2007 workers2008
## 1       21126       11435       13167       10602       14815       11767
## 2        3285        3156        1895        5358        4156        6377
## 3        3785        1350        1345        1807        2246        3366
## 4        3417        1093        2223        3781        4905        3220
## 5        7528        2673        6156        4849        8254        7262
## 6        5690        4623        5913        5270        3473        4479
##   workers2009 workers2010 workers2011 workers2012 workers2013
## 1       11533       12746        4120        2953        3301
## 2        6661        5306         571        1033         834
## 3        3262         473          77         386         226
## 4        2891        3088        2239        3769         507
## 5        7647        7938        7100        2108        2606
## 6        5846        2575        1513        1864        1321
##   share_cut1994 share_cut1995 share_cut1996 share_cut1997 share_cut1998
## 1           1.4           0.9           3.0           4.1           3.0
## 2           1.1           2.4           5.2           4.6           3.5
## 3           2.0           3.9           5.1           2.2           3.4
## 4           1.6           1.3           1.6           2.7           2.7
## 5           1.6           2.0           3.4           3.5           3.3
## 6           1.5           1.0           2.5           0.7           2.7
##   share_cut1999 share_cut2000 share_cut2001 share_cut2002 share_cut2003
## 1           3.9           1.5           3.8           6.1           6.7
## 2           5.6           2.5           2.8           6.5           2.2
## 3           6.7           3.1           3.6           5.4           4.3
## 4           2.8           0.6           3.1           3.9           3.6
## 5           2.5           2.2           3.1           3.6           3.3
## 6           1.2           1.6           4.1           4.5           3.9
##   share_cut2004 share_cut2005 share_cut2006 share_cut2007 share_cut2008
## 1           3.6           4.1           3.2           4.3           3.4
## 2           2.1           1.2           3.3           2.6           3.9
## 3           1.5           1.5           2.0           2.5           3.7
## 4           1.1           2.3           3.8           4.9           3.2
## 5           1.2           2.7           2.1           3.5           3.0
## 6           3.1           3.9           3.4           2.2           2.8
##   share_cut2009 share_cut2010 share_cut2011 share_cut2012 share_cut2013
## 1           3.5           4.1           1.3           0.9           1.0
## 2           4.3           3.6           0.4           0.7           0.6
## 3           3.7           0.6           0.1           0.5           0.3
## 4           3.0           3.3           2.4           3.9           0.5
## 5           3.4           3.7           3.2           0.9           1.1
## 6           3.9           1.8           1.0           1.2           0.9
##   avgshare avgshare94_00 avgshare01_07 avgshare08_12
## 1     3.56          2.53          4.56          2.64
## 2     3.22          3.55          2.95          2.58
## 3     2.94          3.77          2.99          1.71
## 4     2.89          1.91          3.27          3.16
## 5     2.86          2.64          2.77          2.85
## 6     2.61          1.60          3.57          2.14
p <- statebins(dat, "state", "avgshare94_00", breaks=4,
                labels=c("0-1", "1-2", "2-3", "3-4"),
                legend_title="Share of workforce with jobs\nlost or threatened by trade",
                font_size=3)

p + theme(legend.position="bottom") +
    ggtitle("1994-2000")
Statebins Example 2
p <- statebins_continuous(dat, "state", "avgshare01_07",
               legend_title="Share of workforce with jobs\nlost or threatened by trade",
               brewer_pal = "OrRd",
                font_size=3)

p + theme(legend.position="bottom") +
    ggtitle("2001-2007")
Statebins Example 3
library(httr)
library(dplyr)

election_2012 <- GET("https://raw.githubusercontent.com/hrbrmstr/statebins/master/tmp/election2012.csv")

results <- read.csv(textConnection(content(election_2012, as="text")),
                    header=TRUE, stringsAsFactors=FALSE)

results <- results %>%
    mutate(color=ifelse(is.na(Obama), "#2166ac", "#b2182b")) %>%
    select(state, color) 
head(results, 15)
##    state   color
## 1     AL #2166ac
## 2     AK #2166ac
## 3     AZ #2166ac
## 4     AR #2166ac
## 5     CA #b2182b
## 6     CO #b2182b
## 7     CT #b2182b
## 8     DE #b2182b
## 9     DC #b2182b
## 10    FL #b2182b
## 11    GA #2166ac
## 12    HI #b2182b
## 13    ID #2166ac
## 14    IL #b2182b
## 15    IN #2166ac
results %>%
    statebins_manual(font_size=4, text_color = "white",
                     labels=c("Romney", "Obama"),
                     legend_position="right",
                     legend_title="Winner")

5 US County Maps in R

Let’s make the Ur-Choropleths. This code borrows heavily from exellent work by Bob Rudis: https://github.com/hrbrmstr/rd3albers. You can get the county map repository from the terminal: git clone https://github.com/kjhealy/us-county.git. Or clone it in your browser, or git client.

5.1 Setting up the data

library(maptools)
library(mapproj)
library(rgeos)
library(rgdal)
library(RColorBrewer)
library(ggplot2)
library(stringr)
library(scales)
library(RColorBrewer)

theme_set(theme_minimal())

## for theme_map
## devtools::source_gist("33baa3a79c5cfef0f6df")

theme_map <- function(base_size=9, base_family="") {
    require(grid)
    theme_bw(base_size=base_size, base_family=base_family) %+replace%
    theme(axis.line=element_blank(),
          axis.text=element_blank(),
          axis.ticks=element_blank(),
          axis.title=element_blank(),
          panel.background=element_blank(),
          panel.border=element_blank(),
          panel.grid=element_blank(),
          panel.margin=unit(0, "lines"),
          plot.background=element_blank(),
          legend.justification = c(0,0),
          legend.position = c(0,0)
          )
}

First we need an actual map — the US census shapefiles, converted to geojson format.

## US Census Shapefiles
## https://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html

## Converted to geojson format
## http://eric.clst.org/Stuff/USGeoJSON
## Read U.S. counties moderately-simplified GeoJSON file
us.counties <- readOGR(dsn="assets/gz_2010_us_050_00_5m.json",
                       layer="OGRGeoJSON")
## OGR data source with driver: GeoJSON 
## Source: "assets/gz_2010_us_050_00_5m.json", layer: "OGRGeoJSON"
## with 3221 features
## It has 6 fields

Next, we transform the map data to an Albers equal area projection.

# Convert it to Albers equal area
us.counties.aea <- spTransform(us.counties,
                               CRS("+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs"))

us.counties.aea@data$id <- rownames(us.counties.aea@data)

We want to have a large map focused on the lower 48, but also preserve Alaska and Hawaii. So we extract, rotate, and move those two states. The state code numbers are standard FIPS codes, by the way.

# Extract, then rotate, shrink & move alaska (and reset projection)
# need to use state IDs via # https://www.census.gov/geo/reference/ansi_statetables.html
alaska <- us.counties.aea[us.counties.aea$STATE=="02",]
alaska <- elide(alaska, rotate=-50)
alaska <- elide(alaska, scale=max(apply(bbox(alaska), 1, diff)) / 2.3)
alaska <- elide(alaska, shift=c(-2100000, -2500000))
proj4string(alaska) <- proj4string(us.counties.aea)

# extract, then rotate & shift hawaii
hawaii <- us.counties.aea[us.counties.aea$STATE=="15",]
hawaii <- elide(hawaii, rotate=-35)
hawaii <- elide(hawaii, shift=c(5400000, -1400000))
proj4string(hawaii) <- proj4string(us.counties.aea)

# remove old states and put new ones back in; note the different order
# we're also removing puerto rico in this example but you can move it
# between texas and florida via similar methods to the ones we just used
us.counties.aea <- us.counties.aea[!us.counties.aea$STATE %in% c("02", "15", "72"),]
us.counties.aea <- rbind(us.counties.aea, alaska, hawaii)

This just gives us a map. Next we will merge in some state- and county-level data that we can actually plot.

state.data <- read.csv("assets/state-data-statabs-2012.csv", header=TRUE)

county.names <- read.csv("assets/fips-by-state.csv", header=TRUE)

county.data <- read.csv("assets/DataSet.txt", header=TRUE)
county.data$id <- as.character(county.data$fips)
ind <- county.data$fips<10000
county.data$id[ind] <- paste("0", county.data$id[ind], sep="")
county.data$id[county.data$id=="00"] <- "0"

ind <- match(county.data$fips, county.names$fips)
county.data$name <- county.names$name[ind]
county.data$state <- county.names$state[ind]

ind <- match(state.data$fips, county.data$fips)
county.data$state[ind] <- state.data$State.Abbr

## Add state names as levels of county name, so states have FIPS too
levels(county.data$name) <- c(levels(county.data$name), levels(state.data$State))
county.data$name[ind] <- state.data$State


### Add census region. Don't call the variable "region" because that's
### already reserved by the map object
ind <- match(county.data$state, state.data$State.Abbr)
county.data$census.region <- state.data$Region[ind]

The next step is to cut the population density into some bins.

library(Hmisc)
county.data$pop.dens <- with(county.data, PST045214/LND110210)
county.data$pop.dens <- cut2(county.data$pop.dens,
                             cuts = c(0, 10, 100, 1000, 10000))

county.data$pct.black <- cut2(county.data$RHI225213,
                              cuts = c(0, 2, 5, 10, 15, 25, 50))

Then we “fortify” the US County map data. In effect we create a data frame that ggplot can use. And we merge that spatial data with our county-level census data.

co.map <- fortify(us.counties.aea, region="GEO_ID")
co.map$id <- str_replace(co.map$id, "0500000US", "")

co.map <- merge(co.map, county.data, by="id")

5.2 Drawing the Maps

5.2.1 Population Density

p <- ggplot(data=co.map, aes(x=long, y=lat, group=group))

p1 <- p + geom_map(data=co.map,
                   map = co.map,
                   aes(map_id=id,
                       x=long,
                       y=lat,
                       group=group,
                       fill=pop.dens),
                   color="white",
                   size=0.2)

p2 <- p1 + scale_fill_brewer(palette="PuBu",
                             labels = c("0-10", "10-100", "100-1,000",
                                        "1,000-10,000", ">10,000"))
p2 <- p2 + coord_equal()
p2 <- p2 + theme_map()
p2 <- p2 + theme(legend.position="right") + labs(fill="Population per\nsquare mile")
p2 <- p2 + ggtitle("US Population Density, 2014")
p2
US County-Level Population Density

5.2.2 Percent Black

### Percent Black
p <- ggplot(data=co.map, aes(x=long, y=lat, group=group))

p1 <- p + geom_map(data=co.map,
                   map = co.map,
                   aes(map_id=id,
                       x=long,
                       y=lat,
                       group=group,
                       fill=pct.black),
                   color="white",
                   size=0.2)

p2 <- p1 + scale_fill_brewer(palette="Oranges",
                             labels = c("<2", "2-5", "5-10",
                                        "10-15", "15-25", "25-50", ">50"))
p2 <- p2 + coord_equal()
p2 <- p2 + theme_map()
p2 <- p2 + theme(legend.position="right") + labs(fill="Percent of\nPopulation, 2013")
p2 <- p2 + ggtitle("US Population, Percent Black")
p2
US County-Level Percent Black

5.3 Subsetting

Note that once we have the data frame set up, we can subset it just like we would for any other dataset. For instance, let’s take a look at Mississippi.

p <- ggplot(data=subset(co.map, state == "MS"), aes(x=long, y=lat, group=group))

p1 <- p + geom_map(data = subset(co.map, state == "MS"),
                   map = subset(co.map, state == "MS"),
                   aes(map_id=id,
                       x=long,
                       y=lat,
                       group=group,
                       fill=pct.black),
                   color="white",
                   size=0.2)

p2 <- p1 + scale_fill_brewer(palette="Oranges",
                             labels = c("<2", "2-5", "5-10",
                                        "10-15", "15-25", "25-50", ">50"))

p2 <- p2 + coord_equal()
p2 <- p2 + theme_map()
p2 <- p2 + theme(legend.position="right") + labs(fill="Percent of \nPopulation, 2013")
p2 <- p2 + ggtitle("MS State Population, Percent Black")
p2
Mississippi

5.4 See also