From the Sunlight Foundation blog:
Some 30 percent of all the money raised in last year’s presidential election came from just 10 of the nation’s more than 3,000 counties, all of them in major metropolitan areas. But a high proportion of multi-millionaires placed a couple of sparsely populated Wyoming counties among the last election cycle’s highest per-capita givers.
These are just a few of the interesting patterns of political influence that the Sunlight Foundation is beginning to uncover from a partnership with a Philadelphia-based firm that specializes in mapping and geo-spatial analysis. Over the summer, we worked together to create location-based analyses of the federal campaign finance data displayed on Influence Explorer. The partnership produced new and more accurate ways to identify trends in political spending through the power of data visualization.
What would John Snow’s famous cholera map look like on a modern map of London, using modern mapping tools? The map changed what we know about germs and disease - and created a new way of looking at the world. With the help of mapping tool CartoDB and using the Stamen style maps, this is how it looks with larger circles representing more deaths.
The previous post mentioned the BuRd theme and ColorBrewer. Here are some possible uses of both in a series of plots with cross-sectional country-level data. The code downloads World Bank WDI estimates of the total fertility rate and of GDP per capita (PPP, current international dollars), averages them over 2008-2010 for each country, and then adds UN region names to the data.
// package dependencies
ssc install wbopendata
ssc install kountry

// WDI data
wbopendata, indicator(SP.DYN.TFRT.IN; NY.GDP.PCAP.PP.CD) year(2008:2010) long clear
collapse ///
    (mean) fr = sp_dyn_tfrt_in ///
    (mean) gdpc = ny_gdp_pcap_pp_cd ///
    , by(countrycode)

// geo indicators
kountry countrycode, from(iso3c) geo(un)
encode GEO, gen(region)
Geographical regions make it easy to plot the data as small multiples. I also often find it useful to look at a spine plot (a simple mosaic plot) to diagnose how seriously missing data threatens the representativeness of the sample.
// package dependencies
ssc install spineplot

// small multiples
gr hbox gdpc, over(region, sort(1) des) mark(1, ms(i) mlab(countrycode) mlabp(0)) name(boxes, replace)
hist fr, bin(4) by(region, total) name(bins, replace)

// missing data: full = 1 when both fertility and GDP per capita are observed
gen full = !mi(fr, gdpc)
spineplot full region
Going a bit further with regression results, a variety of graphs can be useful for running diagnostics. The first one shown below plots the residuals against the fitted values and overlays a LOESS (lowess) smoother. The second one overlays markers weighted by the absolute residual on the linear fit, so that the observations with the largest errors stand out.
// residuals
gen loggdpc = ln(gdpc)
reg fr loggdpc
predict r, resid
predict yhat

// residuals-versus-fitted values, plus LOESS
sc r yhat, mlab(countrycode) yline(0) ms(i) mlabp(0) || lowess r yhat, ///
    name(residuals_loess, replace)

// linear fit with residually weighted points
sc fr loggdpc if abs(r) > .3 [w = abs(r)], ms(O) mc(gs14) mfc(gs12) || ///
    lfit fr loggdpc || ///
    sc fr loggdpc, ms(i) mlab(countrycode) mlabc(gs6) mlabp(0) legend(off) ///
    name(residuals_rvf, replace)
Last, a map of the residuals can also be informative if there is suspicion of spatial dependence in the error term:
// package dependencies
ssc install spmap

// map of residuals (caution with intervals)
merge 1:1 countrycode using world-d, keep(match master) gen(mapmerge)
spmap r using world-c, id(_ID) clmethod(boxplot) ///
    fcolor(RdYlGn) ndocolor(gs12) ndfcolor(gs14) ocolor(none ..) ///
    legstyle(1) legend(ring(1) pos(3)) ///
    name(residuals_map, replace)
Country-level data is an ideal candidate for plot tweaks such as using marker labels instead of observations. With survey data, there would be more work to do at the level of the data itself, and text labels would have to be taken from aggregate measures like relative frequencies or averages, which makes it more complex to plot the data quickly and efficiently.
Observed US Temperature Change
A new report by the US Global Change Research Program explores climate change and its implications. The first draft, issued for public review, is the work of a 60-person advisory committee and 240 different authors. It draws on data from across US agencies.
Via the report (PDF):
U.S. temperatures will continue to rise, with the next few decades projected to see another 2°F to 4°F of warming in most areas. The amount of warming by the end of the century is projected to correspond closely to the cumulative global emissions of greenhouse gases up to that time: roughly 3°F to 5°F under a lower emissions scenario involving substantial reductions in emissions after 2050 (referred to as the “B1 scenario”), and 5°F to 10°F for a higher emissions scenario assuming continued increases in emissions (referred to as the “A2 scenario”)…
Human-induced climate change means much more than just hotter weather. Increases in ocean and freshwater temperatures, frost-free days, and heavy downpours have all been documented. Sea level has risen, and there have been large reductions in snow-cover extent, glaciers, permafrost, and sea ice. Winter storms along the west coast and the coast of New England have increased slightly in frequency and intensity. These changes and other climatic changes have affected and will continue to affect human health, water supply, agriculture, transportation, energy, and many other aspects of society.
Image: Observed US Temperature Change, via the NCADAC. “The colors on the map show temperature changes over the past 20 years in °F (1991-2011) compared to the 1901-1960 average. The bars on the graphs show the average temperature changes by decade for 1901-2011 (relative to the 1901-1960 average) for each region. The far right bar in each graph (2000s decade) includes 2011. The period from 2001 to 2011 was warmer than any previous decade in every region. (Figure source: NOAA NCDC / CICS-NC. Data from NOAA NCDC.)”
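For readers who want the caption's baseline comparison spelled out, here is one way to write it down. The notation is mine, not the report's: $T_{r,y}$ stands for the annual mean temperature of region $r$ in year $y$, and $Y_d$ for the set of years assigned to decade $d$. Whether the report averages annual means in exactly this way is an assumption.

```latex
% Baseline: mean temperature of region r over the 60 reference years 1901-1960
\bar{T}_r = \frac{1}{60} \sum_{y=1901}^{1960} T_{r,y}

% Bars: average change in decade d relative to the baseline
\Delta T_{r,d} = \frac{1}{|Y_d|} \sum_{y \in Y_d} T_{r,y} - \bar{T}_r

% Map colors: the same quantity computed over the years 1991-2011
\Delta T_r^{\text{map}} = \frac{1}{|Y_{1991\text{-}2011}|} \sum_{y=1991}^{2011} T_{r,y} - \bar{T}_r
```

A positive value means the period was warmer than the 1901-1960 average for that region. By construction, the bars for the decades inside the baseline period average out to roughly zero.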