Example plots with country-level data
The previous post mentioned the BuRd theme and ColorBrewer. Here are some possible uses of both in a series of plots with cross-sectional country-level data. The code takes World Bank WDI estimates of fertility and GDP per capita (PPP, current international dollars), averages them over 2008-2010 for each country, and then adds UN region names to the data.
// package dependencies
ssc install wbopendata
ssc install kountry

// WDI data
wbopendata, indicator(SP.DYN.TFRT.IN; NY.GDP.PCAP.PP.CD) year(2008:2010) long clear
collapse ///
    (mean) fr = sp_dyn_tfrt_in ///
    (mean) gdpc = ny_gdp_pcap_pp_cd ///
    , by(countrycode)

// geo indicators
kountry countrycode, from(iso3c) geo(un)
encode GEO, gen(region)
Geographical regions make it easy to plot the data as small multiples. I also often find it useful to look at a mosaic plot (here, a spine plot) to diagnose how seriously missing data threaten the representativeness of the sample.
// package dependencies
ssc install spineplot

// small multiples
gr hbox gdpc, over(region, sort(1) des) mark(1, ms(i) mlab(countrycode) mlabp(0)) name(boxes, replace)
hist fr, bin(4) by(region, total) name(bins, replace)

// missing data
gen full = !mi(fr, gdpc)
spineplot full region
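The spine plot shows the share of complete cases by region graphically. As a minimal sketch of a numeric counterpart, assuming the `full` and `region` variables created above are still in memory, a two-way table with row percentages gives the same shares as numbers:

```stata
// share of complete cases (full == 1) within each UN region
// assumes the variables full and region created in the code above
tab region full, row
```

Regions where the row percentage for `full == 1` is low are those where the plots above rest on few countries.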
Going a bit further with regression results, a variety of graphs can be useful for running diagnostics. The first one shown below plots the residuals against the fitted values and adds a LOWESS smoother. The second one is a linear fit where the countries with the largest residuals get markers weighted by the absolute value of their residual.
// residuals
gen loggdpc = ln(gdpc)
reg fr loggdpc
predict r, resid
predict yhat

// residuals-versus-fitted values, plus LOESS
sc r yhat, mlab(countrycode) yline(0) ms(i) mlabp(0) || lowess r yhat, ///
    name(residuals_loess, replace)

// linear fit with residually weighted points
sc fr loggdpc if abs(r) > .3 [w = abs(r)], ms(O) mc(gs14) mfc(gs12) || ///
    lfit fr loggdpc || ///
    sc fr loggdpc, ms(i) mlab(countrycode) mlabc(gs6) mlabp(0) legend(off) ///
    name(residuals_rvf, replace)
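For reference, the quantities used in the code above can be written out as follows. The symbols are my own notation, not part of the original code: i indexes countries, and w_i is the weight passed to the first scatter plot.

```latex
% bivariate model estimated by: reg fr loggdpc
\mathit{fr}_i = \beta_0 + \beta_1 \ln(\mathit{gdpc}_i) + \varepsilon_i

% fitted values (predict yhat) and residuals (predict r, resid)
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 \ln(\mathit{gdpc}_i),
\qquad
r_i = \mathit{fr}_i - \hat{y}_i

% weighted markers: drawn only where |r_i| > 0.3, with weight
w_i = |r_i|
```

The larger the absolute residual, the larger the marker, so the countries that the linear fit describes worst stand out visually.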
Last, a map of the residuals can also be informative if there is suspicion of spatial dependence in the error term:
// package dependencies
ssc install spmap

// map of residuals (caution with intervals)
merge 1:1 countrycode using world-d, keep(match master) gen(mapmerge)
spmap r using world-c, id(_ID) clmethod(boxplot) ///
    fcolor(RdYlGn) ndocolor(gs12) ndfcolor(gs14) ocolor(none ..) ///
    legstyle(1) legend(ring(1) pos(3)) ///
    name(residuals_map, replace)
Country-level data is an ideal candidate for plot tweaks such as using marker labels instead of markers. With survey data, there would be more work to do at the level of the data itself, and text labels would have to be taken from aggregate measures like relative frequencies or averages, which makes it more complex to plot the data quickly and efficiently.
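As a rough sketch of that extra step, the example below aggregates individual-level data to group averages before plotting labels. It uses the `nlsw88` dataset shipped with Stata purely as a stand-in for survey data. The choice of variables and the graph name `survey_labels` are illustrative assumptions, not part of the analysis above.

```stata
// stand-in for survey data: individual-level dataset shipped with Stata
sysuse nlsw88, clear

// aggregate individuals to occupation-level averages
collapse (mean) wage (mean) tenure, by(occupation)

// turn the value labels of the grouping variable into a string for labelling
decode occupation, gen(occ)

// plot group labels instead of markers, as with country codes above
sc wage tenure, ms(i) mlab(occ) mlabp(0) name(survey_labels, replace)
```

The `collapse` step is what country-level data spares you: the unit of observation is already the unit you want to label.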