Example plots with country-level data

The previous post mentioned the BuRd theme and ColorBrewer. Here are some possible uses of both in a series of plots with cross-sectional, country-level data. The code averages World Bank WDI estimates of fertility (births per woman) and GDP per capita at purchasing power parity (current international dollars) over 2008-2010, and then adds UN region names to the data.

// package dependencies

ssc install wbopendata
ssc install kountry

// WDI data

wbopendata, indicator(SP.DYN.TFRT.IN; NY.GDP.PCAP.PP.CD) year(2008:2010) long clear
collapse ///
    (mean) fr = sp_dyn_tfrt_in /// 
    (mean) gdpc = ny_gdp_pcap_pp_cd ///
    , by(countrycode)

// geo indicators

kountry countrycode, from(iso3c) geo(un)
encode GEO, gen(region)
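A small check that is not part of the original workflow: the WDI download may include regional and income aggregates that kountry cannot map to a UN region. Assuming unmatched codes are left blank in GEO (and therefore missing in region after encode), they can be listed before plotting:

```stata
// count and list codes that did not receive a UN region
// (assumption: mostly WDI aggregates rather than individual countries)
count if mi(region)
list countrycode if mi(region), clean noobs
```

Dropping these rows is a judgment call; the plots below that use over(region) or by(region) leave them out anyway.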

Geographical regions make it easy to plot the data as small multiples. I also often find it useful to look at a mosaic plot to diagnose how seriously missing data threatens the representativeness of the sample.

// package dependencies

ssc install spineplot

// small multiples

gr hbox gdpc, over(region, sort(1) des) mark(1, ms(i) mlab(countrycode) mlabp(0)) name(boxes, replace)

hist fr, bin(4) by(region, total) name(bins, replace)

// missing data

gen full = !mi(fr, gdpc)
spineplot full region
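The spineplot shows the share of complete cases per region visually; a plain cross-tabulation with row percentages gives the same shares as numbers, which can help when two regions look similar on the plot. This is a hedged complement, not the author's code:

```stata
// row percentages: share of incomplete (full = 0) and complete (full = 1) cases by region
tab region full, row
```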

Going a bit further with regression results, a variety of graphs can be useful for running diagnostics. The first one below plots the residuals against the fitted values with a LOESS smoother, and the second overlays the linear fit with markers weighted by the absolute size of the residuals, for countries whose residuals exceed .3 in absolute value.

// residuals

gen loggdpc = ln(gdpc)
reg fr loggdpc

predict r, resid
predict yhat

// residuals-versus-fitted values, plus LOESS

sc r yhat, mlab(countrycode) yline(0) ms(i) mlabp(0) || lowess r yhat, ///
    name(residuals_loess, replace)

// linear fit, with markers weighted by absolute residuals

sc fr loggdpc if abs(r) > .3 [w = abs(r)], ms(O) mc(gs14) mfc(gs12) || ///
    lfit fr loggdpc || ///
    sc fr loggdpc, ms(i) mlab(countrycode) mlabc(gs6) mlabp(0) legend(off) ///
    name(residuals_rvf, replace)
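Stata also ships built-in regression diagnostics that cover similar ground. As a sketch (the graph name rvf_builtin is arbitrary), rvfplot draws the residual-versus-fitted plot directly, and estat hettest gives a formal test of constant residual variance to set against the visual check:

```stata
// re-estimate quietly so that e() holds the fertility model
quietly reg fr loggdpc
// built-in residual-versus-fitted plot
rvfplot, yline(0) name(rvf_builtin, replace)
// Breusch-Pagan / Cook-Weisberg test for heteroskedasticity (uses fitted values by default)
estat hettest
```

The hand-made version above remains more informative for country-level data, since it labels each point with its country code.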

Last, a map of the residuals can also be informative if there is suspicion of spatial dependence in the error term:

// package dependencies

ssc install spmap

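// map of residuals (boxplot class breaks: interpret the intervals with caution)
// world-d.dta (attributes, incl. countrycode) and world-c.dta (coordinates) are
// assumed to exist already, e.g. converted from a world shapefile with shp2dta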

merge 1:1 countrycode using world-d, keep(match master) gen(mapmerge)
spmap r using world-c, id(_ID) clmethod(boxplot) ///
  fcolor(RdYlGn) ndocolor(gs12) ndfcolor(gs14) ocolor(none ..) ///
  legstyle(1) legend(ring(1) pos(3)) ///
  name(residuals_map, replace)

Country-level data is an ideal candidate for plot tweaks such as using marker labels in place of markers. With survey data, there would be more work to do at the level of the data itself: text labels would have to come from aggregate measures like relative frequencies or averages, which makes it harder to plot the data quickly and efficiently.

srqm posted this
A blog companion to a bunch of courses on quantitative methods.

