Posts tagged "writing"

Anatomy of a short research paper

Here’s a real-world example of what your final paper for the course might look like. The terminology and modeling skills are more sophisticated than what we teach and use in the course, but this is a difference of degree, not of kind.

The paper

In a nutshell, the paper asks whether climate influences human conflict, and finds no significant effect of rainfall on riots in various districts of India. You should be able to sort out the dependent and independent variables from this description. Here’s the abstract in all its glory, with some highlighting of my own:

The paper is a short research note by Heather Sarsons. It was recently presented at Yale, and mentioned on the Social Science Statistics blog by Richard Nielsen. The blog comments indicate that Heather Sarsons is starting a PhD at Harvard, where she also works as a faculty assistant.

Also, everything we told you about replication applies here:

These few words, buried in a front-page footnote that no one will really care to read except the colleagues mentioned (whose professional egos it gently and rightly strokes), are crucial to the proper functioning of a scientific community.

Note: the paper is hard to categorize for a European observer. In the United States, as I understand current classifications, this paper belongs to “comparative politics”. The methodology of “instruments” (read: independent variables with a special status) used in the paper comes from the vocabulary of econometrics, which is very common in American political research.

The first two sections

The introduction does all the work that you want an introduction to do. Here are the first three paragraphs, with the topic well outlined and the literature concisely reduced to its major elements:

The next two paragraphs summarize results and provide the section outline:

That’s the whole introduction. Nothing more, nothing less. Then there is a “Data” section that immediately starts with a reference to the appendix of the paper (yes, even a 9-page paper can include an appendix):

The “Data” section selectively covers the most important elements of the research design. Everything else gets dumped to the appendix. Here’s the first part of the appendix, which explains how the dependent variable was measured:

The basic elements discussed in class are all here: source, unit of measurement, coverage period, bias. For a continuous variable, you can also describe the central tendency, as the author does for the next measurement (by the way, this variable is also available in the QOG dataset that we use in class):

For a categorical variable, the description is slightly different, using relative frequencies (percentages) to describe the modal category, i.e. the one that represents the most frequent situation:
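
If you want to see both kinds of description side by side, here is a minimal sketch in Python (we use Stata in class; Python is used here only for illustration). The variable names and values are invented and do not come from the paper. The sketch reports the mean and median of a continuous variable, then the relative frequencies and the modal category of a categorical one.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical data, invented for illustration only (not from the paper).
rainfall_mm = [820.0, 910.0, 760.0, 1040.0, 870.0]          # continuous variable
region_type = ["rural", "urban", "rural", "rural", "urban"]  # categorical variable


def describe_continuous(values):
    """Return the central tendency (mean and median) of a continuous variable."""
    return {"mean": mean(values), "median": median(values)}


def describe_categorical(values):
    """Return relative frequencies (in percent) and the modal category."""
    counts = Counter(values)
    total = len(values)
    percentages = {category: 100 * n / total for category, n in counts.items()}
    # The modal category is the one with the highest count.
    modal_category = counts.most_common(1)[0][0]
    return percentages, modal_category


continuous_summary = describe_continuous(rainfall_mm)
percentages, modal_category = describe_categorical(region_type)

print(f"Rainfall: mean = {continuous_summary['mean']:.1f} mm, "
      f"median = {continuous_summary['median']:.1f} mm")
for category, pct in percentages.items():
    print(f"{category}: {pct:.0f}%")
print(f"Modal category: {modal_category}")
```

With these made-up values, the continuous variable is summarized by its mean and median, while the categorical one is summarized by the share of each category and by the most frequent one.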

The “Data” section has one more feature, which comes at the end: descriptive statistics for the main variables of interest. Yes, you guessed it! This is the point where you have to introduce your summary statistics:

The table in the paper is a close analogue to the one that we require you to build for your research project; the additional “Source” column is relevant there, as it would be in a paper using mixed sources of data like the QOG dataset:
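
As a rough sketch of how such a table can be assembled, the Python snippet below builds one row per variable with the number of observations, mean, standard deviation, minimum, maximum and source. This layout is a common convention and an assumption here, not a copy of the paper's table; the variable names, values and sources are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical variables and sources, invented for illustration only.
variables = {
    "riots": {
        "values": [0, 2, 1, 5, 2],
        "source": "hypothetical source A",
    },
    "rainfall_mm": {
        "values": [820.0, 910.0, 760.0, 1040.0, 870.0],
        "source": "hypothetical source B",
    },
}


def summary_row(name, values, source):
    """Build one table row: N, mean, SD, min, max and the data source."""
    return {
        "variable": name,
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
        "source": source,
    }


rows = [
    summary_row(name, spec["values"], spec["source"])
    for name, spec in variables.items()
]

# Print a plain-text table with a fixed-width column per statistic.
print(f"{'Variable':<12}{'N':>4}{'Mean':>9}{'SD':>9}{'Min':>9}{'Max':>9}  Source")
for row in rows:
    print(f"{row['variable']:<12}{row['n']:>4}{row['mean']:>9.2f}"
          f"{row['sd']:>9.2f}{row['min']:>9.2f}{row['max']:>9.2f}  {row['source']}")
```

Keeping the source next to each variable makes it easy for a reader to trace every number back to where it came from.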

The next two sections

I won’t cover the estimation framework, results and discussion-aka-conclusion: you should read them on your own to see what you can get out of them. However, I will come back to that part of the paper after our sessions on regression modeling.

Copypasta: A Stata Primer

Here’s how I grade papers.

My first step is to use Skim for Mac OS X (screenshot) or the default Preview application to leave positive and negative annotations. I do this for all kinds of coursework.


When working on a stats course, my second step is to use Stata to read through do-files and replicate the analysis. When doing so, I check the Stata output and therefore end up reading all tables.

This is why I dislike copypasta in papers. Copypasta is Internet slang for content that gets pasted over and over. Some papers use Stata output in a similar fashion. I usually flag Stata copypasta as such:


When charitable, I explain how to avoid it, usually by sending readers to Section 13 of the Stata Guide, where I cover commands to export results from Stata to portable formats:


Please avoid pasting Stata output: it never looks good enough. Elegant solutions take 20 minutes at most to run flawlessly.

The abstract below comes from a recent piece on a much more complex topic. Take a look at the style of wording, the paragraph building and the use of statistical terminology. The way these mesh with other aspects of research design (question, focus, etc.) is important.

Political scandal is typically portrayed as the direct result of misbehavior by public officials, but scandal should instead be understood as a widespread elite perception of misbehavior whose occurrence is also influenced by political and media context. I provide a theoretical argument for why the contemporary US presidents should become more vulnerable to scandal as (a) their approval ratings among opposition party identifiers decline and (b) congestion in the news agenda decreases. Using new data and analytical approaches, I find strong empirical support for both claims.
Brendan Nyhan’s latest paper explains how to predict a presidential scandal. Read it for an excellent example of writing standards in quantitative methods, and for an interesting take on measuring various political phenomena from quantitative data.