Joining the Stata Bloggers aggregator: An opening post

A small subset of posts from this Tumblr log has been reformatted into this longer piece and submitted for inclusion in Francis Smart’s recently created aggregator for Stata blogs, Stata Bloggers. This post is an announcement that explains the whys, hows and whats of that operation.

First, why this blog? I am using Tumblr to run two course companions on health policy and data analysis. The latter, SRQM, is the one that you are reading from. It is named after the course “Statistical Reasoning and Quantitative Methods”, which I co-teach in Paris with Ivaylo Petev.

Why do you teach with Stata? Ivaylo and I chose Stata for practical reasons: we both knew how to use it, the software was available where we teach the course, and we needed software that could be taught to large groups of postgraduate beginners. We also teach an optional course with R.

Choosing statistical software is never easy, but the decision has recently become simple for a large category of users, for whom the choice should be R. Anthony Damico is right when he jokingly writes the following:

confidential to sas, spss, stata, and sudaan users: the eighties called. they want their statistical languages back. time to transition to r. :D [Anthony also wrote that if Stata has a better learning curve than R, “so do bicycles with training wheels ;)”]

R is also marked by the eighties in many ways, but it has indeed made other statistical software rather obsolete. Its ggplot2 library, for instance, is just much better than Stata graphs. Even if you take the time to tweak them with complex code or alternative colors, Stata plots are often ugly.

Yet Stata remains a good choice for those who are learning statistical analysis alongside other things and therefore have limited time to learn programming. It is quick, cheap enough for universities, and copes well with large surveys. It is also easily scriptable and open to user contributions.

For these reasons, Stata has a good user base among academics, especially in sociology, economics and political science. Nate Silver also uses it. There’s great documentation for Stata, in English as well as in other languages, like this page in Lithuanian.

There are more cool things about Stata. The World Bank has an awesome Stata package to download its data. Stata syntax is even supported by a few plain text editors like TextMate, thanks to Tim Beatty and Phil Schumm (now on GitHub), and it might get ported to the Pygments engine used at GitHub and elsewhere.

The real trouble with Stata might actually lie with the overwhelming dominance of “regression quants” in its user base. Regression analysis steers your thinking towards net effects, which is not necessarily what you need. I will probably come back to this in later posts.

Why aggregate this blog? For some time, I have been hoping to connect this blog and course to a larger community of Stata users. Neither has ever been advertised on Statalist, but the blog has gained a small readership, and the course is also public thanks to its hosting as a GitHub repository.

How is this blog aggregated? Blog aggregators work by making use of RSS feeds, which are a handy way to syndicate a website’s content. Most blogging engines offer at least one blog-wide feed. Tumblr also offers hidden tag-specific feeds. This post starts a series that will be tagged stata.
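
To make the mechanism concrete, here is a minimal Python sketch of what an aggregator might do with a feed: read the RSS items and keep only those carrying a given tag. It is an illustration under assumptions, not a description of how Stata Bloggers actually works. The blog name, the sample feed and the helper functions are all hypothetical, and the "/tagged/<tag>/rss" address pattern is my assumption about how Tumblr exposes its tag-specific feeds.

```python
import xml.etree.ElementTree as ET


def tag_feed_url(blog, tag):
    """Build the address of a tag-specific feed.

    Assumption: Tumblr serves tag feeds at /tagged/<tag>/rss.
    """
    return f"https://{blog}.tumblr.com/tagged/{tag}/rss"


# Hypothetical RSS 2.0 document standing in for a downloaded blog-wide feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example course blog</title>
    <item>
      <title>First post</title>
      <link>https://example.tumblr.com/post/1</link>
      <category>stata</category>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.tumblr.com/post/2</link>
      <category>r</category>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return one dict per <item>, with its title, link and tags."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "tags": [c.text for c in item.findall("category")],
        })
    return items


def filter_by_tag(items, tag):
    """Keep only the items that carry the given tag."""
    return [item for item in items if tag in item["tags"]]


if __name__ == "__main__":
    print(tag_feed_url("example", "stata"))
    for post in filter_by_tag(parse_items(SAMPLE_RSS), "stata"):
        print(post["title"], post["link"])
```

A tag-specific feed makes the filtering step unnecessary on the aggregator's side, since the blog engine only lists the tagged posts in the first place.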