Posts tagged "R"

Two little goodies from the R course.

R, Stata and matching additional learning costs

Francis Smart recently pointed to an important difference between R and Stata from a teaching perspective: the additional learning costs of vectorization in R, compared with the single-dataset orientation of Stata.

Stata makes it easy to manipulate names, or more specifically variable names, as in a dataset with three variables for social expenditure called party1, party2 and party3: the wildcard party* refers to all three at once. This layout is common in many preprocessed empirical datasets.

    // recode the missing-value code 999 to . in party1, party2 and party3
    mvdecode party*, mv(999)
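For comparison, here is a minimal R sketch of the same recoding. The data frame and its values are made up for illustration, and selecting columns with grep() on names() is only one way to emulate Stata's wildcard; note that the data object has to be named in every call.

```r
# Hypothetical data: three variables sharing the "party" prefix,
# with 999 used as a missing-value code, as in the Stata example.
df <- data.frame(
  id     = 1:3,
  party1 = c(10, 999, 30),
  party2 = c(999, 20, 25),
  party3 = c(15, 35, 999)
)

# Select columns by name pattern, the closest analogue to party*.
party_cols <- grep("^party", names(df), value = TRUE)

# Recode 999 to NA in those columns only, leaving id untouched.
df[party_cols] <- lapply(df[party_cols], function(x) replace(x, x == 999, NA))

print(df)
```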

Furthermore, Stata works like an accountant’s ledger: all variables belong to the same data object, which never needs to be called by name after loading. This naturally rules out many possibilities, which macros and scalars partly make up for.

    // store a list of regressors in a local macro
    loc regressors "age sex"
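In R, the closest counterpart to this local macro is an ordinary character vector. A minimal sketch, assuming a small hypothetical data frame with an outcome y: the stored names are turned into a model formula with reformulate(), and the dataset has to be passed explicitly to the model call, which is exactly the step Stata's single-dataset design spares the user.

```r
# Hypothetical data frame; in R the data object is always named explicitly.
df <- data.frame(
  y   = c(2.1, 3.4, 2.9, 4.2, 3.8),
  age = c(25, 40, 33, 51, 46),
  sex = c(0, 1, 0, 1, 1)
)

# A character vector plays the role of the local macro `regressors'.
regressors <- c("age", "sex")

# Build the model formula y ~ age + sex from the stored names.
f <- reformulate(regressors, response = "y")

# The dataset must be passed to the model call, unlike in Stata.
fit <- lm(f, data = df)
```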

Macros can then be combined with loops, through the forval and foreach commands, to allow more complex data processing. At that level of use, the software is flexible enough for most applied data cleaning.

    // rescale socx1, socx2 and socx3 to millions
    forval i = 1/3 {
      replace socx`i' = socx`i' / 10^6
    }

To use matrix notation, the Stata user needs to switch to Mata syntax, whereas R lets the user manipulate objects through vectorization from the start. Thinking in these terms is more demanding because there are more opportunities for error, starting with calls to undeclared objects.
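As an illustration, the rescaling loop over socx1 to socx3 collapses into a single vectorized expression in R, and the same columns can be handled as a matrix without leaving the language. This is a minimal sketch on a made-up data frame; the country codes and values are hypothetical.

```r
# Hypothetical data: social expenditure in raw currency units,
# mirroring the socx1-socx3 variables in the Stata loop.
df <- data.frame(
  country = c("A", "B"),
  socx1   = c(2e6, 4e6),
  socx2   = c(1.5e6, 3e6),
  socx3   = c(5e6, 6e6)
)

socx_cols <- paste0("socx", 1:3)

# Vectorized: one expression divides every selected column at once,
# with no explicit loop over variable names.
df[socx_cols] <- df[socx_cols] / 10^6

# The same columns as a numeric matrix, for matrix operations
# of the kind that require Mata in Stata.
m <- as.matrix(df[socx_cols])
col_means <- colMeans(m)
```

The flip side is the one mentioned above: misspell socx_cols or forget to create df, and R stops with an "object not found" error instead of quietly working on the loaded dataset.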

I teach both R and Stata. My experience with social science students is that the additional learning costs of R syntax must be matched by other benefits before R becomes worthwhile to them. To me, these benefits lie primarily in the more diverse range of data that R gives access to.

R Notebook with rCharts (by Ramnath Vaidyanathan). I suspect that this is only the beginning: visualization is getting more and more interesting these days. Hat tip to KJH for linking to the video.

By using Excel, which was never designed for scientific research, they institutionalized mouse clicks and other untraceable actions into a scientific workflow, which must be avoided since it makes explaining to others (and to oneself) how to replicate the findings next to impossible and too easily introduces inadvertent mistakes.

Period. The replication was carried out with R, and additional analysis (easily found online) was done with Stata.

Victoria Stodden at What the Reinhart & Rogoff Debacle Really Shows: Verifying Empirical Results Needs to be Routine — The Monkey Cage

From Patrick Burns’s presentation on The R Inferno. Interesting if you want some historical notes about the software.

Using R for causal inference in a study of expensive public policy decisions (by Jeromy Anglim, via)

Something’s wrong. The release notes for the latest version of RStudio state that error output is now shown “in distinct color for TextMate theme (and some others)”, so I was hoping for a change here, because students get confused by universal red ink.

Something truly impressive is happening to visualization, right now. This is R code, but also d3.js code.

THANK YOU R CORE.

“In this article we will show how the models look like, what kind of tools we used to build and visualise those and also providing a demo web application where anyone could compile a similar plot with a decent amount of annotations with a single click” (the straight-to-the-Web version of an exercise that we ran in class — via Olimpic predictions - from an R web service provider’s point of view | rapporter).

I’ve basically worked in data science my whole career. I spent four and a half years as a statistics professor at Rice University, and I’ve recently joined RStudio as Chief Scientist. I consider myself primarily a tool-builder for data scientists working in R. I’m interested in tools that reduce the cognitive burden of solving data science problems. I like to figure out good ways to think about problems, then match up cognitive tools with computational tools that make it easy to solve real problems. My work in this area includes ggplot2 for visualisation, plyr for data transformation and reshape2 for data tidying.
Hadley Wickham | Stories on Data Science and Analytics

GSS Tutorial 01 (HD) (by Felipe Osorio)

A blog companion to a bunch of courses on quantitative methods.

twitter.com/politbistro
