Posts tagged "piracy"

Torrent data from The Pirate Bay

Torrents are fantastic social objects. They express gift and transgression at the same time. Torrents are not just data but virtual economies between anonymous seeders, some of whom free-ride as leechers. A few papers say more about the transactional, social aspects of torrent files and of file-sharing in general.

Since The Pirate Bay (TPB) is a major torrent outlet, ranked first both in volume of service and in salience in copyright politics, I have always wanted to scrape some data from it. Other researchers have also approached TPB from a survey angle. TPB cofounder Peter Sunde even used TPB survey data in court.

A first dataset of all TPB torrents was released by Fabio Hecht, Thomas Bocek and David Hausheer in February 2008. The data is advertised at Data Mob. It allowed its authors to build a few graphs on how and how much data was being shared at the time.

Just a few minutes ago, I stumbled upon a new dataset of all TPB torrents, now offered as magnet links.
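For readers who have not handled magnet links before, here is a minimal Python sketch of how the two most useful fields, the BitTorrent info hash (`xt`) and the display name (`dn`), can be pulled out of a magnet URI. The function name `parse_magnet` and the example link are hypothetical, and the sketch makes no assumption about how the dump itself is laid out on disk.

```python
from urllib.parse import urlparse, parse_qs


def parse_magnet(uri):
    """Return the info hash and display name carried by a magnet URI.

    Hypothetical helper for illustration; it only reads the standard
    `xt` (exact topic) and `dn` (display name) parameters.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")

    params = parse_qs(parsed.query)

    # A magnet link may carry several xt values; keep the first BitTorrent one.
    info_hash = None
    for xt in params.get("xt", []):
        if xt.startswith("urn:btih:"):
            info_hash = xt[len("urn:btih:"):].lower()
            break

    # dn is optional, so fall back to None when it is missing.
    name = params.get("dn", [None])[0]
    return {"info_hash": info_hash, "name": name}


# Made-up example link, not taken from the dataset.
example = (
    "magnet:?xt=urn:btih:0123456789ABCDEF0123456789ABCDEF01234567"
    "&dn=Example+File"
)
record = parse_magnet(example)
```

The info hash identifies the torrent, while the display name is only a label that the link's creator chose.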

The author of the dataset/torrent says that he was motivated by a recent article on TorrentFreak, so I will assume that the data date from early February 2012, which gives us a four-year window of comparison. The data are perfectly comparable, except for the torrent names, which do not appear in Hecht et al.’s data.

Update, Feb 14, 2012:

The author of the dump has emailed TorrentFreak with details, and has provided the Perl script he used to scrape the data. I have looked at the data quickly, but I lack a formal research framework to proceed with analysis.

For the last 10 years, I’ve tracked the online distribution of Oscar-nominated films, going back to 2003. Using a number of sources (see below for methodology), I’ve compiled a massive spreadsheet, now updated to include 310 films.
Pirating the Oscars 2012: Ten Years of Data
A blog companion to a bunch of courses on quantitative methods.
