Update #1: Now with ~30,000 tweets!
Update #2: Now with ~40,000 tweets!
Update #3: Now with ~50,000 tweets!
You may be following pdftribute.net, which is a great initiative to scrape the #pdftribute links as they stream. I am posting a more comprehensive repository (to be updated regularly) which people can download to start doing their own analyses. (Edit: this one has about 50,000 tweets, whereas I believe pdftribute.net is based on around 10,000, possibly because it started streaming later. I could be wrong.)
Early on, I was watching TweetReach to see how many tweets would show up in an API search. When it hit 1,500 tweets, I started tracking #pdftribute as it streamed, as a public service (“twitter dump.csv” in the zip file), and also ran an API search for those first 1,500 tweets (“early twitter dump.csv”). This way, I think I’ve captured almost all the tweets, though I may have missed a few of the early ones.
The tracking is still going on, and I will continue to update the repository.
If you can extract the links to files from the repository, please email me so I can upload the files themselves!
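For anyone who wants to try, here is a minimal Python sketch of one way to pull links out of the CSV files. It is only a starting point under a few assumptions: the column layout of “twitter dump.csv” is not described here, so the code simply scans every cell for anything that looks like a URL, and the function names (extract_links, extract_links_from_csv) are made up for illustration. Links in tweets are often shortened, so the results would probably still need to be resolved to the actual files.

```python
import csv
import re

# Matches anything starting with http:// or https:// up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(rows):
    """Return the unique URLs found in any cell of the rows, in first-seen order."""
    seen = {}
    for row in rows:
        for cell in row:
            for match in URL_PATTERN.findall(cell):
                # Trim punctuation that commonly trails a URL inside a sentence.
                url = match.rstrip(".,;:!?)\"'")
                seen.setdefault(url, None)
    return list(seen)


def extract_links_from_csv(path):
    """Read a CSV file and extract links from all of its cells.

    Assumption: the file is UTF-8 text; undecodable bytes are replaced.
    """
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        return extract_links(csv.reader(handle))
```

Calling extract_links_from_csv("twitter dump.csv") on the unzipped file should then give a de-duplicated list of candidate links to check by hand.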
Thank you all very much for your support!


Hi, I started a scraper too, collecting only unique URLs, at pdftribute.loc-com.de. The data is available as JSON at pdftribute.loc-com.de/json and is free to use!