An Open-Source Strategy for Documenting Events: The Case Study of the 42nd Canadian Federal Election on Twitter

Ruest, Nick; Milligan, Ian

An Open-Source Strategy for Documenting Events: The Case Study of the 42nd Canadian Federal Election on Twitter

Files

The Code4Lib Journal – An Open-Source Strategy for Documenting Events_ The Case .pdf (876.86 KB)

Date

2016-04-25

Authors

Ruest, Nick

Milligan, Ian

Publisher

Code4Lib

Abstract

This article examines the tools, approaches, collaboration, and findings of the Web Archives for Historical Research Group around the capture and analysis of about 4 million tweets during the 2015 Canadian Federal Election. We hope that national libraries and other heritage institutions will find our model useful as they consider how to capture, preserve, and analyze ongoing events using Twitter. While Twitter is not a representative sample of broader society – Pew research shows in their study of US users that it skews young, college-educated, and affluent (above $50,000 household income) – Twitter still represents an exponential increase in the amount of information generated, retained, and preserved from 'everyday' people. Therefore, when historians study the 2015 federal election, Twitter will be a prime source. On August 3, 2015, the team initiated both a Search API and Stream API collection with twarc, a tool developed by Ed Summers, using the hashtag #elxn42. The hashtag referred to the election being Canada's 42nd general federal election (hence 'election 42' or elxn42). Data collection ceased on November 5, 2015, the day after Justin Trudeau was sworn in as the 42nd Prime Minister of Canada. We collected for a total of 102 days, 13 hours and 50 minutes. To analyze the data set, we took advantage of a number of command line tools, utilities that are available within twarc, twarc-report, and jq. In accordance with the Twitter Developer Agreement & Policy, and after ethical deliberations discussed below, we made the tweet IDs and other derivative data available in a data repository. This allows other people to use our dataset, cite our dataset, and enhance their own research projects by drawing on #elxn42 tweets. Our analytics included: breaking tweet text down by day to track change over time; client analysis, allowing us to see how the scale of mobile devices affected medium interactions; URL analysis, comparing both to Archive-It collections and the Wayback Availability API to add to our understanding of crawl completeness; and image analysis, using an archive of extracted images. Our article introduces our collecting work, ethical considerations, the analysis we have done, and provides a framework for other collecting institutions to do similar work with our off-the-shelf open-source tools. We conclude by ruminating about connecting Twitter archiving with a broader web archiving strategy.

Description

This work is licensed and made available under Creative Commons Attribution 3.0 United States license. Article first appeared in Code4Lib Journal, issue 32, 2016-04-25, Original available here http://journal.code4lib.org/articles/11358

Keywords

Canadian federal election, Tweet analysis, Twitter, Twarc, Historical research, Twitter archiving

URI

http://journal.code4lib.org/articles/11358
http://hdl.handle.net/10012/11747

Collections

Waterloo Research
History

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution 3.0 United States

Full item page

An Open-Source Strategy for Documenting Events: The Case Study of the 42nd Canadian Federal Election on Twitter

Files

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

Citation

URI

Collections

Endorsement

Review

Supplemented By

Referenced By

Creative Commons license