Automating Big Data Cleaning: An Example Using Local Bibliometric Data
The University of Waterloo recognizes bibliometric data as an important piece of evidence-based research assessment and recommends it as one measure, among many, for capturing research productivity trends and elements of research impact. Even when used as part of a basket of measures, bibliometric data remains complex and requires significant cleaning because of name ambiguity. This session explores an innovative collaboration between the Library and Institutional Analysis and Planning (IAP) that supports the integrity of local, discipline-level bibliometric data by automating key data processes of an internal project. It introduces how bibliometric data is relevant to the University, the process used to gather and vet local bibliometric data, and the ways in which key data processes have been successfully automated using Python and a database to support efficient reporting. Given the known challenges of name ambiguity, this collaborative framework makes it possible to support the integrity of local bibliometric data, a key step in supporting this and similar in-demand analyses at the University.
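As a rough illustration of what automating a name-cleaning step with Python and a database might look like, the sketch below reduces author name variants to a shared key and groups records in SQLite. It is not the project's actual pipeline: the function names (`normalize_name`, `load_and_group`), the `publications` table, the "surname, first initial" rule, and the sample records are all assumptions made for this example.

```python
import sqlite3
import unicodedata


def normalize_name(raw: str) -> str:
    """Reduce a name to a 'surname, first-initial' key (illustrative rule only)."""
    # Strip accents so that "José" and "Jose" compare equal.
    ascii_name = (
        unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    )
    if "," in ascii_name:
        # "Surname, Given" form
        surname, given = [part.strip() for part in ascii_name.split(",", 1)]
    else:
        # "Given Surname" form; assume the last token is the surname
        parts = ascii_name.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    key = f"{surname}, {given[:1]}" if given else surname
    return key.lower()


def load_and_group(records):
    """Load (author, title) pairs into SQLite and count records per name key."""
    conn = sqlite3.connect(":memory:")  # hypothetical in-memory database
    conn.execute(
        "CREATE TABLE publications (raw_author TEXT, name_key TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO publications VALUES (?, ?, ?)",
        [(author, normalize_name(author), title) for author, title in records],
    )
    rows = conn.execute(
        "SELECT name_key, COUNT(*) FROM publications "
        "GROUP BY name_key ORDER BY name_key"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Made-up sample records, not real bibliometric data.
    sample = [
        ("Smith, J.", "Paper A"),
        ("John Smith", "Paper B"),
        ("José García", "Paper C"),
    ]
    for name_key, count in load_and_group(sample):
        print(name_key, count)
```

A key this coarse will also merge distinct authors who share a surname and initial, so in practice matching keys would only flag candidate groups for the kind of vetting the session describes.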
Cite this version of the work
Jana Carson, Shannon Gordon (2017). Automating Big Data Cleaning: An Example Using Local Bibliometric Data. UWSpace. http://hdl.handle.net/10012/12333