Automating Big Data Cleaning: An Example Using Local Bibliometric Data

Date

2017-04-06

Authors

Carson, Jana
Gordon, Shannon

Abstract

The University of Waterloo recognizes bibliometric data as an important piece of evidence-based research assessment and recommends it as one measure, among many, for capturing research productivity trends and elements of research impact. Even when working from a basket of measures, bibliometric data remains complex and requires significant cleaning because of name ambiguity. This session will explore an innovative collaboration between the Library and Institutional Analysis and Planning (IAP) that supports the integrity of local, discipline-level bibliometric data by automating key data processes of an internal project. The session will introduce how bibliometric data is relevant to the University, the process used to gather and vet local bibliometric data, and the ways in which key data processes have been successfully automated using Python and a database to support efficient reporting. Given the known challenges presented by name ambiguity, this collaborative framework makes it possible to support the integrity of local bibliometric data, a key step in supporting this and similar in-demand analyses at the University.

Keywords

Bibliometrics, Research Productivity, Big Data, Data Cleanup, Collaboration, University Partnerships