Issues in Computer Vision Data Collection: Bias, Consent, and Label Taxonomy

Dulhanty, Chris

Files

Dulhanty_Chris.pdf (1.88 MB)

Date

2020-09-30

Authors

Dulhanty, Chris

Advisor

Wong, Alexander
Clausi, David

Publisher

University of Waterloo

Abstract

Recent success of the convolutional neural network in image classification has pushed the computer vision community towards data-rich methods of deep learning. As a consequence of this shift, the data collection process has had to adapt, becoming increasingly automated and efficient to satisfy algorithms that require massive amounts of data. In the push for more data, however, careful consideration of the decisions and assumptions in the data collection process has been neglected. Likewise, users accept datasets and their embedded assumptions at face value, employing them in theory and application papers without scrutiny. As a result, undesirable biases, non-consensual data collection, and inappropriate label taxonomies are rife in computer vision datasets. This work aims to explore issues of bias, consent, and label taxonomy in computer vision through novel investigations into widely used datasets in image classification, face recognition, and facial expression recognition. Through this work, I aim to challenge researchers to reconsider normative data collection and use practices such that computer vision systems can be developed in a more thoughtful and responsible manner.