|dc.description.abstract||Ambiguity, the quality of being open to more than one interpretation, permeates our lives. It comes in different forms, including linguistic and visual ambiguity, arises for various reasons and gives rise to disagreements among human observers that can be hard or impossible to resolve. As artificial intelligence (AI) is increasingly infused into complex domains of human decision making, it is crucial that the underlying AI mechanisms also support a notion of ambiguity. Yet existing AI approaches typically assume that there is a single correct answer for any given input and lack mechanisms to incorporate diverse human perspectives in various parts of the AI pipeline, including data labeling, model development and user interface design.
This dissertation aims to shed light on the question of how humans and AI can be effective partners in the presence of ambiguous problems. To address this question, we begin by studying group deliberation as a tool to detect and analyze ambiguous cases in data labeling. We present three case studies that investigate group deliberation in the context of different labeling tasks, data modalities and types of human labeling expertise.
First, we present CrowdDeliberation, an online platform for synchronous group deliberation in novice crowd work, and show how worker deliberation affects resolvability and accuracy in text classification tasks of varying subjectivity. We then translate our findings to the expert domain of medical image classification to demonstrate how imposing additional structure on deliberation arguments can improve the efficiency of the deliberation process without compromising its reliability. Finally, we present CrowdEEG, an online platform for collaborative annotation and deliberation of medical time series data, implementing an asynchronous and highly structured deliberation process. Our findings from an observational study with 36 sleep health professionals help explain how disagreements arise and when they can be resolved through group deliberation.
Beyond investigating group deliberation within data labeling, we also demonstrate how the resulting deliberation data can be used to support both human and artificial intelligence. To this end, we first present results from a controlled experiment with ten medical generalists, suggesting that reading deliberation data from medical specialists significantly improves generalists' comprehension and diagnostic accuracy on difficult patient cases. Second, we leverage deliberation data to simulate and investigate AI assistants that not only highlight ambiguous cases, but also explain the underlying sources of ambiguity to end users in human-interpretable terms. We provide evidence suggesting that this form of ambiguity-aware AI can help end users to triage and trust AI-provided data classifications.
We conclude by outlining the main contributions of this dissertation and directions for future research.||en