REACT: REcourse Analysis with Counterfactuals and Explanation Tables

Advisor

Golab, Lukasz

Publisher

University of Waterloo

Abstract

Machine learning models often exhibit not only explicit bias (unequal performance metrics across subgroups) but also implicit bias, where altering a model's prediction is disproportionately difficult for some subgroups. In this work, we investigate two complementary ways to overturn a model's decision and obtain a desired label: modifying the features of a test input and unlearning a set of training samples. The novelty of our solution, REACT, lies in combining these two methods with data summarization via informative rule mining that highlights biased subgroups. We demonstrate the value of REACT by showing how it lets users detect a model's implicit bias and compare the biases of different model versions. The resulting framework is flexible and supports practical constraints on feature-level interventions, for example by limiting changes to modifiable attributes.
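
To make the idea of implicit bias under feature-level constraints concrete, the following is a minimal illustrative sketch, not the REACT implementation. It assumes a toy linear classifier and a unit-step cost, and every name in it (`predict`, `recourse_cost`, `mean_cost_by_group`, the weights, and the sample rows) is hypothetical. It covers only the feature-modification side of the abstract, not unlearning or rule mining: for each rejected input it counts how many unit changes to modifiable attributes are needed to reach the desired label, then averages that cost per subgroup.

```python
def predict(weights, bias, x):
    """Toy linear classifier (an assumption): 1 if the score is non-negative, else 0."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score >= 0 else 0


def recourse_cost(weights, bias, x, mutable, step=1.0, max_steps=20):
    """Number of unit steps on mutable features needed to reach label 1.

    Greedy search: each step moves the mutable feature with the largest
    absolute weight in the direction that raises the score.
    Returns None if the label cannot be flipped within max_steps.
    """
    x = list(x)  # work on a copy so the caller's row is untouched
    if not mutable:
        return 0 if predict(weights, bias, x) == 1 else None
    best = max(mutable, key=lambda i: abs(weights[i]))
    for n in range(max_steps + 1):
        if predict(weights, bias, x) == 1:
            return n
        if weights[best] == 0:
            return None  # no mutable feature influences the score
        x[best] += step if weights[best] > 0 else -step
    return None


def mean_cost_by_group(rows, groups, weights, bias, mutable):
    """Average recourse cost per subgroup, skipping rows with no recourse."""
    costs = {}
    for x, g in zip(rows, groups):
        cost = recourse_cost(weights, bias, x, mutable)
        if cost is not None:
            costs.setdefault(g, []).append(cost)
    return {g: sum(c) / len(c) for g, c in costs.items()}


# Hypothetical features: [income, savings, age_flag]; only the first two are modifiable.
weights, bias = [1.0, 0.5, 2.0], -4.25
mutable = [0, 1]
rows = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 0, 0]]
groups = ["A", "A", "B", "B"]

result = mean_cost_by_group(rows, groups, weights, bias, mutable)
print(result)  # {'A': 1.5, 'B': 3.5}
```

In this toy data all four inputs are rejected, yet group B needs more than twice as many changes as group A to obtain the desired label. A gap of this kind, in the difficulty of recourse rather than in accuracy, is what the abstract calls implicit bias.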
