REACT: REcourse Analysis with Counterfactuals and Explanation Tables
Advisor
Golab, Lukasz
Publisher
University of Waterloo
Abstract
Machine learning models often exhibit not only explicit bias (unequal performance metrics across subgroups) but also implicit bias, where altering a model's prediction is disproportionately difficult for some subgroups. In this work, we investigate two complementary approaches to overturning a model's decision in order to achieve a desired label: modifying test input features and unlearning a set of training samples.
The novelty of our solution, REACT, lies in combining these two methods with data summarization via informative rule mining, which highlights biased subgroups.
We demonstrate the value of REACT by showing how it lets users detect a model's implicit bias and compare the biases of different model versions. The resulting framework is flexible: it supports practical constraints on feature-level interventions, for example by limiting changes to modifiable attributes.
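
To make the idea of implicit bias under constrained interventions concrete, the following is a minimal illustrative sketch, not the method used in the thesis. It assumes a toy linear model, and the feature names, weights, bias, subgroup labels, and applicant values are all hypothetical. For each rejected input it computes the smallest L1 change, restricted to modifiable features, that would flip the decision, and then averages that recourse cost per subgroup.

```python
from statistics import mean

# Hypothetical linear model: the desired label is given when w . x + b >= 0.
WEIGHTS = {"income": 0.5, "debt": -1.0, "age": 0.2}
BIAS = -4.0
# Assumption for illustration: "age" cannot be changed by the applicant.
MODIFIABLE = {"income", "debt"}


def score(x):
    """Linear score of one input; only keys listed in WEIGHTS are read."""
    return sum(WEIGHTS[f] * x[f] for f in WEIGHTS) + BIAS


def recourse_cost(x, modifiable=MODIFIABLE):
    """Smallest L1 change over modifiable features that flips a negative decision.

    For a linear model, the cheapest L1 change moves only the modifiable
    feature with the largest absolute weight.
    """
    gap = -score(x)
    if gap <= 0:
        return 0.0  # input already receives the desired label
    best = max(abs(WEIGHTS[f]) for f in modifiable)
    return gap / best


def mean_cost_by_group(rows, group_key):
    """Average recourse cost for each value of the subgroup column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(recourse_cost(row))
    return {g: mean(costs) for g, costs in groups.items()}


# Made-up inputs, all of which the toy model currently rejects.
applicants = [
    {"group": "A", "income": 6.0, "debt": 1.0, "age": 3.0},
    {"group": "A", "income": 5.0, "debt": 1.0, "age": 4.0},
    {"group": "B", "income": 3.0, "debt": 2.0, "age": 3.0},
    {"group": "B", "income": 2.0, "debt": 1.0, "age": 2.0},
]
costs = mean_cost_by_group(applicants, "group")
print(costs)
```

In this toy data every applicant is rejected, so a metric based only on outcomes would look the same for both groups, yet group B needs a noticeably larger change to obtain the desired label. A gap of this kind in recourse cost is what the abstract calls implicit bias.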