A Comprehensive Empirical Study of Bias Mitigation Methods for Machine Learning Classifiers

Zhenpeng Chen, Jie Zhang, Federica Sarro, Mark Harman

Research output: Contribution to journal › Article › peer-review

Abstract

Software bias is an increasingly important operational concern for software engineers. We present a large-scale, comprehensive empirical study of 17 representative bias mitigation methods for Machine Learning (ML) classifiers. We evaluate them with 11 ML performance metrics (e.g., accuracy), 4 fairness metrics, and 20 types of fairness-performance trade-off assessment, applied to 8 widely adopted software decision tasks. Compared with previous work on this important software property, our empirical coverage is much broader: it includes the largest number of bias mitigation methods, evaluation metrics, and fairness-performance trade-off measures. We find that (1) the bias mitigation methods significantly decrease ML performance in 53% of the studied scenarios (ranging from 42% to 66% across the ML performance metrics); (2) the bias mitigation methods significantly improve fairness, as measured by the 4 fairness metrics, in 46% of all scenarios (ranging from 24% to 59% across the fairness metrics); (3) the bias mitigation methods even decrease both fairness and ML performance in 25% of the scenarios; (4) the effectiveness of the bias mitigation methods depends on the task, the model, the choice of protected attributes, and the set of metrics used to assess fairness and ML performance; and (5) no bias mitigation method achieves the best trade-off in all scenarios. The best method we identify outperforms the other methods in 30% of the scenarios. Researchers and practitioners therefore need to choose the bias mitigation method best suited to their intended application scenario(s).
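Findings (1) to (3) classify each scenario by whether fairness and ML performance went up or down after mitigation. The Python sketch below illustrates that bookkeeping for a single scenario under stated assumptions. It is not the authors' implementation. The abstract does not name its four fairness metrics, so accuracy and statistical parity difference serve here only as illustrative examples. The function names and toy predictions are hypothetical. A plain difference with a tolerance stands in for the statistical significance testing that a study like this would rely on.

```python
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels (higher is better)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def statistical_parity_difference(y_pred: Sequence[int], privileged: Sequence[bool]) -> float:
    """P(y_hat = 1 | unprivileged) - P(y_hat = 1 | privileged).

    Illustrative fairness metric (assumption): 0.0 means both groups receive
    favourable predictions at the same rate; a larger |value| means more bias.
    """
    priv = [p for p, g in zip(y_pred, privileged) if g]
    unpriv = [p for p, g in zip(y_pred, privileged) if not g]
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


def direction(delta: float, tol: float) -> str:
    """Map a signed change (positive = better) to a coarse label."""
    if delta > tol:
        return "improved"
    if delta < -tol:
        return "decreased"
    return "unchanged"


def classify_outcome(perf_before: float, perf_after: float,
                     bias_before: float, bias_after: float,
                     tol: float = 1e-9) -> Tuple[str, str]:
    """Return (fairness direction, performance direction) for one scenario.

    Simplifying assumption: raw differences compared against `tol` replace
    a proper statistical significance test.
    """
    fairness = direction(abs(bias_before) - abs(bias_after), tol)  # smaller |bias| is fairer
    performance = direction(perf_after - perf_before, tol)
    return fairness, performance


# Hypothetical toy scenario: 8 instances, first 4 in the privileged group.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
privileged = [True] * 4 + [False] * 4
before = [1, 0, 1, 1, 0, 0, 0, 0]  # predictions before mitigation
after = [1, 0, 0, 1, 0, 1, 1, 0]   # predictions after mitigation

print(classify_outcome(accuracy(y_true, before), accuracy(y_true, after),
                       statistical_parity_difference(before, privileged),
                       statistical_parity_difference(after, privileged)))
# → ('improved', 'decreased')
```

In this toy case, accuracy drops from 0.875 to 0.75 while the statistical parity difference moves from -0.75 to 0.0. This mirrors the common pattern in finding (1): fairness improves at a cost to ML performance. A result of `('decreased', 'decreased')` would correspond to the lose-lose case in finding (3).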
Original language: English
Journal: ACM Transactions on Software Engineering and Methodology
Publication status: Accepted/In press, 11 Jan 2023