A Comprehensive Empirical Study of Bias Mitigation Methods for Machine Learning Classifiers

Zhenpeng Chen, Jie Zhang, Federica Sarro, Mark Harman

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)


Software bias is an increasingly important operational concern for software engineers. We present a large-scale, comprehensive empirical study of 17 representative bias mitigation methods for Machine Learning (ML) classifiers, evaluated with 11 ML performance metrics (e.g., accuracy), 4 fairness metrics, and 20 types of fairness-performance trade-off assessment, applied to 8 widely adopted software decision tasks. The empirical coverage is more comprehensive than that of previous work on this important software property, spanning the largest numbers of bias mitigation methods, evaluation metrics, and fairness-performance trade-off measures. We find that (1) the bias mitigation methods significantly decrease ML performance in 53% of the studied scenarios (ranging from 42% to 66% depending on the ML performance metric); (2) the bias mitigation methods significantly improve fairness, as measured by the 4 metrics used, in 46% of all the scenarios (ranging from 24% to 59% depending on the fairness metric); (3) the bias mitigation methods even lead to a decrease in both fairness and ML performance in 25% of the scenarios; (4) the effectiveness of the bias mitigation methods depends on tasks, models, the choice of protected attributes, and the set of metrics used to assess fairness and ML performance; (5) no bias mitigation method achieves the best trade-off in all the scenarios. The best method that we find outperforms other methods in 30% of the scenarios. Researchers and practitioners need to choose the bias mitigation method best suited to their intended application scenario(s).
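To make the kind of comparison described above concrete, the following is a minimal sketch of scoring one scenario before and after mitigation. The abstract does not name its 4 fairness metrics or its trade-off measures, so everything here is an assumption for illustration: statistical parity difference stands in as one commonly used group fairness metric, accuracy is the performance metric, the data is a made-up toy example, and the function names are hypothetical. The study reports statistically significant changes, whereas this sketch only compares point values.

```python
# Illustrative sketch only: metric choice, data, and names are assumptions,
# not the metrics or procedure used in the study.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def statistical_parity_difference(y_pred, protected):
    """P(pred = 1 | unprivileged) - P(pred = 1 | privileged).

    `protected` holds 1 for the privileged group and 0 for the
    unprivileged group; both groups are assumed to be non-empty.
    A value of 0 means both groups receive favorable predictions
    at the same rate.
    """
    priv = [p for p, a in zip(y_pred, protected) if a == 1]
    unpriv = [p for p, a in zip(y_pred, protected) if a == 0]
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


def compare_outcome(perf_before, perf_after, bias_before, bias_after):
    """Label the change in performance and in fairness for one scenario.

    Fairness is treated as improved when the absolute bias shrinks.
    No significance test is applied; this compares raw values only.
    """
    def label(before, after, higher_is_better):
        if after == before:
            return "unchanged"
        better = after > before if higher_is_better else after < before
        return "improved" if better else "decreased"

    performance = label(perf_before, perf_after, higher_is_better=True)
    fairness = label(abs(bias_before), abs(bias_after), higher_is_better=False)
    return performance, fairness


# Toy scenario (fabricated for illustration, not data from the study).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
protected = [1, 1, 1, 1, 0, 0, 0, 0]
y_before = [1, 0, 1, 1, 0, 0, 0, 0]  # predictions of the original model
y_after = [1, 0, 1, 0, 0, 1, 1, 0]   # predictions after a mitigation method

acc_before = accuracy(y_true, y_before)
acc_after = accuracy(y_true, y_after)
spd_before = statistical_parity_difference(y_before, protected)
spd_after = statistical_parity_difference(y_after, protected)

outcome = compare_outcome(acc_before, acc_after, spd_before, spd_after)
print(outcome)
```

In this toy scenario the mitigated model is fairer but less accurate, which is the kind of trade-off the findings quantify across scenarios. A scenario in which both labels come back "decreased" would correspond to finding (3).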
Original language: English
Article number: 106
Issue number: 4
Early online date: 27 May 2023
Publication status: Published - 27 May 2023

