Abstract
Software bias is an increasingly important operational concern for software engineers. We present a large-scale, comprehensive empirical study of 17 representative bias mitigation methods for Machine Learning (ML) classifiers, evaluated with 11 ML performance metrics (e.g., accuracy), 4 fairness metrics, and 20 types of fairness-performance trade-off assessment, applied to 8 widely adopted software decision tasks. This empirical coverage is more comprehensive than that of previous work on this important software property, covering the largest numbers of bias mitigation methods, evaluation metrics, and fairness-performance trade-off measures. We find that (1) the bias mitigation methods significantly decrease ML performance in 53% of the studied scenarios (ranging from 42% to 66% depending on the ML performance metric); (2) the bias mitigation methods significantly improve fairness, as measured by the 4 metrics used, in 46% of all scenarios (ranging from 24% to 59% depending on the fairness metric); (3) the bias mitigation methods even lead to a decrease in both fairness and ML performance in 25% of the scenarios; (4) the effectiveness of the bias mitigation methods depends on tasks, models, the choice of protected attributes, and the set of metrics used to assess fairness and ML performance; and (5) no bias mitigation method achieves the best trade-off in all scenarios; the best method that we find outperforms the other methods in 30% of the scenarios. Researchers and practitioners therefore need to choose the bias mitigation method best suited to their intended application scenario(s).
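The abstract does not name the specific fairness metrics or trade-off measures used in the study. As a hedged illustration only, the minimal sketch below computes one commonly used group fairness metric, statistical parity difference, alongside accuracy and a naive accuracy-minus-unfairness trade-off score. The function names, trade-off definition, and toy data are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (illustrative assumptions, not the paper's method):
# statistical parity difference (SPD), accuracy, and a naive trade-off score.
import numpy as np

def statistical_parity_difference(y_pred, protected):
    """SPD = P(y_pred=1 | unprivileged) - P(y_pred=1 | privileged).
    Values closer to 0 indicate more parity between groups."""
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)
    rate_unpriv = y_pred[protected == 0].mean()
    rate_priv = y_pred[protected == 1].mean()
    return rate_unpriv - rate_priv

def accuracy(y_true, y_pred):
    return (np.asarray(y_true) == np.asarray(y_pred)).mean()

def tradeoff_score(y_true, y_pred, protected):
    """Illustrative trade-off: accuracy penalised by absolute unfairness."""
    return accuracy(y_true, y_pred) - abs(statistical_parity_difference(y_pred, protected))

# Toy usage: compare a baseline model with a hypothetically mitigated one.
y_true      = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_baseline  = np.array([1, 0, 1, 1, 0, 0, 0, 0])
y_mitigated = np.array([1, 0, 1, 0, 0, 1, 0, 0])
prot        = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = unprivileged group

for name, pred in [("baseline", y_baseline), ("mitigated", y_mitigated)]:
    print(name,
          "accuracy:", accuracy(y_true, pred),
          "SPD:", statistical_parity_difference(pred, prot),
          "trade-off:", tradeoff_score(y_true, pred, prot))
```

In this toy example the mitigated predictions keep the same accuracy while reducing the gap between groups, which is the kind of fairness-performance trade-off the study evaluates at scale.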
| Original language | English |
|---|---|
| Article number | 106 |
| Journal | ACM Transactions on Software Engineering and Methodology |
| Volume | 32 |
| Issue number | 4 |
| Early online date | 27 May 2023 |
| DOIs | |
| Publication status | Published - 27 May 2023 |