TY - JOUR
T1 - Fairness in AI: are deep learning-based CMR segmentation algorithms biased?
AU - Puyol Anton, E
AU - Ruijsink, B
AU - Piechnik, S K
AU - Neubauer, S
AU - Petersen, S E
AU - Razavi, R
AU - King, A P
PY - 2021/10/12
Y1 - 2021/10/12
N2 - Background/Introduction: Artificial intelligence (AI) is providing opportunities to transform cardiovascular medicine. A particular challenge in applying AI technology is its potential for intrinsic and extrinsic biases, such as those based on gender and/or ethnicity. Unless satisfactorily addressed, these biases could lead to inequalities in early diagnosis, treatment and outcomes. Fairness in AI is a relatively new but fast-growing research field that deals with assessing and addressing potential bias in AI models. Purpose: To perform the first analysis that assesses bias in AI-based cardiac MR segmentation models in a large-scale database. Methods: A state-of-the-art deep learning (DL) based segmentation network, the “nnU-Net” framework [1], was used for automatic segmentation of both ventricles and the myocardium from cine short-axis cardiac MR over the full cardiac cycle. The dataset consisted of end-diastole and end-systole short-axis cine cardiac MR images of 5,903 subjects (61.5±7.1 years). The nnU-Net network was trained and evaluated using 5-fold cross-validation (splits: train 60% / validation 20% / test 20%). Data on race and gender were obtained from the UK Biobank database, and their distribution is summarized in Figure 1. To assess gender and racial bias in the segmentation network, we compared the Dice scores, which measure the overlap between manual and automatic segmentations, and the absolute error in measurements of biventricular volumes and function between patients grouped by ethnicity and gender. Results: Figure 2 shows the Dice scores and the volumetric and functional measures for the full database, stratified by gender and by ethnicity. Results on the overall population showed excellent agreement between the manual and automatic segmentations, which is consistent with previously reported results [2–3].
However, we found statistically significant differences in Dice scores as well as volumetric measures between different ethnicities, showing that the segmentation network is biased against minority racial groups. No significant differences were found in Dice scores between genders. Similarly, for end-diastolic volume, end-systolic volume and ejection fraction, there were statistically significant differences in absolute error between the overall population and all racial groups except white. Conclusion(s): We have shown, for the first time, that racial bias exists in DL-based cardiac MR segmentation models. Our hypothesis is that this bias is a result of the unbalanced nature of the training data. This is supported by the results, which show racial bias but not gender bias when the network is trained using the UK Biobank database, which is gender-balanced but not race-balanced. In this work we want to highlight the potential issue of bias in DL-based image segmentation models when translating them into a clinical environment. Funding Acknowledgement: Type of funding sources: Public grant(s), national budget only. Main funding source(s): EPSRC; Wellcome EPSRC Centre for Medical Engineering at the School of Biomedical Engineering and Imaging Sciences, King's College London.
KW - Cardiology and Cardiovascular Medicine
U2 - 10.1093/eurheartj/ehab724.3055
DO - 10.1093/eurheartj/ehab724.3055
M3 - Article
SN - 0195-668X
VL - 42
JO - European Heart Journal
JF - European Heart Journal
IS - Supplement_1
ER -