TY - CHAP
T1 - Average Calibration Error
T2 - 27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024
AU - Barfoot, Theodore
AU - Peraza Herrera, Luis C. Garcia
AU - Glocker, Ben
AU - Vercauteren, Tom
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - Deep neural networks for medical image segmentation often produce overconfident results misaligned with empirical observations. Such miscalibration challenges their clinical translation. We propose to use marginal L1 average calibration error (mL1-ACE) as a novel auxiliary loss function to improve pixel-wise calibration without compromising segmentation quality. We show that this loss, despite using hard binning, is directly differentiable, bypassing the need for approximate but differentiable surrogate or soft binning approaches. Our work also introduces the concept of dataset reliability histograms which generalises standard reliability diagrams for refined visual assessment of calibration in semantic segmentation aggregated at the dataset level. Using mL1-ACE, we reduce average and maximum calibration error by 45% and 55% respectively, maintaining a Dice score of 87% on the BraTS 2021 dataset.
UR - http://www.scopus.com/inward/record.url?scp=85210077605&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-72114-4_14
DO - 10.1007/978-3-031-72114-4_14
M3 - Conference paper
AN - SCOPUS:85210077605
SN - 9783031721137
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 139
EP - 149
BT - Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, 27th International Conference Proceedings
A2 - Linguraru, Marius George
A2 - Dou, Qi
A2 - Feragen, Aasa
A2 - Giannarou, Stamatia
A2 - Glocker, Ben
A2 - Lekadir, Karim
A2 - Schnabel, Julia A.
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 6 October 2024 through 10 October 2024
ER -