Abstract
Evaluation of predictive deep learning (DL) models beyond conventional performance metrics has become increasingly important for applications in sensitive environments such as healthcare. Such models can analyse and make predictions from large sets of data, but they often provide no information about the certainty of those predictions. Furthermore, when (un)certainty information is provided, it is often poorly calibrated, meaning that the reported confidence does not correspond to the predictive accuracy. Combined, these factors mean that many DL models intended for use in a decision-support setting are not yet trusted by clinical end-users. The primary aim of this thesis is to develop tools that will improve clinical trust in AI, with a specific focus on cardiology.

The literature has recently focused on quantifying uncertainty in DL, but relatively little attention has been given to the inclusion of uncertainty/confidence estimates when training such models. This represents a missed opportunity: if the aim is to produce a well-calibrated model, it seems natural to include such an objective during training rather than focusing purely on prediction accuracy. Accordingly, a key hypothesis underlying this thesis is that a well-calibrated model will be more likely to promote clinical trust in AI tools, and the experimental work is designed to develop AI models that improve calibration as well as accuracy.
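For concreteness (the abstract itself does not name a specific metric), calibration is most commonly quantified with the expected calibration error (ECE), which partitions the $n$ predictions into $M$ confidence bins $B_m$ and measures the average gap between accuracy and confidence within each bin:

$$
\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}\,\bigl|\operatorname{acc}(B_m) - \operatorname{conf}(B_m)\bigr|
$$

A perfectly calibrated model satisfies $\operatorname{acc}(B_m) = \operatorname{conf}(B_m)$ in every bin, giving an ECE of zero.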
First, an experimental investigation was performed to improve calibration by making an AI model more aware of its calibration errors during training. Six uncertainty-aware training strategies, three of which were novel, were evaluated on two complex cardiac AI applications. The results showed benefits in both the accuracy and the calibration of the models across a wide range of performance metrics. This work also highlighted that different calibration metrics led to different conclusions about which model was optimal, and hence that no single calibration metric is universally applicable across applications.
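The abstract does not specify the training objectives used; the sketch below illustrates one common form of uncertainty-aware training, in which a differentiable calibration penalty is added to the standard cross-entropy loss. The function name, the weighting `lam`, and the penalty term are illustrative assumptions, not the thesis's method.

```python
import torch
import torch.nn.functional as F

def uncertainty_aware_loss(logits, targets, lam=0.1):
    """Cross-entropy plus an illustrative calibration penalty.

    The penalty pushes the model's top-class confidence towards
    its empirical correctness, making training 'aware' of
    miscalibration rather than optimising accuracy alone.
    """
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=1)
    conf, preds = probs.max(dim=1)
    correct = (preds == targets).float()
    # Penalise the gap between confidence and correctness.
    calib_penalty = ((conf - correct) ** 2).mean()
    return ce + lam * calib_penalty
```

Gradients flow through `conf`, so confidently wrong predictions are penalised directly during training rather than only being corrected post hoc.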
The next part of the thesis continued the investigation into improving model calibration but focused on an alternative approach: altering the labels used for training. Specifically, a range of label smoothing approaches were investigated and their impact on model accuracy and calibration was assessed. As in the first piece of work, a realistic and complex cardiac AI application was used to evaluate the methods. The results showed promising improvements in calibration performance and indicated that further work on this type of approach is justified.
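The thesis evaluates a range of smoothing schemes; the standard uniform variant sketched below is the usual baseline, redistributing a fraction `eps` of the probability mass from the true class evenly over all classes. The helper names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def smoothed_targets(targets, num_classes, eps=0.1):
    """Uniform label smoothing: keep (1 - eps) on the true class
    and spread eps evenly over all classes."""
    one_hot = F.one_hot(targets, num_classes).float()
    return one_hot * (1.0 - eps) + eps / num_classes

def smoothed_ce(logits, targets, eps=0.1):
    """Cross-entropy against the softened target distribution."""
    soft = smoothed_targets(targets, logits.size(1), eps)
    return -(soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```

Recent PyTorch versions expose the same behaviour via the `label_smoothing` argument of `F.cross_entropy`; softening the targets discourages the saturated, overconfident softmax outputs that harm calibration.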
The investigation then moved on to evaluating different adaptations of the standard neural network architecture to improve uncertainty/confidence estimation. A range of deterministic uncertainty methods were investigated and evaluated for cardiac AI applications, as was the combination of such deterministic approaches with uncertainty-aware training. The results showed that deterministic uncertainty models typically achieve better calibration, and sometimes better accuracy, and that some of them also benefit from being combined with uncertainty-aware training.
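Deterministic uncertainty methods estimate uncertainty from a single forward pass, typically by making the output distance-aware. The head below is a heavily simplified sketch in the spirit of methods such as DUQ; the class name, `sigma`, and the centroid parameterisation are illustrative assumptions rather than the architectures studied in the thesis.

```python
import torch
import torch.nn as nn

class RBFHead(nn.Module):
    """Distance-based classification head: class scores are RBF
    similarities between a feature vector and learned per-class
    centroids, so a low maximum score signals uncertainty without
    any sampling or ensembling."""
    def __init__(self, feat_dim, num_classes, sigma=0.3):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.sigma = sigma

    def forward(self, feats):
        # (batch, num_classes) squared distances to each centroid.
        d2 = torch.cdist(feats, self.centroids).pow(2)
        return torch.exp(-d2 / (2 * self.sigma ** 2))
```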
Lastly, the final piece of work focused on the calibration performance of multimodal AI models. Multimodal AI is an emerging area that is likely to have a particular impact on cardiology, since cardiologists typically draw on varied data sources when diagnosing cardiovascular disease and planning its treatment. However, the calibration of multimodal AI models had yet to be thoroughly investigated. This work aimed to determine the influence of different modality fusion strategies on model calibration. The results represented the first such study of the calibration of multimodal AI in cardiology and provided guidance on which fusion strategies best achieve good model calibration.
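The abstract does not name the fusion strategies compared; the two sketches below show the usual endpoints of the design space, feature-level (early) fusion and decision-level (late) fusion. The class names, encoder interfaces, and averaging weights are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Feature-level fusion: concatenate per-modality features
    before a shared classification head."""
    def __init__(self, enc_a, enc_b, feat_dim, num_classes):
        super().__init__()
        self.enc_a, self.enc_b = enc_a, enc_b
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, xa, xb):
        feats = torch.cat([self.enc_a(xa), self.enc_b(xb)], dim=1)
        return self.head(feats)

class LateFusion(nn.Module):
    """Decision-level fusion: average the logits of two
    independently trained unimodal models."""
    def __init__(self, model_a, model_b):
        super().__init__()
        self.model_a, self.model_b = model_a, model_b

    def forward(self, xa, xb):
        return 0.5 * (self.model_a(xa) + self.model_b(xb))
```

Where in the pipeline the modalities are combined changes how per-modality confidences interact, which is why the choice of fusion strategy can affect calibration.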
Overall, the work presented in this thesis contributes valuable insights towards developing methods and evaluation metrics that ensure AI tools are safe and deployable in cardiology and beyond. Ultimately, this should help the clinical diagnostic pathway deliver trustworthy and faster medical care to patients.
| Date of Award | 1 Jan 2025 |
| --- | --- |
| Original language | English |
| Awarding Institution | |
| Supervisor | Andrew King (Supervisor), Reza Razavi (Supervisor) & Esther Puyol Anton (Supervisor) |