TY - CHAP
T1 - Automated ranking of chest x-ray radiological finding severity in a binary label setting
AU - Macpherson, Matthew
AU - Muthuswamy, Keerthini
AU - Amlani, Ashik
AU - Goh, Vicky
AU - Montana, Giovanni
N1 - Publisher Copyright:
© 2024 CC-BY 4.0, M. Macpherson, K. Muthuswamy, A. Amlani, V. Goh & G. Montana.
PY - 2024
Y1 - 2024
AB - Machine learning has demonstrated the ability to match or exceed human performance in detecting a range of abnormalities in chest x-rays. However, current models largely operate within a binary classification paradigm using fixed decision thresholds, whereas many clinical findings can be more usefully described on a scale of severity that a skilled radiologist will incorporate into a more nuanced report. This limitation is due, in part, to the difficulty and expense of manually annotating fine-grained labels for training and test images versus the relative ease of automatically extracting binary labels from the associated free-text reports using NLP algorithms. In this paper we examine the ability of models trained with only binary training data to give useful abnormality severity information from their raw outputs. We assess performance using manually ranked test sets for each of five findings: cardiomegaly, consolidation, paratracheal hilar changes, pleural effusion and subcutaneous emphysema. We find the raw model output predicts human-assessed severity ranking with Spearman’s rank correlation coefficients between 0.563 and 0.848. Using patient age as an additional variable with full ground truth ranking available, we compare a binary classifier output against a fully supervised RankNet model, quantifying the increase in training data required for equivalent performance. We show that model performance is improved using a semi-supervised approach supplementing a smaller set of fully supervised images with a larger set of binary-labelled images.
KW - Chest x-ray
KW - ranking
KW - severity assessment
KW - weakly supervised
UR - http://www.scopus.com/inward/record.url?scp=85216610949&partnerID=8YFLogxK
M3 - Conference paper
AN - SCOPUS:85216610949
VL - 250
T3 - Proceedings of Machine Learning Research
SP - 949
EP - 963
BT - Automated ranking of chest x-ray radiological finding severity in a binary label setting
T2 - 7th International Conference on Medical Imaging with Deep Learning, MIDL 2024
Y2 - 3 July 2024 through 5 July 2024
ER -