Impact of hospital-specific domain adaptation on BERT-based models to classify neuroradiology reports

Siddharth Agarwal, David Wood, Benjamin A. K. Murray, Yiran Wei, Ayisha Al Busaidi, Sina Kafiabadi, Emily Guilhem, Jeremy Lynch, Matthew Townend, Asif Mazumder, Gareth J Barker, James H Cole, Peter Sasieni, Sebastien Ourselin, Marc Modat, Thomas C. Booth

Research output: Contribution to journal › Article › peer-review


Abstract

Objectives: To determine the effectiveness of hospital-specific domain adaptation through masked language modelling (MLM) on the performance of BERT-based models in classifying neuroradiology reports, and to compare these models with open-source large language models (LLMs).

Materials and methods: This retrospective study (2008–2019) used 126,556 and 86,032 MRI brain reports from two tertiary hospitals: King's College Hospital (KCH) and Guy's and St Thomas' Trust (GSTT). Several BERT-based models, including RoBERTa, BioBERT and RadBERT, underwent MLM on unlabelled reports from these centres. The downstream tasks were binary abnormality classification and multi-label classification. Models with and without hospital-specific domain adaptation were compared against each other and against LLMs on internal (KCH) and external (GSTT) hold-out test sets. Binary classification performance was compared using two-way and one-way ANOVA.

Results: All models that underwent hospital-specific domain adaptation outperformed their baseline counterparts (all p < 0.001). For binary classification, MLM on all available unlabelled reports (194,467 reports) yielded the highest balanced accuracies (mean ± standard deviation; KCH: 97.0 ± 0.4%, GSTT: 95.5 ± 1.0%), after which no differences between BERT-based models remained (one-way ANOVA, p > 0.05). Performance scaled log-linearly with the number of reports used for MLM. Llama-3.0 70B was the best-performing LLM (KCH: 97.1%, GSTT: 94.0%). Multi-label classification showed consistent performance improvements from MLM across all abnormality categories.

Conclusion: Hospital-specific domain adaptation should be considered best practice when deploying BERT-based models in new clinical settings. When labelled data are scarce or unavailable, LLMs are a viable alternative provided adequate computational resources are available.

Key Points:
Question: BERT-based models can classify radiology reports, but it is unclear whether additional hospital-specific domain adaptation offers any incremental benefit.
Findings: Hospital-specific domain adaptation yielded the highest BERT-based model accuracies, and performance scaled log-linearly with the number of reports.
Clinical relevance: After hospital-specific domain adaptation, BERT-based models achieve the best classification results provided sufficient high-quality training labels are available. When labelled data are scarce, LLMs such as Llama-3.0 70B are a viable alternative given sufficient computational resources.
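The following is a minimal sketch of the domain-adaptation step the abstract describes: continued masked language modelling of a BERT-family encoder on unlabelled free-text reports, using the Hugging Face transformers library. The checkpoint name, file path and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Hospital-specific domain adaptation via masked language modelling (MLM).
# Hyperparameters and paths below are placeholders for illustration.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Any BERT-family checkpoint can be adapted; RoBERTa is one of the
# models compared in the study.
checkpoint = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# "reports.txt" is a hypothetical file of unlabelled reports, one per line.
dataset = load_dataset("text", data_files={"train": "reports.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking of 15% of tokens, the standard BERT MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mlm-adapted",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()

# The adapted weights in "mlm-adapted" can then be loaded with
# AutoModelForSequenceClassification for the downstream abnormality tasks.
```

After MLM, the same checkpoint is fine-tuned separately for the binary and multi-label classification tasks; only the task head changes, the adapted encoder is reused.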
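The reported comparison rests on balanced accuracy over hold-out test sets and a one-way ANOVA across models. The sketch below shows how such a comparison can be computed with scikit-learn and SciPy; all values are placeholders, not the study's data.

```python
# Balanced-accuracy evaluation and one-way ANOVA across models.
# Arrays below are illustrative placeholders only.
import numpy as np
from scipy.stats import f_oneway
from sklearn.metrics import balanced_accuracy_score

# y_true / y_pred would come from a fine-tuned classifier on the
# internal (KCH) or external (GSTT) hold-out set.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])
print(balanced_accuracy_score(y_true, y_pred))

# Balanced accuracies from repeated runs (e.g. different seeds) of three
# domain-adapted models; the ANOVA tests whether mean performance differs.
roberta = [0.971, 0.968, 0.973]
biobert = [0.969, 0.972, 0.970]
radbert = [0.970, 0.967, 0.971]
stat, p = f_oneway(roberta, biobert, radbert)
print(f"F={stat:.2f}, p={p:.3f}")  # p > 0.05: no detectable difference
```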

Original language: English
Article number: 102391
Journal: European Radiology
Early online date: 17 Mar 2025
DOIs:
Publication status: E-pub ahead of print - 17 Mar 2025

