King's College London


Multi-observer concordance and accuracy of the British Thoracic Society scale and other visual assessment qualitative criteria for solid pulmonary nodule assessment using FDG PET-CT

Research output: Contribution to journal › Article › peer-review

K. Fatania, P.J. Brown, C. Xie, G. McDermott, M.E.J. Callister, R. Graham, M. Subesinghe, F.V. Gleeson, A.F. Scarsbrook

Original language: English
Pages (from-to): 878.e21-878.e28
Journal: Clinical Radiology
Issue number: 11
Accepted/In press: 24 Jun 2020
Published: Nov 2020




AIM: To compare the interobserver reliability and diagnostic accuracy of the British Thoracic Society (BTS) scale and other visual assessment criteria for evaluating solid pulmonary nodules (SPNs) with 2-[18F]-fluoro-2-deoxy-D-glucose (FDG) positron-emission tomography (PET)-computed tomography (CT).

MATERIALS AND METHODS: Fifty patients who underwent FDG PET-CT for assessment of an SPN were identified. Seven reporters with varied experience at four centres graded FDG uptake visually using the BTS four-point scale. Five reporters also scored the SPNs using three- and five-point visual assessment scales and semi-quantitative assessment (maximum standardised uptake value [SUVmax]). Interobserver reliability was assessed with the intra-class correlation coefficient (ICC) and weighted Cohen's kappa (κ). Diagnostic performance was evaluated by receiver operating characteristic (ROC) analysis.

RESULTS: Good interobserver reliability was demonstrated with the BTS scale (ICC=0.78, 95% confidence interval [CI]: 0.69–0.85) and the five-point scale (ICC=0.78, 95% CI: 0.68–0.86), whilst the three-point scale demonstrated moderate reliability (ICC=0.70, 95% CI: 0.59–0.80). Using the BTS scale, almost perfect agreement was achieved between two consultants (κ=0.85) and substantial agreement between two other consultants (κ=0.78). ROC curves for the BTS and five-point scales demonstrated equivalent accuracy (BTS area under the ROC curve [AUC]=0.768; five-point AUC=0.768). SUVmax was not significantly more accurate than the BTS scale (SUVmax AUC=0.794; BTS AUC=0.768, p=0.43).

CONCLUSIONS: The BTS scale can be applied reliably by reporters with varied levels of PET-CT reporting experience across different centres, and its diagnostic performance is not surpassed by the alternative scales assessed.
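The abstract names weighted Cohen's kappa and ROC AUC but gives no formulas. The following is a minimal Python sketch of both statistics applied to ordinal visual scores such as the BTS four-point scale. It assumes linear disagreement weights by default, because the abstract does not say which weighting was used. It computes the AUC of an ordinal score as the Mann-Whitney probability that a malignant nodule scores higher than a benign one, with ties counted as one half. The function names and the toy ratings are hypothetical. They are not the study's code or data.

```python
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                   n_categories: int, weights: str = "linear") -> float:
    """Cohen's weighted kappa for two raters scoring on an ordinal scale 1..n_categories.

    Assumption: weights is "linear" or "quadratic" disagreement weighting.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and paired")
    k, n = n_categories, len(rater_a)
    # Observed joint proportions of (rater A score, rater B score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - 1][b - 1] += 1.0 / n
    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for maximal disagreement.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    obs_dis = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - obs_dis / exp_dis


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of a positive case > score of a negative case); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical BTS-style scores (1-4) from two reporters for six nodules.
    reporter_1 = [1, 2, 3, 4, 4, 2]
    reporter_2 = [1, 3, 3, 4, 3, 2]
    print("weighted kappa:", weighted_kappa(reporter_1, reporter_2, n_categories=4))
    # Hypothetical reference standard: 1 = malignant, 0 = benign.
    truth = [0, 0, 1, 1, 1, 0]
    print("AUC:", auc_mann_whitney(reporter_1, truth))
```

Counting ties as one half matters for coarse visual scales. Many nodules share the same score, and treating ties otherwise would bias the AUC of a four-point scale relative to a continuous measure such as SUVmax.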

