TY - JOUR
T1 - Machine learning-enabled multi-trust audit of stroke comorbidities using natural language processing
T2 - Machine learning-enabled audit of stroke
AU - Shek, Anthony
AU - Jiang, Zhilin
AU - Teo, James
AU - Au Yeung, Joshua
AU - Bhalla, Ajay
AU - Richardson, Mark P.
AU - Mah, Yee
N1 - Funding Information:
J.T. has received research grant funding support from Innovate UK, NHSX, Office of Life Sciences, Bristol‐Myers Squibb, and Pfizer; speaker honoraria from Bayer, Bristol‐Myers Squibb, Pfizer, and Goldman Sachs; hospitality from iRhythm Technologies; and copyright fees from Wiley‐Blackwell; and owns public shares in Nvidia, Amazon, and Alphabet.
A.S. is supported by the Kings Medical Research Trust. Y.M. is supported by an MRC Clinical Academic Research Partnership grant (MR/T005351/1).
The supporting infrastructure and code base received funding from NIHR Maudsley BRC, Health Data Research UK, UK Research and Innovation, London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare, Innovate UK, the NIHR Applied Research Collaboration South London, Office of Life Sciences (UK), and NHSX.
Publisher Copyright:
© 2021 The Authors. European Journal of Neurology published by John Wiley & Sons Ltd on behalf of European Academy of Neurology.
PY - 2021/12
Y1 - 2021/12
AB - Background and purpose: With the increasing adoption of electronic records in the health system, machine learning-enabled techniques offer the opportunity for greater computer-assisted curation of these data for audit and research purposes. In this project, we evaluate the consistency of traditional curation methods used in routine clinical practice against a new machine learning-enabled tool, MedCAT, for the extraction of the stroke comorbidities recorded within the UK's Sentinel Stroke National Audit Programme (SSNAP) initiative. Methods: A total of 2327 stroke admission episodes from three different National Health Service (NHS) hospitals, between January 2019 and April 2020, were included in this evaluation. In addition, current clinical curation methods (SSNAP) and the machine learning-enabled method (MedCAT) were compared against a subsample of 200 admission episodes manually reviewed by our study team. Performance metrics of sensitivity, specificity, precision, negative predictive value, and F1 scores are reported. Results: The reporting of stroke comorbidities with current clinical curation methods is good for atrial fibrillation, hypertension, and diabetes mellitus, but poor for congestive cardiac failure. The machine learning-enabled method, MedCAT, achieved better performances across all four assessed comorbidities compared with current clinical methods, predominantly driven by higher sensitivity and F1 scores. Conclusions: We have shown machine learning-enabled data collection can support existing clinical and service initiatives, with the potential to improve the quality and speed of data extraction from existing clinical repositories. The scalability and flexibility of these new machine-learning tools, therefore, present an opportunity to revolutionize audit and research methods.
KW - Stroke
KW - Natural language processing (NLP)
KW - Clinical coding
KW - Program evaluation
UR - http://www.scopus.com/inward/record.url?scp=85113820624&partnerID=8YFLogxK
U2 - 10.1111/ene.15071
DO - 10.1111/ene.15071
M3 - Article
SN - 1351-5101
VL - 28
SP - 4090
EP - 4097
JO - European Journal of Neurology
JF - European Journal of Neurology
IS - 12
ER -