Background and purpose: With the increasing adoption of electronic records in the health system, machine learning-enabled techniques offer the opportunity for greater computer-assisted curation of these data for audit and research purposes. In this project, we evaluate the consistency of traditional curation methods used in routine clinical practice against a new machine learning-enabled tool, MedCAT, for the extraction of the stroke comorbidities recorded within the UK's Sentinel Stroke National Audit Programme (SSNAP) initiative.

Methods: A total of 2327 stroke admission episodes from three National Health Service (NHS) hospitals, between January 2019 and April 2020, were included in this evaluation. Both the current clinical curation method (SSNAP) and the machine learning-enabled method (MedCAT) were compared against a reference subsample of 200 admission episodes manually reviewed by our study team. We report sensitivity, specificity, precision, negative predictive value, and F1 score.

Results: Reporting of stroke comorbidities with current clinical curation methods is good for atrial fibrillation, hypertension, and diabetes mellitus, but poor for congestive cardiac failure. The machine learning-enabled method, MedCAT, performed better than current clinical methods on all four assessed comorbidities, driven predominantly by higher sensitivity and F1 scores.

Conclusions: We have shown that machine learning-enabled data collection can support existing clinical and service initiatives, with the potential to improve the quality and speed of data extraction from existing clinical repositories. The scalability and flexibility of these new machine learning tools therefore present an opportunity to revolutionize audit and research methods.
- Natural language processing (NLP)
- Clinical coding
- Program evaluation
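The abstract does not describe how the metrics were computed. The sketch below shows the standard confusion-matrix definitions of the five reported metrics for a single comorbidity. It assumes that each admission episode carries a binary present/absent flag from the method under test (SSNAP or MedCAT) and from the manual review, which is treated as the reference. The function `evaluate`, the `Metrics` container, and the example data are hypothetical illustrations, not the study's code or results.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    precision: float    # TP / (TP + FP), i.e. positive predictive value
    npv: float          # TN / (TN + FN), negative predictive value
    f1: float           # harmonic mean of precision and sensitivity


def _safe_div(num: float, den: float) -> float:
    # Return NaN instead of raising ZeroDivisionError when a cell total is zero
    return num / den if den else float("nan")


def evaluate(predicted: list[bool], reference: list[bool]) -> Metrics:
    """Hypothetical helper: compare one method's comorbidity flags with the manual review."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    # Count the four confusion-matrix cells across admission episodes
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    return Metrics(
        sensitivity=_safe_div(tp, tp + fn),
        specificity=_safe_div(tn, tn + fp),
        precision=_safe_div(tp, tp + fp),
        npv=_safe_div(tn, tn + fn),
        # Equivalent to 2 * precision * sensitivity / (precision + sensitivity)
        f1=_safe_div(2 * tp, 2 * tp + fp + fn),
    )


if __name__ == "__main__":
    # Made-up flags for eight episodes, for illustration only
    reference = [True, True, True, False, False, False, False, True]
    predicted = [True, True, False, False, False, False, False, True]
    print(evaluate(predicted, reference))
```

Computing F1 directly from the cell counts avoids an undefined result when precision alone has a zero denominator. In a per-comorbidity evaluation like the one described, the function would be called once per comorbidity and per method.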