Abstract
BACKGROUND AND OBJECTIVE: Radiological reports typically summarize the content and interpretation of imaging studies in unstructured form that precludes quantitative analysis. This limits the monitoring of radiological services to throughput undifferentiated by content, impeding specific, targeted operational optimization. Here we present Neuradicon, a natural language processing (NLP) framework for quantitative analysis of neuroradiological reports.
METHODS: Our framework is a hybrid of rule-based and machine-learning models to represent neurological reports in succinct, quantitative form optimally suited to operational guidance. These include probabilistic models for text classification and tagging tasks, alongside auto-encoders for learning latent representations and statistical mapping of the latent space.
RESULTS: We demonstrate the application of Neuradicon to operational phenotyping of a corpus of 336,569 reports, and report excellent generalizability across time and two independent healthcare institutions. In particular, we report pathology classification metrics with f1-scores of 0.96 on prospective data, and semantic means of interrogating the phenotypes surfaced via latent space representations.
CONCLUSION: Neuradicon allows the segmentation, analysis, classification, representation and interrogation of neuroradiological reports structure and content. It offers a blueprint for the extraction of rich, quantitative, actionable signals from unstructured text data in an operational context.
Original language | English |
---|---|
Article number | 108638 |
Pages (from-to) | 108638 |
Journal | Computer Methods and Programs in Biomedicine |
Volume | 262 |
Early online date | 12 Feb 2025 |
DOIs | |
Publication status | E-pub ahead of print - 12 Feb 2025 |