King's College London

Research portal

A neural candidate-selector architecture for automatic structured clinical text annotation

Research output: Chapter in Book/Report/Conference proceedingOther chapter contributionpeer-review

Gaurav Singh, Iain J. Marshall, James Thomas, John Shawe-Taylor, Byron C. Wallace

Original languageEnglish
Title of host publicationCIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
PublisherAssociation for Computing Machinery
Number of pages10
VolumePart F131841
ISBN (Electronic)9781450349185
Accepted/In press5 Aug 2017
E-pub ahead of print6 Nov 2017
PublishedNov 2017
Event26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore
Duration: 6 Nov 201710 Nov 2017


Conference26th ACM International Conference on Information and Knowledge Management, CIKM 2017


King's Authors


We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically, we aim to build a model to infer distinct sets of (ontological) concepts describing complementary clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the PICO elements. This important practical problem poses a few key challenges. One issue is that the output space is vast, because the vocabulary comprises many unique concepts. Compounding this problem, annotated data in this domain is expensive to collect and hence sparse. Furthermore, the outputs (sets of concepts for each PICO element) are correlated: specific populations (e.g., diabetics) will render certain intervention concepts likely (insulin therapy) while effectively precluding others (radiation therapy). Such correlations should be exploited. We propose a novel neural model that addresses these challenges. We introduce a Candidate-Selector architecture in which the model considers setes of candidate concepts for PICO elements, and assesses their plausibility conditioned on the input text to be annotated. This relies on a 'candidate set' generator, which may be learned or relies on heuristics. A conditional discriminative neural model then jointly selects candidate concepts, given the input text. We compare the predictive performance of our approach to strong baselines, and show that it outperforms them. Finally, we perform a qualitative evaluation of the generated annotations by asking domain experts to assess their quality.

Download statistics

No data available

View graph of relations

© 2020 King's College London | Strand | London WC2R 2LS | England | United Kingdom | Tel +44 (0)20 7836 5454