Abstract
Causal relations are among the most important types of information that can be extracted from medical publications. Therefore, the automated extraction of such relations from medical text, and the development of ontologies for representing them, are active fields of research. Causal relation extraction is typically decomposed into three sub-tasks: causal sentence detection, entity recognition, and relation extraction. This study addresses the entity recognition sub-task, which remains largely unsolved because existing ontological models do not capture the various entities involved in causal relations, and datasets annotated with such entities are missing. Therefore, we propose MediCause, an ontological model for entities involved in causal relations, together with a novel dataset in which we use MediCause to annotate 1,202 causal sentences drawn from existing datasets. We evaluate MediCause by training various BERT models to recognize and label these entities in unseen texts, and we find that a BioBERT-large model fine-tuned on our dataset performs best at this task (macro-averaged F1-score of 0.844). We also use MediCause to annotate entities in causal sentences from unseen, recent publications, and have experts evaluate the resulting annotations, with encouraging results.
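The abstract reports model quality as a macro-averaged F1-score, i.e. the unweighted mean of the per-label F1-scores, so rare entity types count as much as frequent ones. The sketch below illustrates that averaging under stated assumptions: it scores token-level label sequences (the paper may instead score whole entity spans), it ignores the outside tag `"O"`, and the labels `CAUSE` and `EFFECT` are hypothetical placeholders, not the actual MediCause label set.

```python
from collections import Counter


def macro_f1(gold, pred, ignore=("O",)):
    """Macro-averaged F1 over token labels (illustrative sketch, not the paper's code).

    gold, pred: equal-length sequences of per-token labels.
    ignore: labels excluded from the average (assumed: the outside tag "O").
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    labels = sorted((set(gold) | set(pred)) - set(ignore))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Unweighted mean: every label contributes equally, regardless of frequency.
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["CAUSE", "CAUSE", "O", "EFFECT"]  # hypothetical labels
    pred = ["CAUSE", "O", "O", "EFFECT"]
    print(round(macro_f1(gold, pred), 4))  # CAUSE F1 = 2/3, EFFECT F1 = 1.0 -> 0.8333
```

Here `CAUSE` has precision 1.0 and recall 0.5 (F1 = 2/3) and `EFFECT` is perfect (F1 = 1.0), so the macro average is 5/6. A micro average would instead pool all token counts before computing a single F1.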
Original language | English |
---|---|
Pages (from-to) | 1-18 |
Number of pages | 18 |
Journal | CEUR Workshop Proceedings |
Volume | 3184 |
Publication status | Published - 11 Aug 2022 |
Event | 1st International Workshop on Knowledge Graph Generation From Text and the 1st International Workshop on Modular Knowledge, TEXT2KG 2022 and MK 2022 - Hersonissos, Greece. Duration: 30 May 2022 → … |
Keywords
- Causal Relation Extraction
- Medicine
- NLP
- Ontology