Generating Positive Psychosis Symptom Keywords from Electronic Health Records

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

The development of Natural Language Processing (NLP) solutions for information extraction from electronic health records (EHRs) has grown in recent years, as most clinically relevant information in EHRs is documented only in free text. One of the core tasks for any NLP system is to extract clinically relevant concepts such as symptoms. This information can then be used for more complex problems such as determining symptom onset, which requires temporal information. In the mental health domain, comprehensive vocabularies for specific disorders are scarce, and rarely contain keywords that reflect real-world terminology use. We explore the use of embedding techniques to automatically generate lexical variants of psychosis symptoms for vocabularies that can be used in complex downstream NLP tasks. We study the impact of the underlying text material on generating useful lexical entries, experimenting with different corpora and with unigram/bigram models. We also propose a method to automatically compute thresholds for choosing the most relevant terms. Our main contribution is a systematic study of unsupervised vocabulary generation using different corpora for an understudied clinical use case. The resulting lexicons are publicly available.
Original language: English
Title of host publication: Artificial Intelligence in Medicine
Subtitle of host publication: 17th Conference on Artificial Intelligence in Medicine, AIME 2019, Poznan, Poland, June 26–29, 2019, Proceedings
Editors: David Riaño, Szymon Wilk, Annette ten Teije
Number of pages: 6
Publication status: Published - 30 May 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11526 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


  • Electronic health records
  • Embedding models
  • Natural language processing
  • Schizophrenia

