Quoted text in the mental healthcare electronic record: An analysis of the distribution and content of single-word quotations

Lasantha Jayasinghe*, Sumithra Velupillai, Robert Stewart

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)


Objective: To investigate the distribution and content of quoted text within electronic health records (EHRs), using a previously developed natural language processing tool to generate a database of quotations.

Design: χ² tests and logistic regression were used to assess the profile of patients receiving mental healthcare for whom quotations exist. K-means clustering, using pre-trained word embeddings developed on general discharge summaries and on psychosis-specific mental health records, was used to group one-word quotations into semantically similar groups, which were then labelled by human subjective judgement.

Setting: EHRs from a large mental healthcare provider serving a geographic catchment area of 1.3 million residents in South London.

Participants: For the analysis of distribution, 33 499 individuals receiving mental healthcare on 30 June 2019 in South London and Maudsley. For the analysis of content, 1587 unique lemmatised words appearing a minimum of 20 times in the database of quotations created on 16 January 2020.

Results: The strongest individual indicator of quoted text is inpatient care in the preceding 12 months (OR 9.79, 95% CI 7.84 to 12.23). The next strongest indicator is ethnicity: those from a black background are more likely to have quoted text than those from a white background (OR 2.20, 95% CI 2.08 to 2.33). Both associations are attenuated slightly in the adjusted model. Early psychosis intervention word embeddings subjectively produced categories pertaining to mental illness, verbs, negative sentiment, people/relationships, mixed sentiment, aggression/violence and negative connotation.

Conclusions: The findings that inpatients and those from a black ethnic background more commonly have quoted text raise important questions around where clinical attention is focused and whether this may point to any systematic bias. Our study also shows that word embeddings trained on early psychosis intervention records are useful in categorising even small subsets of the clinical records represented by one-word quotations.
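
For readers unfamiliar with the odds ratios (OR) and 95% confidence intervals quoted in the results, the following is a minimal sketch of how an unadjusted OR and its Wald interval can be computed from a 2×2 table. It is not the authors' code: the function name `odds_ratio_wald_ci` and the counts are hypothetical, and the study itself reports estimates from logistic regression models rather than from this direct calculation.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed, outcome present      b: exposed, outcome absent
    c: unexposed, outcome present    d: unexposed, outcome absent
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only; not taken from the study.
or_value, ci_low, ci_high = odds_ratio_wald_ci(a=90, b=10, c=50, d=50)
print(f"OR {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

The interval is built on the log scale and then exponentiated, which is why reported intervals such as those above are asymmetric around the point estimate.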

Original language: English
Article number: e049249
Journal: BMJ Open
Issue number: 12
Publication status: Published - 30 Dec 2021


  • health informatics
  • mental health
  • psychiatry

