King's College London

Research portal

Generation and evaluation of artificial mental health records for Natural Language Processing

Research output: Contribution to journal › Article

Standard

Generation and evaluation of artificial mental health records for Natural Language Processing. / Ive, Julia; Viani, Natalia; Kam, Joyce; Yin, Lucia; Verma, Somain; Puntis, Stephen; Cardinal, Rudolf; Roberts, Angus; Stewart, Robert; Velupillai, Sumithra.

In: npj Digital Medicine, 14.05.2020.

Harvard

Ive, J, Viani, N, Kam, J, Yin, L, Verma, S, Puntis, S, Cardinal, R, Roberts, A, Stewart, R & Velupillai, S 2020, 'Generation and evaluation of artificial mental health records for Natural Language Processing', npj Digital Medicine. https://doi.org/10.1038/s41746-020-0267-x

APA

Ive, J., Viani, N., Kam, J., Yin, L., Verma, S., Puntis, S., ... Velupillai, S. (2020). Generation and evaluation of artificial mental health records for Natural Language Processing. npj Digital Medicine. https://doi.org/10.1038/s41746-020-0267-x

Vancouver

Ive J, Viani N, Kam J, Yin L, Verma S, Puntis S et al. Generation and evaluation of artificial mental health records for Natural Language Processing. npj Digital Medicine. 2020 May 14. https://doi.org/10.1038/s41746-020-0267-x

Author

Ive, Julia ; Viani, Natalia ; Kam, Joyce ; Yin, Lucia ; Verma, Somain ; Puntis, Stephen ; Cardinal, Rudolf ; Roberts, Angus ; Stewart, Robert ; Velupillai, Sumithra. / Generation and evaluation of artificial mental health records for Natural Language Processing. In: npj Digital Medicine. 2020.

BibTeX

@article{bc15bd2ae23746cf9666e12c82ff28a1,
title = "Generation and evaluation of artificial mental health records for Natural Language Processing",
abstract = "A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.",
keywords = "Natural Language Processing, Computational methods, artificial medical data, mental health, textual data",
author = "Julia Ive and Natalia Viani and Joyce Kam and Lucia Yin and Somain Verma and Stephen Puntis and Rudolf Cardinal and Angus Roberts and Robert Stewart and Sumithra Velupillai",
year = "2020",
month = "5",
day = "14",
doi = "10.1038/s41746-020-0267-x",
language = "English",
journal = "npj Digital Medicine",
issn = "2398-6352",
publisher = "Nature Publishing Group",

}

RIS (suitable for import to EndNote)

TY  - JOUR
T1  - Generation and evaluation of artificial mental health records for Natural Language Processing
AU  - Ive, Julia
AU  - Viani, Natalia
AU  - Kam, Joyce
AU  - Yin, Lucia
AU  - Verma, Somain
AU  - Puntis, Stephen
AU  - Cardinal, Rudolf
AU  - Roberts, Angus
AU  - Stewart, Robert
AU  - Velupillai, Sumithra
PY  - 2020/5/14
Y1  - 2020/5/14
N2  - A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.
AB  - A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.
KW  - Natural Language Processing
KW  - Computational methods
KW  - artificial medical data
KW  - mental health
KW  - textual data
U2  - 10.1038/s41746-020-0267-x
DO  - 10.1038/s41746-020-0267-x
M3  - Article
JO  - npj Digital Medicine
JF  - npj Digital Medicine
SN  - 2398-6352
ER  -

© 2018 King's College London | Strand | London WC2R 2LS | England | United Kingdom | Tel +44 (0)20 7836 5454