TY - JOUR
T1 - SemEHR
T2 - A General-purpose Semantic Search System to Surface Semantic Data from Clinical Notes for Tailored Care, Trial Recruitment and Clinical Research
AU - Wu, Honghan
AU - Toti, Giulia
AU - Morley, Katherine I
AU - Ibrahim, Zina
AU - Folarin, Amos
AU - Jackson, Richard
AU - Kartoglu, Ismail
AU - Agrawal, Asha
AU - Stringer, Clive
AU - Gale, Darren
AU - Gorrell, Genevieve
AU - Roberts, Angus
AU - Broadbent, Matthew
AU - Stewart, Robert
AU - Dobson, Richard JB
PY - 2018/5
Y1 - 2018/5
N2 - Objective: Unlocking the data contained within both structured and unstructured components of Electronic Health Records (EHRs) has the potential to provide a step change in data available for secondary research use, generation of actionable medical insights, hospital management and trial recruitment. To achieve this, we implemented SemEHR, an open source semantic search and analytics tool for EHRs. Methods: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying contextualised mentions of a wide range of biomedical concepts within EHRs. Natural Language Processing (NLP) annotations are further assembled at patient level and extended with EHR-specific knowledge to generate a timeline for each patient. The semantic data is served via ontology-based search and analytics interfaces. Results: SemEHR has been deployed to a number of UK hospitals including the Clinical Record Interactive Search (CRIS), an anonymised replica of the EHR of the UK South London and Maudsley (SLaM) NHS Foundation Trust, one of Europe’s largest providers of mental health services. In two CRIS-based studies, SemEHR achieved 93% (Hepatitis C case) and 99% (HIV case) F-Measure results in identifying true positive patients. At King’s College Hospital in London, as part of the CogStack programme (github.com/cogstack), SemEHR is being used to recruit patients into the UK Department of Health 100k Genome Project (genomicsengland.co.uk). The validation study suggests that the tool can validate previously recruited cases and is very fast in searching phenotypes: the time for recruitment criteria checking was reduced from days to minutes. Validated on MIMIC-III, an open intensive care EHR dataset, the vital signs extracted by SemEHR achieved around 97% accuracy. Conclusion: Results from the multiple case studies demonstrate SemEHR’s efficiency: weeks or months of work can be done within hours or minutes in some cases. SemEHR provides a more comprehensive view of a patient, bringing in more and unexpected insight compared to study-oriented bespoke information extraction systems. SemEHR is open source and available at https://github.com/CogStack/SemEHR.
AB - Objective: Unlocking the data contained within both structured and unstructured components of Electronic Health Records (EHRs) has the potential to provide a step change in data available for secondary research use, generation of actionable medical insights, hospital management and trial recruitment. To achieve this, we implemented SemEHR, an open source semantic search and analytics tool for EHRs. Methods: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying contextualised mentions of a wide range of biomedical concepts within EHRs. Natural Language Processing (NLP) annotations are further assembled at patient level and extended with EHR-specific knowledge to generate a timeline for each patient. The semantic data is served via ontology-based search and analytics interfaces. Results: SemEHR has been deployed to a number of UK hospitals including the Clinical Record Interactive Search (CRIS), an anonymised replica of the EHR of the UK South London and Maudsley (SLaM) NHS Foundation Trust, one of Europe’s largest providers of mental health services. In two CRIS-based studies, SemEHR achieved 93% (Hepatitis C case) and 99% (HIV case) F-Measure results in identifying true positive patients. At King’s College Hospital in London, as part of the CogStack programme (github.com/cogstack), SemEHR is being used to recruit patients into the UK Department of Health 100k Genome Project (genomicsengland.co.uk). The validation study suggests that the tool can validate previously recruited cases and is very fast in searching phenotypes: the time for recruitment criteria checking was reduced from days to minutes. Validated on MIMIC-III, an open intensive care EHR dataset, the vital signs extracted by SemEHR achieved around 97% accuracy. Conclusion: Results from the multiple case studies demonstrate SemEHR’s efficiency: weeks or months of work can be done within hours or minutes in some cases. SemEHR provides a more comprehensive view of a patient, bringing in more and unexpected insight compared to study-oriented bespoke information extraction systems. SemEHR is open source and available at https://github.com/CogStack/SemEHR.
UR - http://www.scopus.com/inward/record.url?scp=85052641162&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocx160
DO - 10.1093/jamia/ocx160
M3 - Article
SN - 1527-974X
VL - 25
SP - 530
EP - 537
JO - Journal of the American Medical Informatics Association : JAMIA
JF - Journal of the American Medical Informatics Association : JAMIA
IS - 5
ER -