TY - JOUR
T1 - Text Hunter
T2 - A User Friendly Tool for Extracting Generic Concepts from Free Text in Clinical Research
AU - Jackson, Richard
AU - Ball, Michael
AU - Patel, Rashmi
AU - Hayes, Richard
AU - Dobson, Richard
AU - Stewart, Robert
PY - 2014/11/19
Y1 - 2014/11/19
AB - Observational research using data from electronic health records (EHR) is a rapidly growing area, which promises
both increased sample size and data richness, and therefore unprecedented study power. However, in many medical
domains, large amounts of potentially valuable data are contained within the free-text clinical narrative. Manually
reviewing free text to obtain desired information is an inefficient use of researcher time and skill. Previous work has
demonstrated the feasibility of applying Natural Language Processing (NLP) to extract information. However, in
real-world research environments, the demand for NLP skills outweighs supply, creating a bottleneck in the
secondary exploitation of the EHR. To address this, we present TextHunter, a tool for the creation of training data,
construction of concept extraction machine learning models and their application to documents. Using confidence
thresholds to ensure high precision (>90%), we achieved recall measurements as high as 99% in real-world use
cases.
M3 - Conference paper
SN - 1942-597X
SP - 729
EP - 738
JO - Proceedings of the American Medical Informatics Association
JF - Proceedings of the American Medical Informatics Association
ER -