TY - JOUR
T1 - Desiderata for the development of next-generation electronic health record phenotype libraries
AU - Chapman, Martin
AU - Mumtaz, Shahzad
AU - Rasmussen, Luke V
AU - Karwath, Andreas
AU - Gkoutos, Georgios V
AU - Gao, Chuang
AU - Thayer, Dan
AU - Pacheco, Jennifer A
AU - Parkinson, Helen
AU - Richesson, Rachel L
AU - Jefferson, Emily
AU - Denaxas, Spiros
AU - Curcin, Vasa
PY - 2021
Y1 - 2021/9/1
N2 - Background: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. Methods: A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. Results: We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. Conclusions: There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.
UR - http://www.scopus.com/inward/record.url?scp=85116486505&partnerID=8YFLogxK
U2 - https://doi.org/10.1093/gigascience/giab059
DO - https://doi.org/10.1093/gigascience/giab059
M3 - Article
SN - 2047-217X
VL - 10
JO - GigaScience
JF - GigaScience
IS - 9
M1 - giab059
ER -