King's College London

Research portal

UKBTools: An R package to manage and query UK Biobank data

Research output: Contribution to journal › Article

Ken B. Hanscombe, Jonathan R.I. Coleman, Matthew Traylor, Cathryn M. Lewis

Original language: English
Article number: e0214311
Pages (from-to): 1-6
Number of pages: 6
Journal: PLoS ONE
Issue number: 5
Early online date: 31 May 2019
Publication status: E-pub ahead of print - 31 May 2019

Introduction: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names.

Results: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; to query disease diagnoses for one or more individuals and estimate disease frequency relative to a reference variable; and to retrieve genetic metadata.

Conclusion: Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file will rapidly enable UKB users to undertake their research.
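
The merge-and-rename step described in the abstract can be illustrated with a minimal, language-neutral sketch. This is written in Python and is not the ukbtools R API. The two tiny filesets, the `FIELD_NAMES` lookup, and the helper functions `descriptive_name`, `read_fileset`, and `merge_filesets` are all hypothetical. The sketch assumes columns are named by numeric field code in the form `field-instance.array` and that rows are keyed on a participant identifier column `eid`. The output naming pattern (description, then `f` plus the field code, instance, and array) is likewise chosen only for illustration.

```python
import csv
import io

# Hypothetical miniature inputs; real UKB files are far larger.
# Columns use numeric field codes in the assumed form field-instance.array.
FILESET_A = "eid,31-0.0,21022-0.0\n1001,0,54\n1002,1,61\n"
FILESET_B = "eid,21001-0.0\n1001,27.4\n1002,24.9\n"

# Assumed lookup from field code to description (illustrative subset only).
FIELD_NAMES = {
    "31": "sex",
    "21022": "age_at_recruitment",
    "21001": "body_mass_index_bmi",
}


def descriptive_name(column):
    """Turn '31-0.0' into 'sex_f31_0_0'; leave unknown columns unchanged."""
    if column == "eid":
        return column
    field, _, rest = column.partition("-")
    instance, _, array = rest.partition(".")
    label = FIELD_NAMES.get(field)
    if label is None:
        return column
    return f"{label}_f{field}_{instance}_{array}"


def read_fileset(text):
    """Read one CSV fileset into {eid: {descriptive column: value}}."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        rows[row["eid"]] = {
            descriptive_name(key): value
            for key, value in row.items()
            if key != "eid"
        }
    return rows


def merge_filesets(*texts):
    """Combine filesets on 'eid', keeping every participant seen in any file."""
    merged = {}
    for text in texts:
        for eid, values in read_fileset(text).items():
            merged.setdefault(eid, {"eid": eid}).update(values)
    return merged


dataset = merge_filesets(FILESET_A, FILESET_B)
print(dataset["1001"])
```

Keeping the field code inside the descriptive name means a renamed column can still be traced back to its original numeric code.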

