TY - JOUR
T1 - Highly accurate quantification of allelic gene expression for population and disease genetics
AU - Saukkonen, Anna
AU - Kilpinen, Helena
AU - Hodgkinson, Alan
N1 - Funding Information:
A.S. is supported by the London Interdisciplinary Doctoral (LIDO) Programme funded by the UK Biotechnology and Biological Sciences Research Council (BBSRC, BB/M009513/1). H.K. and A.H. have both received support from Medical Research Council eMedLab Medical Bioinformatics career development awards from the UK Medical Research Council (MR/L016311/1). This work is (partly) funded by the National Institute for Health Research (NIHR) Great Ormond Street Hospital (GOSH) Biomedical Research Centre (BRC). The authors acknowledge use of the research computing facility at King’s College London, Rosalind (https://rosalind.kcl.ac.uk), which is delivered in partnership with the NIHR BRCs at South London & Maudsley and Guy’s & St. Thomas’ NHS Foundation Trusts, and part-funded by capital equipment grants from the Maudsley Charity (award 980) and Guy’s & St. Thomas’ Charity (TR130505). The views expressed are those of the authors and not necessarily those of the National Health Service (NHS), the NIHR, King’s College London, or the Department of Health and Social Care. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (NIH), and by the National Cancer Institute (NCI), the National Human Genome Research Institute (NHGRI), the National Heart, Lung, and Blood Institute (NHLBI), the National Institute on Drug Abuse (NIDA), the National Institute of Mental Health (NIMH), and the National Institute of Neurological Disorders and Stroke (NINDS).
Publisher Copyright:
© 2022 Saukkonen et al.
PY - 2022/8
Y1 - 2022/8
N2 - Analysis of allele-specific gene expression (ASE) is a powerful approach for studying gene regulation, particularly when sample sizes are small, such as for rare diseases, or when studying the effects of rare genetic variation. However, detection of ASE events relies on accurate alignment of RNA sequencing reads, where challenges still remain, particularly for reads containing genetic variants or those that align to many different genomic locations. We have developed the Personalised ASE Caller (PAC), a tool that combines multiple steps to improve the quantification of allelic reads, including personalized (i.e., diploid) read alignment with improved allocation of multimapping reads. Using simulated RNA sequencing data, we show that PAC outperforms standard alignment approaches for ASE detection, reducing the number of sites with incorrect biases (>10%) by ∼80% and increasing the number of sites that can be reliably quantified by ∼3%. Applying PAC to real RNA sequencing data from 670 whole-blood samples, we show that genetic regulatory signatures inferred from ASE data more closely match those from population-based methods that are less prone to alignment biases. Finally, we use PAC to characterize cell type–specific ASE events that would be missed by standard alignment approaches, and in doing so identify disease relevant genes that may modulate their effects through the regulation of gene expression. PAC can be applied to the vast quantity of existing RNA sequencing data sets to better understand a wide array of fundamental biological and disease processes.
AB - Analysis of allele-specific gene expression (ASE) is a powerful approach for studying gene regulation, particularly when sample sizes are small, such as for rare diseases, or when studying the effects of rare genetic variation. However, detection of ASE events relies on accurate alignment of RNA sequencing reads, where challenges still remain, particularly for reads containing genetic variants or those that align to many different genomic locations. We have developed the Personalised ASE Caller (PAC), a tool that combines multiple steps to improve the quantification of allelic reads, including personalized (i.e., diploid) read alignment with improved allocation of multimapping reads. Using simulated RNA sequencing data, we show that PAC outperforms standard alignment approaches for ASE detection, reducing the number of sites with incorrect biases (>10%) by ∼80% and increasing the number of sites that can be reliably quantified by ∼3%. Applying PAC to real RNA sequencing data from 670 whole-blood samples, we show that genetic regulatory signatures inferred from ASE data more closely match those from population-based methods that are less prone to alignment biases. Finally, we use PAC to characterize cell type–specific ASE events that would be missed by standard alignment approaches, and in doing so identify disease relevant genes that may modulate their effects through the regulation of gene expression. PAC can be applied to the vast quantity of existing RNA sequencing data sets to better understand a wide array of fundamental biological and disease processes.
UR - http://www.scopus.com/inward/record.url?scp=85137108884&partnerID=8YFLogxK
U2 - 10.1101/gr.276296.121
DO - 10.1101/gr.276296.121
M3 - Article
SN - 1088-9051
VL - 32
SP - 1565
EP - 1572
JO - Genome Research
JF - Genome Research
IS - 8
ER -