The 944 individuals of the CEPH human genome diversity panel (HGDP–CEPH), a standard sample set of 51 globally distributed populations, were sequenced using the Illumina ForenSeq™ DNA Signature Prep Kit. The ForenSeq™ system is a single multiplex for the MiSeq/FGx™ massively parallel sequencing instrument, comprising amelogenin, 27 autosomal STRs, 24 Y-STRs, 7 X-STRs, and 94 SNPforID+Kiddlab autosomal ID-SNPs (plus optionally detected ancestry and phenotyping SNP sets). We report in detail the patterns of sequence variation observed in the repeat regions of the 58 forensic STR loci typed by the ForenSeq™ system. Sequence alleles were characterized and repeat region structures annotated by aligning the ForenSeq™ sequence output to the latest GRCh38 human reference sequence. This required the reversal and re-alignment of the STR allele sequences reported by the ForenSeq™ system in 20 of the 58 STRs (plus the reverse alleles in two Y-STRs with duplicated-inverted repeat regions). Individual population sample sizes in the HGDP–CEPH panel do not allow reliable inferences to be made about levels of genetic variability in low-frequency STR alleles, where particular sequence variants are found in only a few individuals. We nevertheless assessed the occurrence of both population-specific sequence variants and singleton observations, and found each of these in a sizeable proportion of HGDP–CEPH samples. This has consequences for planning a co-ordinated compilation of sequence variation by forensic laboratories now adopting massively parallel sequencing, on a much larger scale than was previously required.
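As an illustration of two operations mentioned above, the following minimal Python sketch shows (i) reversing a reported STR sequence onto the opposite strand by reverse-complementing it, and (ii) flagging singleton and population-specific sequence alleles. It is not the authors' pipeline. The function names, the input layout of (population, allele) pairs, the toy sequences, and the working definition of "population-specific" (seen more than once, in exactly one population) are all assumptions made for illustration.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (N is kept as N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def classify_variants(observations):
    """Split sequence alleles into singletons and population-specific variants.

    `observations` is an iterable of (population, sequence_allele) pairs,
    one pair per observed allele copy (an assumed input layout).
    Assumed definitions: a singleton is observed exactly once overall;
    a population-specific variant is observed more than once, all in one population.
    """
    counts = defaultdict(int)
    populations = defaultdict(set)
    for population, allele in observations:
        counts[allele] += 1
        populations[allele].add(population)

    singletons = {a for a, n in counts.items() if n == 1}
    population_specific = {
        a for a, pops in populations.items() if len(pops) == 1 and counts[a] > 1
    }
    return singletons, population_specific


if __name__ == "__main__":
    # Toy data only; not taken from the HGDP-CEPH results.
    print(reverse_complement("TCTATCTG"))
    toy = [("PopA", "TCTA"), ("PopA", "TCTA"), ("PopB", "TCTG"),
           ("PopB", "TTTA"), ("PopC", "TTTA")]
    print(classify_variants(toy))
```

With counts this small, neither category supports frequency estimates, which is the limitation the text notes for low-frequency alleles.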
- Autosomal STRs
- CEPH Human genome diversity panel
- Massively parallel sequencing