Identification of sequence polymorphisms at 58 STRs and 94 iiSNPs in a Tibetan population using massively parallel sequencing
www.nature.com/scientificreports
OPEN
Dan Peng1,2,3, Yinming Zhang1,2,3, Han Ren1,2,3, Haixia Li1,2, Ran Li1,2, Xuefeng Shen1,2,
Nana Wang1,2, Erwen Huang1,2*, Riga Wu1,2* & Hongyu Sun1,2*
Massively parallel sequencing (MPS) has rapidly become a promising method for forensic DNA typing
because it can type a large number of markers and samples simultaneously in a single reaction
and yields sequence information directly. In the present study, two kinds of forensic genetic
markers, short tandem repeats (STRs) and identity-informative single nucleotide polymorphisms (iiSNPs),
were analyzed simultaneously using the ForenSeq DNA Signature Prep Kit, a commercially available
kit for an MPS platform. A total of 152 DNA markers, including 27 autosomal STR (A-STR) loci, 24 Y
chromosomal STR (Y-STR) loci, 7 X chromosomal STR (X-STR) loci and 94 iiSNP loci were genotyped
for 107 Tibetan individuals (53 males and 54 females). Compared with length-based STR typing
methods, 112 more A-STR alleles, 41 more Y-STR alleles, and 24 more X-STR alleles were observed
at 17 A-STRs, 9 Y-STRs, and 5 X-STRs using sequence-based approaches. Thirty-nine novel sequence
variations were observed at 20 STR loci. When the flanking regions were analyzed in addition
to the target SNPs at the 94 iiSNPs, 38 more alleles were identified. Our study provides adequate
genotype and frequency data on the two types of genetic markers for forensic practice. Moreover, we
show that this panel is highly polymorphic and informative in the Tibetan population and should be
efficient in forensic kinship testing and personal identification cases.
Several genetic markers have been introduced to forensic genetics to address the problems of kinship analysis and
personal identification. Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) are the genetic
markers commonly used in current forensic casework1,2.
STRs, whose repeat units are usually 2–6 bp in length, are commonly typed with the amplified fragment length polymorphism
(Amp-FLP) strategy, which combines fluorescently labelled multiplex PCR and capillary electrophoresis (CE)3. Alleles
are called from fragment length by comparison with a locus-specific allelic ladder that has been
previously sequenced, where the number of repeat units is distinct2. Thus, each allele is regarded as a length-based (LB) allele using this approach. With the advancement of sequencing technologies over the last decade,
the existence of sequence structure variations in alleles with the same length has been uncovered4.
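The distinction between length and sequence can be illustrated with a minimal Python sketch. The repeat regions below are hypothetical 4-bp-unit sequences invented for illustration, not alleles from this study, and the function names are assumptions. Naming by length alone, as CE does, places two different sequences under the same allele number.

```python
from collections import defaultdict


def length_based_allele(repeat_region: str, unit_length: int = 4) -> str:
    """Name an allele from its length only: complete repeats, plus leftover bases if any."""
    full, partial = divmod(len(repeat_region), unit_length)
    return f"{full}.{partial}" if partial else str(full)


def group_by_length(repeat_regions, unit_length: int = 4):
    """Group distinct sequences under the length-based allele they would share on CE."""
    groups = defaultdict(set)
    for seq in repeat_regions:
        groups[length_based_allele(seq, unit_length)].add(seq)
    return dict(groups)


# Hypothetical repeat regions (illustrative only, not data from the study).
observed = [
    "TCTA" * 10,              # ten identical repeats
    "TCTA" * 4 + "TCTG" * 6,  # same length, different internal structure
    "TCTA" * 11,
]

groups = group_by_length(observed)
for allele, sequences in sorted(groups.items()):
    print(allele, len(sequences), "distinct sequence(s)")
```

Here the first two sequences share one length-based allele, and only sequencing would tell them apart.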
SNPs, which can be amplified as smaller amplicons, are bi-allelic genetic markers with lower mutation
rates compared with STRs5. Several autosomal SNP marker sets and detection methods, such as single-base extension, chip-based microarrays, and allele-specific hybridization arrays, have been developed to compensate for
the relatively weaker discrimination power of single loci caused by their bi-allelic nature5–7.
However, these methods are not widely used in forensic practice due to the requirement of higher DNA inputs
or the limited ability to detect a vast number of SNP loci in a single reaction8.
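Why a single bi-allelic locus discriminates weakly, and why many loci compensate, can be sketched numerically. The sketch below assumes Hardy-Weinberg proportions and independence between loci. The allele frequency of 0.5 at all 94 loci is an idealized assumption, not the frequencies reported in this study, and the function names are hypothetical.

```python
def snp_match_probability(p: float) -> float:
    """Probability that two unrelated people share a genotype at one bi-allelic SNP.

    Assumes Hardy-Weinberg proportions: genotype frequencies p^2, 2pq and q^2.
    """
    q = 1.0 - p
    genotype_frequencies = (p * p, 2 * p * q, q * q)
    return sum(g * g for g in genotype_frequencies)


def combined_match_probability(allele_frequencies) -> float:
    """Multiply per-locus match probabilities, assuming independent loci."""
    combined = 1.0
    for p in allele_frequencies:
        combined *= snp_match_probability(p)
    return combined


# Best case for a single bi-allelic SNP: both alleles at frequency 0.5.
single = snp_match_probability(0.5)  # 0.375
# Idealized panel of 94 independent SNPs, all at frequency 0.5 (assumption).
panel = combined_match_probability([0.5] * 94)
print(single, panel)
```

Even in the best case one SNP leaves a match probability of 0.375, so discrimination comes from multiplying over many loci.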
1Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, No. 74 Zhongshan Road
II, Guangzhou 510080, Guangdong, People’s Republic of China. 2Guangdong Province Translational Forensic
Medicine Engineering Technology Research Center, Sun Yat-Sen University, Guangzhou 510080, People’s Republic
of China. 3These authors contributed equally: Dan Peng, Yinming Zhang and Han Ren. *email: 812811596@qq.com
Scientific Reports | (2020) 10:12225 | https://doi.org/10.1038/s41598-020-69137-1
Unlike the detection methods mentioned above, massively parallel sequencing (MPS), also known as next-generation sequencing (NGS), provides a new technology for forensic genetic marker typing. Numerous markers
and samples can be investigated simultaneously with MPS and, unlike with the CE method, there is no need to consider
overlapping amplicon sizes or the availability of fluorescent labels. For STRs, both
length and sequence data can be achieved; thus, allele calling may be more informative and the allele’s sequence
characteristics are identified, resulting in sequence-based (SB) alleles. For SNPs, not only the target SNPs but also
the variations in the flanking regions can be identified simultaneously, and form the potential microhaplotype9,10.
Thus, more alleles can be identified based on the analysis of full sequences of SNPs.
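The gain from reading flanking regions can be shown with a small Python sketch. The amplicon sequences, the target position and the function names are hypothetical and chosen only for illustration. Calling the target base alone yields two alleles, whereas treating the full amplicon sequence as the allele yields three.

```python
def target_alleles(amplicons, target_index: int):
    """Alleles seen when only the base at the target SNP position is called."""
    return {seq[target_index] for seq in amplicons}


def sequence_alleles(amplicons):
    """Alleles seen when the full amplicon sequence (target plus flanks) is the allele."""
    return set(amplicons)


# Hypothetical amplicons: target SNP at index 4 (T/C), flanking variant at index 2 (G/A).
amplicons = [
    "ACGTTAGC",
    "ACGTCAGC",
    "ACATTAGC",  # same target base as the first read, different flanking base
]

print(len(target_alleles(amplicons, 4)), "alleles by target SNP")
print(len(sequence_alleles(amplicons)), "alleles by full sequence")
```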
This new technology poses new challenges for researchers. First, the immense volume of variable and
complex data produced by MPS platforms is hard to analyse manually. Meanwhile, the software packages
developed for LB datasets are no longer efficient. New bioinformatic methods are required to process and
interpret these extensive data. An optimal package for MPS data analysis needs to be accurate, time-saving and
easy to operate. Several packages have been published to make this process convenient for forensic use, such as
TSSV11, STRait Razor12,13, STRinNGS14, SEQ Mapper15 and FDSTools16. Sequencer manufacturers have also released
companion analysis packages for the data produced by their sequencers, such as ForenSeq Universal
Analysis Software17 (UAS, Illumina, San Diego, CA) and Ion Torrent Suite Software Plugins18 (Thermo Fisher
Scientific, South San Francisco, CA).
Moreover, the LB nomenclature of the CE method for STRs is not suitable for the complex sequence variations
detected by MPS platforms. It is urgent to establish how MPS data should be analysed and reported, how these data relate to LB alleles, and how such datasets should be recorded and searched in a database4. Some
researchers have tried to answer these questions19,20, but a perfect nomenclature is still under development. A
unified minimal nomenclature for the complex sequences obtained by MPS technologies was recommended by the
International Society for Forensic Genetics (ISFG) in 2016 (refs. 21,22) to facilitate communication between laboratories
and to make these data backward compatible with LB data produced on CE platforms. In early 2019, the STRAND
Working Group was formalized to discuss the expanding and advancing topics of STR sequence nomenclature23.
Quality control of string sequences and alleles has also been suggested by the ISFG24.
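One way to picture what a sequence-aware description of an STR allele involves is a bracketed repeat shorthand of the kind often seen in the STR sequencing literature. The Python sketch below is a simplification under stated assumptions. It splits a hypothetical repeat region into fixed 4-bp units from the first base, whereas real loci need locus-specific motifs and reading frames. It is not the format specified by the ISFG recommendations, and the function name is an assumption.

```python
from itertools import groupby


def bracketed_notation(repeat_region: str, unit_length: int = 4) -> str:
    """Collapse consecutive identical units into a '[MOTIF]n' shorthand.

    Simplification: units are cut in a fixed frame from the first base, so a
    trailing partial unit appears as its own bracketed block with count 1.
    """
    units = [
        repeat_region[i:i + unit_length]
        for i in range(0, len(repeat_region), unit_length)
    ]
    return " ".join(
        f"[{unit}]{sum(1 for _ in run)}" for unit, run in groupby(units)
    )


# Hypothetical repeat regions of equal length (illustrative only).
print(bracketed_notation("TCTA" * 10))
print(bracketed_notation("TCTA" * 4 + "TCTG" * 6))
```

Both strings would carry the same length-based allele number, while the shorthand keeps the internal structure that distinguishes them.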
To facilitate the use of MPS in forensic genetics practice, several commercial and custom STR typing systems
have been developed based on different MPS platforms for different purposes25–28. The ForenSeq DNA Signature
Prep Kit (Illumina, San Diego, CA) is one of the library preparation kits that simultaneously targets the sequences
of Amelogenin, 27 autosomal STRs (A-STRs), 24 Y-ST (...truncated)