Identification of sequence polymorphisms at 58 STRs and 94 iiSNPs in a Tibetan population using massively parallel sequencing

www.nature.com/scientificreports

Dan Peng1,2,3, Yinming Zhang1,2,3, Han Ren1,2,3, Haixia Li1,2, Ran Li1,2, Xuefeng Shen1,2, Nana Wang1,2, Erwen Huang1,2*, Riga Wu1,2* & Hongyu Sun1,2*

1Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, No. 74 Zhongshan Road II, Guangzhou 510080, Guangdong, People’s Republic of China. 2Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-Sen University, Guangzhou 510080, People’s Republic of China. 3These authors contributed equally: Dan Peng, Yinming Zhang and Han Ren. *email: 812811596@qq.com

Scientific Reports | (2020) 10:12225 | https://doi.org/10.1038/s41598-020-69137-1

Massively parallel sequencing (MPS) has rapidly become a promising method for forensic DNA typing because it can analyse a large number of markers and samples simultaneously in a single reaction and yields sequence information directly. In the present study, two kinds of forensic genetic markers, short tandem repeats (STRs) and identity-informative single nucleotide polymorphisms (iiSNPs), were analyzed simultaneously using the ForenSeq DNA Signature Prep Kit, a commercially available kit for the MPS platform. A total of 152 DNA markers, including 27 autosomal STR (A-STR) loci, 24 Y chromosomal STR (Y-STR) loci, 7 X chromosomal STR (X-STR) loci and 94 iiSNP loci, were genotyped for 107 Tibetan individuals (53 males and 54 females). Compared with length-based STR typing methods, sequence-based approaches revealed 112 more A-STR alleles, 41 more Y-STR alleles, and 24 more X-STR alleles at 17 A-STRs, 9 Y-STRs, and 5 X-STRs. Thirty-nine novel sequence variations were observed at 20 STR loci. When the flanking regions were analyzed in addition to the target SNPs at the 94 iiSNPs, 38 more alleles were identified. Our study provides genotype and frequency data for the two types of genetic markers for forensic practice. Moreover, we show that this panel is highly polymorphic and informative in the Tibetan population and should be efficient in forensic kinship testing and personal identification cases.

Several genetic markers have been introduced to forensic genetics to address the problems of kinship analysis and personal identification. Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) are commonly used genetic markers in current forensic casework1,2. STRs, whose repeat units are usually 2–6 bp in length, are commonly typed with the amplified fragment length polymorphism (Amp-FLP) strategy, which combines fluorescently labelled multiplex PCR with capillary electrophoresis (CE)3. Alleles are called from fragment length by comparison with a locus-specific allelic ladder that has been previously sequenced, in which the number of repeat units of each allele is known2. Each allele typed with this approach is therefore regarded as a length-based (LB) allele. With the advancement of sequencing technologies over the last decade, sequence structure variations in alleles of the same length have been uncovered4. SNPs, which can be amplified with smaller amplicons, are bi-allelic genetic markers with lower mutation rates than STRs5. Several autosomal SNP marker sets and detection methods, such as single-base extension, chip-based microarrays, and allele-specific hybridization arrays, have been developed to compensate for the relatively weaker discrimination power of single loci caused by the bi-allelic nature of the human genome5–7. However, these methods are not widely used in forensic practice because they require higher DNA inputs or have limited ability to detect a vast number of SNP loci in a single reaction8.

Unlike the detection methods mentioned above, massively parallel sequencing (MPS), also known as next-generation sequencing (NGS), provides a new technology for forensic genetic marker typing. Numerous markers and samples can be investigated simultaneously with MPS, and there is no need to consider the size overlap of amplified fragments or the availability of fluorescent labels, as there is with the CE method. For STRs, both length and sequence data can be obtained; allele calling is therefore more informative, and the sequence characteristics of each allele are identified, resulting in sequence-based (SB) alleles. For SNPs, not only the target SNPs but also the variations in the flanking regions can be identified simultaneously, and together these can form potential microhaplotypes9,10. Thus, more alleles can be identified by analysing the full sequences of SNP amplicons.

This new technology poses new challenges to researchers. First, the immense, variable and complex data produced by MPS platforms are hard to analyse manually. Meanwhile, the software packages developed for LB datasets are no longer adequate. New bioinformatic methods are required to process and interpret these extensive data. An optimal package for MPS data analysis needs to be accurate, time-saving and easy to operate. Several packages have been published to make this process convenient for forensic use, such as TSSV11, STRait Razor12,13, STRinNGS14, SEQ Mapper15 and FDSTools16. Sequencer manufacturers have also released supplementary analysis packages for the data produced by their sequencers, such as the ForenSeq Universal Analysis Software17 (UAS, Illumina, San Diego, CA) and the Ion Torrent Suite Software Plugins18 (Thermo Fisher Scientific, South San Francisco, CA). Moreover, the LB nomenclature used with the CE method for STRs is not suitable for the complex sequence variations detected by MPS platforms.
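
The distinction between LB and SB alleles described above can be illustrated with a minimal Python sketch. It is not part of the study and does not reproduce any of the software packages named above. The repeat-region sequences, the fixed 4-bp motif length, and the helper names `lb_allele` and `bracket_notation` are hypothetical, and the bracketed output is a plain run-length summary rather than the ISFG-recommended nomenclature.

```python
from collections import defaultdict
from itertools import groupby


def lb_allele(repeat_region: str, motif_length: int = 4) -> int:
    """Length-based call: number of whole repeat units in the region.

    Simplifying assumption: every allele is a whole number of units,
    so partial repeats (e.g. a ".3" allele) are not handled here.
    """
    return len(repeat_region) // motif_length


def bracket_notation(repeat_region: str, motif_length: int = 4) -> str:
    """Summarise a repeat region as runs of identical units, e.g. [TCTA]4."""
    units = [
        repeat_region[i:i + motif_length]
        for i in range(0, len(repeat_region), motif_length)
    ]
    return " ".join(f"[{unit}]{len(list(run))}" for unit, run in groupby(units))


# Hypothetical repeat regions observed at a single STR locus.
observed = [
    "TCTA" * 10,                       # ten identical units
    "TCTA" * 4 + "TCTG" + "TCTA" * 5,  # same length, one variant unit
    "TCTA" * 11,                       # eleven identical units
]

# Group distinct sequences (SB alleles) under their length-based call.
sb_by_lb = defaultdict(set)
for seq in observed:
    sb_by_lb[lb_allele(seq)].add(bracket_notation(seq))

lb_count = len(sb_by_lb)                                  # alleles CE would see
sb_count = sum(len(seqs) for seqs in sb_by_lb.values())   # alleles MPS would see

for allele in sorted(sb_by_lb):
    print(allele, sorted(sb_by_lb[allele]))
```

In this toy data set the two sequences of length 10 collapse into a single LB allele but remain two distinct SB alleles, which is the kind of gain in observed alleles that sequence-based typing provides.
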
It is urgent to know how MPS data should be analysed and reported, how these data relate to LB alleles, and how to record and search such datasets in a database4. Some researchers have tried to answer these questions19,20, but a perfect nomenclature is still under development. A unified minimal nomenclature for the complex sequences obtained by MPS technologies was recommended in 2016 by the International Society for Forensic Genetics (ISFG)21,22 to facilitate communication between laboratories and to make these data backward compatible with LB data produced on the CE platform. In early 2019, the STRAND Working Group was formalized to discuss the expanding and advancing topics of STR sequence nomenclature23. Quality control of string sequences and alleles has also been suggested by the ISFG24.

Aiming to facilitate MPS in forensic genetics practice, several commercial and custom STR typing systems have been developed based on different MPS platforms for different purposes25–28. The ForenSeq DNA Signature Prep Kit (Illumina, San Diego, CA) is one of the library preparation kits that simultaneously targets the sequences of Amelogenin, 27 autosomal STRs (A-STRs), 24 Y-ST (...truncated)