Sensitivity and correlation of hypervariable regions in 16S rRNA genes in phylogenetic analysis

BMC Bioinformatics, Mar 2016

Background: Prokaryotic 16S ribosomal RNA (rRNA) sequences are widely used in environmental microbiology and molecular evolution as reliable markers for the taxonomic classification and phylogenetic analysis of microbes. Restricted by current sequencing techniques, the massive sequencing of 16S rRNA gene amplicons encompassing the full length of genes is not yet feasible. Thus, the selection of the most efficient hypervariable regions for phylogenetic analysis and taxonomic classification is still debated. In the present study, several bioinformatics tools were integrated to build an in silico pipeline to evaluate the phylogenetic sensitivity of the hypervariable regions compared with the corresponding full-length sequences.

Results: The correlation of seven sub-regions was inferred from the geodesic distance, a parameter that is applied to quantitatively compare the topology of different phylogenetic trees constructed using the sequences from different sub-regions. The relationship between different sub-regions based on the geodesic distance indicated that V4-V6 were the most reliable regions for representing the full-length 16S rRNA sequences in the phylogenetic analysis of most bacterial phyla, while V2 and V8 were the least reliable regions.

Conclusions: Our results suggest that V4-V6 might be optimal sub-regions for the design of universal primers with superior phylogenetic resolution for bacterial phyla. A potential relationship between function and the evolution of 16S rRNA is also discussed.

Bo Yang (1), Yong Wang, Pei-Yuan Qian (1)

(1) Division of Life Sciences, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Keywords: 16S rRNA; 16S rRNA gene; Variable regions; Phylogenetic; Geodesic distance; Primer

Background

As the major players in almost all environments explored, bacteria contribute immensely to global energy conversion and the recycling of matter. Profiling the microbial community is therefore one of the most important tasks for microbiologists exploring various ecosystems. However, our understanding of the kingdom Bacteria remains limited because most bacteria cannot be cultured or isolated under laboratory conditions [1]. Over the past few decades, denaturing gradient gel electrophoresis (DGGE) [2], terminal restriction fragment length polymorphism (T-RFLP) [3], fluorescent in situ hybridization (FISH) [4] and Genechips [5] were the mainstream methods for studying bacterial communities and diversity until high-throughput sequencing technology was developed. More recently, metagenomic methods based on next-generation sequencing technologies such as Roche 454 [6, 7] and Illumina [8] have remarkably expanded our knowledge of uncultured bacteria [7].

The 16S rRNA gene sequence was first used for phylogenetic analysis in 1985 [9]. Because it contains both highly conserved regions suitable for primer design and hypervariable regions that identify the phylogenetic characteristics of microorganisms, the 16S rRNA gene sequence became the most widely used marker gene for profiling bacterial communities [10]. Full-length 16S rRNA gene sequences consist of nine hypervariable regions that are separated by nine highly conserved regions [11, 12]. Because of the limits of sequencing technology, most studies use only partial 16S rRNA gene sequences. The selection of proper primers is therefore critical for studying bacterial phylogeny in various environments. An early study showed that the use of different primers might result in different DGGE patterns [13].
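The interplay described above, with conserved stretches serving as primer sites that flank a hypervariable stretch, can be illustrated with a minimal in silico amplification sketch. Everything in it is an assumption made for illustration: the toy template, the two short degenerate primers, and the function names `find_primer` and `extract_amplicon` are hypothetical and do not come from the paper or from any published 16S primer set. Only the IUPAC ambiguity codes are standard.

```python
# Minimal sketch of in silico amplification with degenerate primers.
# The template, primers, and function names are hypothetical illustrations.

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(template, primer):
    """Return the first 0-based position where the primer matches, or -1."""
    n = len(primer)
    for i in range(len(template) - n + 1):
        if all(template[i + j] in IUPAC[primer[j]] for j in range(n)):
            return i
    return -1


def extract_amplicon(template, forward, reverse):
    """Return the region between the two primer sites (primers excluded).

    Returns None if either primer site is not found.
    """
    start = find_primer(template, forward)
    if start < 0:
        return None
    inner_start = start + len(forward)
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement downstream of the forward primer site.
    offset = find_primer(template[inner_start:], reverse_complement(reverse))
    if offset < 0:
        return None
    return template[inner_start:inner_start + offset]


# Toy template: conserved site, variable stretch, conserved site.
template = "TTACCGGTGATTACACTTGGAAATT"
print(extract_amplicon(template, "ACMGGT", "TCCRAG"))
```

In this toy case the degenerate positions (`M`, `R`) let a single primer match more than one template variant, which is the reason conserved regions are used as primer sites; a primer that fails to match some taxa would leave them out of the extracted set.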
Recent studies utilizing high-throughput technology have also demonstrated that suboptimal primer pairs amplify certain species unevenly, causing some species in a microbial community to be under- or over-estimated [10–12, 14]. Although several studies have focused on optimal primer pairs or, equivalently, optimal variable regions for the study of bacterial communities [15–17], they utilized synthetic microbial communities, and the taxa chosen for those experiments would largely influence the final results. Consequently, using different sequencing technologies and targeting different sub-regions of 16S rRNA genes can yield different apparent compositions for the same microbial community. To date, however, few studies have compared the phylogenetic sensitivity of the 16S rRNA sub-regions.

Phylogenetic trees are widely used to elucidate systematic relationships between different species, in particular those of novel microbial lineages [9, 18–20]. However, strategies to determine relationships between different 16S rRNA sub-regions in terms of phylogenetic resolution remain questionable. The correlation of the different hypervariable regions may be inferred from the geo (...truncated)
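To make the idea of quantitatively comparing tree topologies concrete, the sketch below computes a Robinson-Foulds distance between rooted trees. This is explicitly a simpler stand-in and not the geodesic distance used in the study, which also takes branch lengths into account; it counts only the clades present in one tree and absent from the other. The nested-tuple tree encoding, the taxon labels A to E, the three toy trees, and the function names `clades` and `rf_distance` are all hypothetical and are not data or code from the paper.

```python
# Minimal sketch: Robinson-Foulds distance between rooted trees.
# This is a simpler stand-in for the geodesic distance named in the text.
# Trees are nested tuples of leaf names; all trees here are toy examples.


def clades(tree):
    """Return (leaf set, set of non-trivial clades below the root)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
        # Single leaves are trivial clades shared by every tree; skip them.
        if len(child_leaves) > 1:
            found.add(child_leaves)
    return leaves, found


def rf_distance(tree_a, tree_b):
    """Count clades that appear in exactly one of the two trees."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must be built on the same set of taxa")
    return len(clades_a ^ clades_b)


# Hypothetical trees: one from full-length sequences, two from sub-regions.
full_length = ((("A", "B"), "C"), ("D", "E"))
region_x = ((("A", "B"), "C"), ("D", "E"))   # same topology as full_length
region_y = ((("A", "C"), "B"), ("D", "E"))   # A and B no longer grouped

print(rf_distance(full_length, region_x))
print(rf_distance(full_length, region_y))
```

Under this toy measure, a sub-region whose tree has a smaller distance to the full-length tree would be read as reproducing the full-length topology more faithfully; the same reading applies to the geodesic distance, although the two measures are not interchangeable.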


This is a preview of a remote PDF: http://www.biomedcentral.com/content/pdf/s12859-016-0992-y.pdf

Bo Yang, Yong Wang, Pei-Yuan Qian. Sensitivity and correlation of hypervariable regions in 16S rRNA genes in phylogenetic analysis. BMC Bioinformatics, 2016, 17:135. DOI: 10.1186/s12859-016-0992-y