Sensitivity and correlation of hypervariable regions in 16S rRNA genes in phylogenetic analysis
Yang et al. BMC Bioinformatics
Bo Yang 0
Yong Wang
Pei-Yuan Qian 0
0 Division of Life Sciences, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Background: Prokaryotic 16S ribosomal RNA (rRNA) sequences are widely used in environmental microbiology and molecular evolution as reliable markers for the taxonomic classification and phylogenetic analysis of microbes. Restricted by current sequencing techniques, the massive sequencing of 16S rRNA gene amplicons encompassing the full length of genes is not yet feasible. Thus, the selection of the most efficient hypervariable regions for phylogenetic analysis and taxonomic classification is still debated. In the present study, several bioinformatics tools were integrated to build an in silico pipeline to evaluate the phylogenetic sensitivity of the hypervariable regions compared with the corresponding full-length sequences.
Results: The correlation of seven sub-regions was inferred from the geodesic distance, a parameter that is applied to quantitatively compare the topology of different phylogenetic trees constructed using the sequences from different sub-regions. The relationship between different sub-regions based on the geodesic distance indicated that V4-V6 were the most reliable regions for representing the full-length 16S rRNA sequences in the phylogenetic analysis of most bacterial phyla, while V2 and V8 were the least reliable regions.
Conclusions: Our results suggest that V4-V6 might be optimal sub-regions for the design of universal primers with superior phylogenetic resolution for bacterial phyla. A potential relationship between function and the evolution of 16S rRNA is also discussed.
Keywords: 16S rRNA; 16S rRNA gene; Variable regions; Phylogenetic; Geodesic distance; Primer
Background
As the major players in almost all environments
explored, bacteria contribute immensely to global energy
conversion and the recycling of matter. Thus, profiling
microbial communities is one of the most important
tasks for microbiologists exploring various
ecosystems. However, our understanding of the kingdom
Bacteria remains limited because most bacteria cannot
be cultured or isolated under laboratory conditions [1].
In the past few decades, DGGE (denaturing gradient
gel electrophoresis) [2], T-RFLP (terminal restriction
fragment length polymorphism) [3], FISH (fluorescent
in situ hybridization) [4] and GeneChips [5] were used as
mainstream methods in studies of bacterial communities
and diversity until the development of high-throughput
sequencing technology. Recently, metagenomic methods
enabled by next-generation sequencing technologies such
as Roche 454 [6, 7] and Illumina [8] have facilitated a
remarkable expansion of our knowledge regarding
uncultured bacteria [7].
The 16S rRNA gene sequence was first used in 1985
for phylogenetic analysis [9]. Because it contains both
highly conserved regions for primer design and
hypervariable regions to identify phylogenetic characteristics
of microorganisms, the 16S rRNA gene sequence
became the most widely used marker gene for profiling
bacterial communities [10]. Full-length 16S rRNA gene
sequences consist of nine hypervariable regions that are
separated by nine highly conserved regions [11, 12].
Owing to the limitations of sequencing technology, the
16S rRNA gene sequences used in most studies are partial
sequences. Therefore, the selection of proper primers is
critical for studying bacterial phylogeny in various environments.
An early study showed that the use of different
primers might result in different DGGE patterns [13].
Recent studies utilizing high-throughput technology have
also demonstrated that the use of suboptimal primer
pairs results in the uneven amplification of certain
species, causing either an under- or over-estimation of some
species in a microbial community [10–12, 14]. Although
several studies have focused on optimal primer pairs or,
equivalently, optimal variable regions for the study of
bacterial communities [15–17], they utilized synthetic
microbial communities, and the taxa chosen for those
experiments would largely influence the
final results. Consequently, the use of different
sequencing technologies and the targeting of different
sub-regions of 16S rRNA genes will yield different apparent
compositions for a given microbial community. However, few
studies to date have compared the phylogenetic
sensitivity of the 16S rRNA sub-regions.
Phylogenetic trees are widely used to elucidate
systematic relationships between different species, in particular
novel microbial lineages [9, 18–20]. However,
strategies to determine relationships between different 16S
rRNA sub-regions in terms of phylogenetic resolution
remain poorly established. The correlation of the different
hypervariable regions may be inferred from the geo (...truncated)