Mapping Insertions, Deletions and SNPs on Venter's Chromosomes
Maria Costantini, Giorgio Bernardi
Stazione Zoologica Anton Dohrn, Naples, Italy
Editor: Mark A. Batzer, Louisiana State University, United States of America
Background: The very recent availability of fully sequenced individual human genomes is a major revolution in biology which is certainly going to provide new insights into genetic diseases and genomic rearrangements.
Results: We mapped the insertions, deletions and SNPs (single nucleotide polymorphisms) that are present in Craig Venter's genome, more precisely on chromosomes 17 to 22, and compared them with the human reference genome hg17. Our results show that insertions and deletions are almost absent in L1 and generally scarce in L2 isochore families (GC-poor L1+L2 isochores represent slightly over half of the human genome), whereas they increase in GC-rich isochores, largely paralleling the densities of genes, retroviral integrations and Alu sequences. The distributions of insertions/deletions are in striking contrast with those of SNPs which exhibit almost the same density across all isochore families with, however, a trend for lower concentrations in gene-rich regions.
Conclusions: Our study strongly suggests that the distribution of insertions/deletions is due to the structure of chromatin which is mostly open in gene-rich, GC-rich isochores, and largely closed in gene-poor, GC-poor isochores. The different distributions of insertions/deletions and SNPs are clearly related to the two different responsible mechanisms, namely recombination and point mutations.
The very recent availability of fully sequenced individual human
genomes [1-5] is a major revolution in biology which is certainly
going to provide new insights into genetic diseases and genomic
rearrangements in the near future. In the present work, we looked
at the insertions, deletions and SNPs that are present in Craig
Venter's genome [1], more precisely on chromosomes 17 to 22
(334 megabases, about 10% of the human genome), and compared
them with the human reference genome hg17 from the UCSC
website.
The three main reasons for carrying out this investigation were
the following: (i) to localize insertions, deletions and SNPs on
chromosomes 17 to 22, in connection with the
compartmentalization of the human genome into isochores [6,7]; this was done at
two levels, namely localization in isochore families (L1, L2, H1,
H2, H3, in order of increasing GC and gene density) and mapping
within the isochores; (ii) to correlate insertions, deletions and SNPs
with the densities of genes, interspersed repeats and retroviral
insertions, since these densities are correlated, in turn, with
isochore GC levels [8-12,6], and since they may provide
indications for the preference of insertions/deletions for different
isochore families; (iii) to prepare the ground for exploring the
expression of genes located in the neighborhood of deletions and
insertions; indeed it has been postulated [7] that compositional
changes due to the accumulation of AT-biased point mutations or
to deletions/insertions may be associated with alterations of
chromatin structure that, in turn, may affect gene expression.
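The localization step in aim (i) can be illustrated with a minimal sketch. It assumes a hypothetical isochore map for a single chromosome and a list of variant positions; the coordinates, the `ISOCHORES` table and the function names are illustrative only and do not reproduce the procedure or the data actually used in this work.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical isochore map for one chromosome: (start, end, family),
# half-open coordinates in bp, sorted by start and non-overlapping.
ISOCHORES = [
    (0, 1_000_000, "L2"),
    (1_000_000, 1_500_000, "H1"),
    (1_500_000, 2_500_000, "H2"),
]


def family_at(position, isochores=ISOCHORES):
    """Return the family of the isochore containing `position`, or None."""
    starts = [start for start, _, _ in isochores]
    i = bisect_right(starts, position) - 1
    if i >= 0:
        start, end, family = isochores[i]
        if start <= position < end:
            return family
    return None


def density_per_mb(positions, isochores=ISOCHORES):
    """Return variants per megabase for each isochore family in the map."""
    counts = defaultdict(int)
    for pos in positions:
        family = family_at(pos, isochores)
        if family is not None:  # positions outside the map are ignored
            counts[family] += 1
    lengths = defaultdict(int)
    for start, end, family in isochores:
        lengths[family] += end - start
    return {fam: counts[fam] / (lengths[fam] / 1e6) for fam in lengths}
```

Normalizing counts by the total length of each family is what makes densities comparable between families that occupy very different fractions of a chromosome.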
It should be pointed out that the present work only concerns (i)
insertions and deletions among structural variations (not including
copy-number variations such as segmental duplications; see ref.
[13] for a review, and ref. [14]); and (ii) SNPs as detected by
pairwise alignment of sequences. It should also be stressed that the
Venter genome used in our comparison represents a composite
haploid version of the genome in which the highest-scoring alleles
are represented in the consensus sequence. The human
reference genome hg17 (practically identical to the latest hg18
version for the chromosomes under consideration) is a composite
genome resulting from several individuals. Insertions and
deletions, as well as SNPs, reported in this article are, therefore, the
result of the comparison of one genome, the Venter genome, with
several individual genomes. In other words, each insertion and
deletion in Venter is derived from a comparison with another
individual, but not necessarily the same individual. Obviously, this
also applies to SNPs. We thought that our approach was
acceptable in view of the fact that our primary aim was to look
for the localization of insertions/deletions and SNPs on isochores.
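As an illustration of how insertions, deletions and SNPs can be read off a pairwise alignment, the sketch below classifies the columns of a gapped alignment between a reference and a query sequence. The function name, the `-` gap character and the convention that a run of adjacent gap columns counts as one event are assumptions made for this example, not a description of the alignment pipeline used here.

```python
def classify_alignment(ref_aln, qry_aln):
    """Count SNPs, insertions and deletions in a gapped pairwise alignment.

    `ref_aln` and `qry_aln` are equal-length strings using '-' for gaps.
    A run of consecutive gap columns of the same kind counts as one event.
    Insertions and deletions are defined relative to the reference.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    counts = {"snp": 0, "insertion": 0, "deletion": 0}
    previous = None
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-" and q == "-":
            state = None            # uninformative column
        elif r == "-":
            state = "insertion"     # base present in query only
        elif q == "-":
            state = "deletion"      # base present in reference only
        elif r != q and "N" not in (r, q):
            state = "snp"           # mismatch between two called bases
        else:
            state = None            # match or ambiguous base
        # Every SNP column is an event; gap runs are counted once.
        if state == "snp" or (state is not None and state != previous):
            counts[state] += 1
        previous = state
    return counts
```

For example, aligning `ACGT--ACGTAC` (reference) with `ACCTGGAC--AC` (query) yields one SNP, one two-base insertion and one two-base deletion.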
Focusing on chromosomes 17 to 22 is justified by considering that
these chromosomes are representative, in terms of isochores, of the
whole human genome. A detailed comparison of the full Venter
genome with the human reference genome was not warranted at
the time of our investigations, because the human reference
genome, as already mentioned, is a composite genome. Obviously,
a comparison of full individual genomes will be of interest as soon
as it becomes possible.
The choice of chromosomes 17 to 22 was due to the fact that
while these chromosomes exhibit wide differences in their isochore
patterns, they cumulatively show an overall similarity with the
isochore patterns of the whole human genome [15]. Indeed, as
shown in Figure 1, chromosomes 17 and 20 are characterized by a
predominance of H1 and H2 isochores, whereas L1 isochores are
poorly represented. In contrast, chromosomes 18 and 21 are
characterized by abundant L1 isochores (as well as L2 isochores in
the case of chromosome 18, which lacks H3 isochores altogether).
Chromosomes 19 and 22 completely lack isochore family L1, are
very scarce in L2 isochores, and show a great abundance of H1
and, especially, of H2 isochores. It should be noted that while
Figure 1 reports the isochore patterns of chromosomes from
release hg17, the isochore profiles of hg17 and hg18, the most
recent release, are identical as far as chromosomes 17 to 22 are
concerned, the only exceptions being three small gaps in hg17 of
chromosome 22 which were filled in the hg18 version (see Figure
S1).
Figure 1. Distribution of isochores on chromosomes 17 to 22 from the human reference genome. The histograms show the distribution
(by weight) of isochores as pooled in bins of 0.5% GC for chromosomes 17 to 22 from hg17. Colors represent the five isochore families. The color code
spans the spectrum of GC level in five steps, indicated by broken horizontal lines: ultramarine blue (L1), light blue (L2), yellow (H1), orange (H2) and
red (H3). Note the different scales on the ordinate axis.
doi:10.1371/journal.pone.0005972.g001
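The pooling of isochores into 0.5% GC bins described in the Figure 1 legend can be sketched as follows. The example uses total isochore length as a stand-in for "weight". The family boundaries in `FAMILY_UPPER_BOUNDS` are not stated in this text; they are assumed values (37, 41, 46 and 53% GC) chosen for illustration and should be checked against the isochore map actually used.

```python
import math
from collections import defaultdict

# Assumed upper GC bounds (%) of families L1, L2, H1, H2; anything at or
# above the last bound is treated as H3. Not taken from this text.
FAMILY_UPPER_BOUNDS = [(37.0, "L1"), (41.0, "L2"), (46.0, "H1"), (53.0, "H2")]


def family_of(gc_percent):
    """Assign a GC level (%) to an isochore family under the assumed bounds."""
    for upper, name in FAMILY_UPPER_BOUNDS:
        if gc_percent < upper:
            return name
    return "H3"


def gc_histogram(isochores, bin_width=0.5):
    """Sum isochore lengths (bp) per GC bin.

    `isochores` is an iterable of (length_bp, gc_percent) pairs.
    Keys of the returned dict are the lower edges of the bins.
    """
    hist = defaultdict(int)
    for length_bp, gc_percent in isochores:
        lower_edge = math.floor(gc_percent / bin_width) * bin_width
        hist[lower_edge] += length_bp
    return dict(hist)
```

Coloring each bin by `family_of` applied to its lower edge would then give a plot of the same general form as Figure 1.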
Figure 2 compares the cumulative isochore pattern of
chromosomes 17 to 22 with that of the whole human genome.
The former is characterized by an under-representation of
GC-poor isochore families L1 and L2 and by an
over-representation of GC-rich isochore families H1, H2 and H3.
Chromosomes 17 to 22 still provide, however, a fair representation
of the isochore pattern of the whole human genome, which is
satisfactory for the purpose of this (...truncated)