Mapping Insertions, Deletions and SNPs on Venter's Chromosomes

Jun 2009

Background: The very recent availability of fully sequenced individual human genomes is a major revolution in biology which is certainly going to provide new insights into genetic diseases and genomic rearrangements.

Results: We mapped the insertions, deletions and SNPs (single nucleotide polymorphisms) that are present in Craig Venter's genome, more precisely on chromosomes 17 to 22, and compared them with the human reference genome hg17. Our results show that insertions and deletions are almost absent in L1 and generally scarce in L2 isochore families (GC-poor L1+L2 isochores represent slightly over half of the human genome), whereas they increase in GC-rich isochores, largely paralleling the densities of genes, retroviral integrations and Alu sequences. The distributions of insertions/deletions are in striking contrast with those of SNPs, which exhibit almost the same density across all isochore families with, however, a trend for lower concentrations in gene-rich regions.

Conclusions: Our study strongly suggests that the distribution of insertions/deletions is due to the structure of chromatin, which is mostly open in gene-rich, GC-rich isochores and largely closed in gene-poor, GC-poor isochores. The different distributions of insertions/deletions and SNPs are clearly related to the two different responsible mechanisms, namely recombination and point mutations.


Maria Costantini, Giorgio Bernardi
Stazione Zoologica Anton Dohrn, Naples, Italy
Editor: Mark A. Batzer, Louisiana State University, United States of America

The very recent availability of fully sequenced individual human genomes [1-5] is a major revolution in biology which is certainly going to provide new insights into genetic diseases and genomic rearrangements in the near future.
In the present work, we looked at the insertions, deletions and SNPs that are present in Craig Venter's genome [1], more precisely on chromosomes 17 to 22 (334 megabases, about 10% of the human genome), and compared them with the human reference genome hg17 from the UCSC website. The three main reasons for carrying out this investigation were the following: (i) to localize insertions, deletions and SNPs on chromosomes 17 to 22, in connection with the compartmentalization of the human genome into isochores [6,7]; this was done at two levels, namely localization in isochore families (L1, L2, H1, H2, H3, in order of increasing GC and gene density) and mapping within the isochores; (ii) to correlate insertions, deletions and SNPs with the densities of genes, interspersed repeats and retroviral insertions, since these densities are correlated, in turn, with isochore GC levels [8-12,6], and since they may provide indications for the preference of insertions/deletions for different isochore families; (iii) to prepare the ground for exploring the expression of genes located in the neighborhood of deletions and insertions; indeed, it has been postulated [7] that compositional changes due to the accumulation of AT-biased point mutations or to deletions/insertions may be associated with alterations of chromatin structure that, in turn, may affect gene expression.

It should be pointed out that the present work only concerns (i) insertions and deletions among structural variations (not including copy-number variations such as segmental duplications; see ref. [13] for a review, and ref. [14]); and (ii) SNPs as detected by pairwise alignment of sequences. It should also be stressed that the Venter genome used in our comparison represents a composite haploid version of the genome in which the highest-scoring alleles are represented in the consensus sequence.
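The first aim above, localizing variants in isochore families, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the input layout (sorted, non-overlapping half-open intervals on one chromosome) and the GC boundaries between families (37, 41, 46 and 53% GC) are assumptions made for illustration; this section does not state the boundaries.

```python
from bisect import bisect_right
from collections import defaultdict

# Assumed GC boundaries (% GC) between families; not stated in this section.
FAMILY_UPPER_BOUNDS = [37.0, 41.0, 46.0, 53.0]
FAMILY_NAMES = ["L1", "L2", "H1", "H2", "H3"]


def isochore_family(gc_percent):
    """Map an isochore GC level (% GC) to one of the five families."""
    return FAMILY_NAMES[bisect_right(FAMILY_UPPER_BOUNDS, gc_percent)]


def variant_density_per_family(isochores, variant_positions):
    """Return variants per megabase for each isochore family.

    isochores: list of (start, end, gc_percent) tuples on one chromosome,
               sorted by start, non-overlapping, half-open [start, end).
    variant_positions: iterable of integer coordinates (insertions,
               deletions or SNPs) on the same chromosome.
    """
    starts = [start for start, _, _ in isochores]
    counts = defaultdict(int)
    lengths = defaultdict(int)

    # Total sequence length covered by each family.
    for start, end, gc in isochores:
        lengths[isochore_family(gc)] += end - start

    # Locate each variant by binary search on isochore start coordinates.
    for pos in variant_positions:
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < isochores[i][1]:
            counts[isochore_family(isochores[i][2])] += 1

    return {fam: counts[fam] / (lengths[fam] / 1e6) for fam in lengths}


# Toy data: one 1-Mb GC-poor isochore followed by one 2-Mb GC-rich isochore.
toy_isochores = [(0, 1_000_000, 36.0), (1_000_000, 3_000_000, 48.0)]
toy_variants = [500_000, 1_200_000, 1_500_000, 2_900_000, 3_500_000]
densities = variant_density_per_family(toy_isochores, toy_variants)
```

Normalizing counts by the length of each family matters because the families cover very different fractions of the chromosomes; raw counts alone would not be comparable. Variants that fall outside every interval (the last toy position) are simply ignored in this sketch.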
The human reference genome hg17 (practically identical to the latest hg18 version for the chromosomes under consideration) is a composite genome derived from several individuals. Insertions and deletions, as well as SNPs, reported in this article are, therefore, the result of the comparison of one genome, the Venter genome, with several individual genomes. In other words, each insertion and deletion in Venter is derived from a comparison with another individual, but not necessarily the same individual. Obviously, this also applies to SNPs. We thought that our approach was acceptable in view of the fact that our primary aim was to look for the localization of insertions/deletions and SNPs on isochores. Focusing on chromosomes 17-22 is justified by considering that these chromosomes are representative, in terms of isochores, of the whole human genome. A detailed comparison of the full Venter genome with the human reference genome was not warranted at the time of our investigations, because the human reference genome, as already mentioned, is a composite genome. Obviously, a comparison of full individual genomes will be of interest as soon as this becomes possible. The choice of chromosomes 17 to 22 was due to the fact that, while these chromosomes exhibit wide differences in their isochore patterns, they cumulatively show an overall similarity with the isochore patterns of the whole human genome [15]. Indeed, as shown in Figure 1, chromosomes 17 and 20 are characterized by a predominance of H1 and H2 isochores, whereas L1 isochores are poorly represented. In contrast, chromosomes 18 and 21 are characterized by abundant L1 isochores (as well as L2 isochores in the case of chromosome 18, which lacks H3 isochores altogether). Chromosomes 19 and 22 completely lack isochore family L1, are very scarce in L2 isochores, and show a great abundance of H1 and, especially, of H2 isochores.
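The isochore patterns of Figure 1 are described in its caption as histograms of isochores pooled, by weight, in bins of 0.5% GC. A small Python sketch of such a pooling is given here. It assumes that "by weight" can be approximated by weighting each isochore by its length; the function name and input layout are hypothetical, and this is not the authors' code.

```python
import math
from collections import defaultdict


def gc_histogram_by_weight(isochores, bin_width=0.5):
    """Pool isochores into GC bins, weighting each isochore by its length.

    isochores: list of (length_bp, gc_percent) tuples.
    Returns a dict mapping the lower edge of each GC bin (% GC) to the
    fraction of the total length that falls in that bin.
    """
    total_length = sum(length for length, _ in isochores)
    pooled = defaultdict(int)

    for length, gc in isochores:
        # Lower edge of the bin containing this GC level, e.g. 38.7 -> 38.5.
        bin_start = math.floor(gc / bin_width) * bin_width
        pooled[bin_start] += length

    return {b: pooled[b] / total_length for b in sorted(pooled)}


# Toy data: two GC-poor isochores sharing a bin and one GC-rich isochore.
toy = [(300_000, 38.7), (700_000, 38.9), (1_000_000, 47.2)]
histogram = gc_histogram_by_weight(toy)
```

Lengths are summed as integers before dividing, so that the fractions do not accumulate floating-point rounding across many isochores.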
It should be noted that while Figure 1 reports the isochore patterns of chromosomes from release hg17, the isochore profiles of hg17 and hg18, the most recent release, are identical as far as chromosomes 17 to 22 are concerned, the only exceptions being three small gaps in hg17 of chromosome 22 which were filled in the hg18 version (see Figure S1). Figure 2 compares the cumulative isochore pattern of chromosomes 17 to 22 with that of the whole human genome. The former is characterized by an under-representation of GC-poor isochore families L1 and L2 and by an over-representation of GC-rich isochore families H1, H2 and H3. Chromosomes 17 to 22 still provide, however, a fair representation of the isochore pattern of the whole human genome, which is satisfactory for the purpose of this (...truncated)

Figure 1. Distribution of isochores on chromosomes 17 to 22 from the human reference genome. The histograms show the distribution (by weight) of isochores as pooled in bins of 0.5% GC for chromosomes 17 to 22 from hg17. Colors represent the five isochore families. The color code spans the spectrum of GC level in five steps, indicated by broken horizontal lines: ultramarine blue (L1), light blue (L2), yellow (H1), orange (H2) and red (H3). Note the different scales on the ordinate axis. doi:10.1371/journal.pone.0005972.g001


Article home page: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0005972

Maria Costantini, Giorgio Bernardi. Mapping Insertions, Deletions and SNPs on Venter's Chromosomes, 2009, 6, DOI: 10.1371/journal.pone.0005972