Genomic features in the breakpoint regions between syntenic blocks

Article PDF: https://bioinformatics.oxfordjournals.org/content/20/suppl_1/i318.full.pdf

Phil Trinh (2), Aoife McLysaght (1), David Sankoff (0)

(0) Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa K1N 6N5, Canada
(1) Genetics Department, Trinity College, University of Dublin, Dublin 2, Ireland
(2) Hillcrest High School, Ottawa K1G 2L7, Canada

Motivation: We study the largely unaligned regions between the syntenic blocks conserved in humans and mice, based on data extracted from the UCSC genome browser. These regions contain evolutionary breakpoints caused by inversion, translocation and other processes.

Results: We suggest explanations for the limited amount of genomic alignment in the neighbourhoods of breakpoints. We discount inferences of extensive breakpoint reuse as artefacts introduced during the reconstruction of syntenic blocks. We find that the number, size and distribution of small aligned fragments in the breakpoint regions depend on the origin of the neighbouring blocks and the other blocks on the same chromosome. We account for this and for the generalized loss of alignment in the regions partially by artefacts due to alignment protocols and partially by mutational processes operative only after the rearrangement event. These results are consistent with breakpoints occurring randomly over virtually the entire genome.

Contact:

1 INTRODUCTION

Complex alignment protocols developed independently by two research groups (Pevzner and Tesler, 2003; Kent et al., 2003) have reconstructed the chromosomal segments conserved in the evolution of the genome sequences of both mouse and man, without recourse to an intermediate stage of orthologous gene identification. The protocols use somewhat different strategies to combine short regions of elevated similarity to construct the conserved segments, bridging singly- or doubly-gapped regions where similarity does not attain a threshold criterion and ignoring short inversions and transpositions that have rearranged one sequence or the other.
The difficulty of this reconstruction task cannot be overemphasized and its accomplishment is a testimony to the scientific judgement and computational skills of the participating researchers. One aspect of the reconstruction that is of particular interest is the nature of the DNA sequence in the neighbourhood of the breakpoints between two conserved (or syntenic) blocks adjacent on an autosome in the human genome, say, but remote or even on different autosomes in the mouse genome. (For clarity, we will continue our exposition treating the human and mouse genomes asymmetrically in this way, though their roles could be reversed without materially affecting the discussion or results.) Generally, the two syntenic blocks on either side of the breakpoint do not abut directly, but are rather separated by a short region where there is little similarity with the mouse genome. These regions (or spaces) do generally contain a number of smaller fragments of homology with the same mouse chromosomes as the two adjacent syntenic blocks (the archipelago), with other mouse autosomes sharing syntenic blocks with the same human chromosome (the compatriots) and with mouse chromosomes, including the X, having no such syntenic blocks (the foreigners). Figure 1 depicts these categories.

Fig. 1. Hypothetical human chromosome with syntenic blocks B1-B5 and small fragments, with shading keyed to aligned portions of mouse chromosomes. a, archipelago; c, compatriot; and f, foreigner.

The breakpoints are created by chromosomal rearrangement processes, such as inversion and translocation of various kinds, that drive the evolution of genomic structure. Where in the genome these breakpoints can and do occur is a fundamental question in the evolution of species, and it is in the hope that the small fragments within the breakpoint regions contain some hints about this question that we undertake a statistical assessment of the three types.
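The three fragment categories defined above can be illustrated with a minimal Python sketch. It is not the procedure used in this study. The function names, the list-of-chromosome-labels representation and the example chromosome assignments are all hypothetical, chosen only to make the definitions concrete. The sketch assumes each syntenic block and each small fragment is labelled with the mouse chromosome it aligns to.

```python
from collections import Counter


def classify_fragment(fragment_chrom, left_chrom, right_chrom, block_chroms):
    """Classify one small fragment found in a breakpoint region.

    fragment_chrom: mouse chromosome the fragment aligns to
    left_chrom, right_chrom: mouse chromosomes of the two flanking blocks
    block_chroms: mouse chromosomes of all blocks on this human chromosome
    (All names and the data layout are illustrative assumptions.)
    """
    if fragment_chrom in (left_chrom, right_chrom):
        return "archipelago"   # same mouse chromosome as a flanking block
    if fragment_chrom in block_chroms:
        return "compatriot"    # shares a block elsewhere on this human chromosome
    return "foreigner"         # no block on this human chromosome


def tally_region(block_chroms, gap_index, fragment_chroms):
    """Count fragment types in the region between blocks gap_index and gap_index + 1."""
    left, right = block_chroms[gap_index], block_chroms[gap_index + 1]
    return Counter(
        classify_fragment(c, left, right, block_chroms) for c in fragment_chroms
    )


# Hypothetical human chromosome with blocks B1-B5 (mouse chromosome of each block).
blocks = ["2", "7", "2", "11", "7"]
# Hypothetical fragments found between B1 and B2.
fragments = ["7", "11", "X", "2", "4"]
counts = tally_region(blocks, 0, fragments)
print(dict(counts))
```

In this made-up example the region between B1 and B2 holds two archipelago fragments (mouse chromosomes 7 and 2), one compatriot (11) and two foreigners (X and 4).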
THE RANDOM HYPOTHESIS AND THE ALTERNATIVE

At a sufficiently low level of resolution, one might hypothesize that breakpoints could occur randomly along the lengths of chromosomes, in analogy to recombination sites. Indeed, this hypothesis is implicit in the prophetic work of Nadeau and Taylor (1984) in their early estimation of the number of conserved segments in the human-mouse comparison. Again in analogy with recombination sites, we may weaken this hypothesis by allowing some variation of breakage susceptibility. And of course at higher levels of resolution, we would expect selection to disfavour breakage at gene-internal sites (in introns and especially within exons) or occasionally between neighbouring genes co-expressed for functional reasons, while breakage is known to be endemic in eukaryotes in subtelomeric regions (Mefford and Trask, 2002; Kellis et al., 2003; Katinka et al., 2001) and, at least in primates, much rearrangement seems to occur in pericentromeric regions (Bailey et al., 2002). Nevertheless, with specific exceptional regions, accounting for perhaps 5% of the genome, the idea that evolutionary rearrangements can break chromosomes anywhere in the genome cannot be rejected with current data. Indeed, the only data not of the historical inference type bearing directly on this question, namely the location of breakpoints in (non-sterile) human carriers of translocations, suggests a uniform distribution along the length of the chromosome, contrasting with breakpoints in somatic cell (tumour) genomes, which are non-uniformly concentrated arm-centrally on chromosomes, or in subtelomeric bands (Sankoff et al., 2002). Documentation of evolutionary subtelomeric translocational hotspots and pericentromeric duplication and/or transpositional hotspots leads, nonetheless, to an alternate hypothesis, that potential breakpoints are largely restricted to a limited number (e.g.
<500) of very small regions in the genome, and that this regional susceptibility is conserved over considerable evolutionary time scales. This position has been argued most forcefully by Pevzner and Tesler (2003), who advanced the hypothesis that the observed spaces between syntenic blocks correspond to fragile breakpoint regions and these are conserved, at least across the mammals. The main evidence offered for this claim is that an algorithmic reconstruction of rearrangement history, based on the current positions of the syntenic blocks in the two species, requires almost the same number of rearrangements (mostly inversions and reciprocal translocations) as the number of blocks, implying that each breakpoint region contains almost two breakpoints, on average (since each inversion or reciprocal translocation involves two breakpoints). Were the two breakpoints for each rearrangement situated at random chromosomal sites, on the other hand, it would be rare that any two points would fall in the same small region. Pevzner and Tesler (2003) interpret the lack of sustained human-mouse similarity in the breakpoint regions as suggestive of frequent rearrangement affecti (...truncated)