Genomic features in the breakpoint regions between syntenic blocks
Phil Trinh (2), Aoife McLysaght (1) and David Sankoff (0)
(0) Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa K1N 6N5, Canada
(1) Genetics Department, Trinity College, University of Dublin, Dublin 2, Ireland
(2) Hillcrest High School, Ottawa K1G 2L7, Canada
Motivation: We study the largely unaligned regions between the syntenic blocks conserved in humans and mice, based on data extracted from the UCSC genome browser. These regions contain evolutionary breakpoints caused by inversion, translocation and other processes.
Results: We suggest explanations for the limited amount of genomic alignment in the neighbourhoods of breakpoints. We discount inferences of extensive breakpoint reuse as artefacts introduced during the reconstruction of syntenic blocks. We find that the number, size and distribution of small aligned fragments in the breakpoint regions depend on the origin of the neighbouring blocks and the other blocks on the same chromosome. We account for this and for the generalized loss of alignment in the regions partially by artefacts due to alignment protocols and partially by mutational processes operative only after the rearrangement event. These results are consistent with breakpoints occurring randomly over virtually the entire genome.
Contact:
1 INTRODUCTION
Complex alignment protocols developed independently by
two research groups (Pevzner and Tesler, 2003; Kent et al.,
2003) have reconstructed the chromosomal segments
conserved in the evolution of the genome sequences of both mouse
and man, without recourse to an intermediate stage of
orthologous gene identification. The protocols use somewhat
different strategies to combine short regions of elevated similarity to
construct the conserved segments, bridging singly- or
doubly-gapped regions where similarity does not attain a threshold
criterion and ignoring short inversions and transpositions that
have rearranged one sequence or the other. The difficulty
of this reconstruction task cannot be overemphasized and its
accomplishment is a testimony to the scientific judgement and
computational skills of the participating researchers.
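To make the idea of combining short regions of similarity into conserved segments more concrete, the following is a deliberately simplified greedy chaining sketch. It is not the procedure of Pevzner and Tesler (2003) or of Kent et al. (2003); the `Anchor` and `Block` records, the single `max_gap` threshold and the function `chain_anchors` are hypothetical. Anchors are taken in human order and merged into the current block when they align to the same mouse chromosome and neither the human-side nor the mouse-side gap exceeds `max_gap`, so a gap in one sequence or in both is bridged. Order along the mouse sequence is not enforced, so a short local inversion does not split a block.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Anchor:
    """Hypothetical record: a short region of elevated human-mouse similarity."""
    human_start: int
    human_end: int
    mouse_chrom: str
    mouse_start: int
    mouse_end: int


@dataclass
class Block:
    """Hypothetical record: a chained syntenic block."""
    human_start: int
    human_end: int
    mouse_chrom: str
    mouse_start: int
    mouse_end: int
    n_anchors: int


def chain_anchors(anchors: List[Anchor], max_gap: int) -> List[Block]:
    """Greedily merge anchors (sorted by human position) into blocks."""
    blocks: List[Block] = []
    for a in sorted(anchors, key=lambda x: x.human_start):
        if blocks:
            b = blocks[-1]
            human_gap = a.human_start - b.human_end
            # Distance on the mouse side from the block's current span;
            # 0 if the anchor falls inside it (e.g. a short inversion).
            mouse_gap = max(a.mouse_start - b.mouse_end,
                            b.mouse_start - a.mouse_end, 0)
            if (a.mouse_chrom == b.mouse_chrom
                    and human_gap <= max_gap and mouse_gap <= max_gap):
                b.human_end = max(b.human_end, a.human_end)
                b.mouse_start = min(b.mouse_start, a.mouse_start)
                b.mouse_end = max(b.mouse_end, a.mouse_end)
                b.n_anchors += 1
                continue
        # Otherwise the anchor starts a new block.
        blocks.append(Block(a.human_start, a.human_end, a.mouse_chrom,
                            a.mouse_start, a.mouse_end, 1))
    return blocks
```

In this sketch, raising `max_gap` merges more anchors into fewer, longer blocks and shrinks the apparent spaces between them, which illustrates how protocol parameters can shape what later appears as an unaligned breakpoint region.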
One aspect of the reconstruction that is of particular interest
is the nature of the DNA sequence in the neighbourhood of the
breakpoints between two conserved (or syntenic) blocks
adjacent on an autosome in the human genome, say, but remote
or even on different autosomes in the mouse genome. (For
clarity, we will continue our exposition treating the human
and mouse genomes asymmetrically in this way, though their
roles could be reversed without materially affecting the
discussion or results.) Generally, the two syntenic blocks on either
side of the breakpoint do not abut directly, but are rather
separated by a short region where there is little similarity with the
mouse genome. These regions (or spaces) do, however, generally
contain a number of smaller fragments of homology: with the same
mouse chromosomes as the two adjacent syntenic blocks (the
archipelago), with other mouse autosomes sharing syntenic
blocks with the same human chromosome (the compatriots)
and with mouse chromosomes, including the X, having no
such syntenic blocks (the foreigners). Figure 1 depicts these
categories.

Fig. 1. Hypothetical human chromosome with syntenic blocks
B1-B5 and small fragments, with shading keyed to aligned
portions of mouse chromosomes. a, archipelago; c, compatriot; and
f, foreigner.
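The three categories can be expressed as a simple decision rule. The sketch below is illustrative only: the function names `classify_fragment` and `tally`, and the chromosome labels in the usage, are hypothetical. A fragment is archipelago if it aligns to the mouse chromosome of either flanking block, compatriot if it aligns to another mouse chromosome that shares some syntenic block with the same human chromosome, and foreigner otherwise.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def classify_fragment(fragment_chrom: str,
                      flanking_chroms: Tuple[str, str],
                      syntenic_chroms: Set[str]) -> str:
    """Label a small fragment found in a breakpoint region.

    fragment_chrom:  mouse chromosome the fragment aligns to
    flanking_chroms: mouse chromosomes of the two adjacent syntenic blocks
    syntenic_chroms: all mouse chromosomes sharing at least one syntenic
                     block with this human chromosome
    """
    if fragment_chrom in flanking_chroms:
        return "archipelago"
    if fragment_chrom in syntenic_chroms:
        return "compatriot"
    return "foreigner"


def tally(fragment_chroms: Iterable[str],
          flanking_chroms: Tuple[str, str],
          syntenic_chroms: Set[str]) -> Counter:
    """Count fragments of each type within one breakpoint region."""
    return Counter(classify_fragment(c, flanking_chroms, syntenic_chroms)
                   for c in fragment_chroms)
```

For a region flanked by blocks from hypothetical mouse chromosomes `m3` and `m7`, on a human chromosome that also shares a block with `m11`, fragments on `m3`, `m11`, `X` and `m7` would be counted as two archipelago, one compatriot and one foreigner.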
The breakpoints are created by chromosomal rearrangement
processes such as inversion and translocation of various kinds
that drive the evolution of genomic structure. Where in the
genome these breakpoints can and do occur is a fundamental
question in the evolution of species, and it is in the hope
that the small fragments within the breakpoint regions contain
some hints about this question that we undertake a statistical
assessment of the three types of fragment.
2 THE RANDOM HYPOTHESIS AND THE ALTERNATIVE
At a sufficiently low level of resolution, one might
hypothesize that breakpoints could occur randomly along the lengths
of chromosomes, in analogy to recombination sites. Indeed,
this hypothesis is implicit in the prophetic work of Nadeau
and Taylor (1984) in their early estimation of the number
of conserved segments in the human-mouse comparison.
Again in analogy with recombination sites, we may weaken
this hypothesis by allowing some variation of breakage
susceptibility. And of course at higher levels of resolution, we
would expect selection to disfavour breakage at gene-internal
sites (in introns and especially within exons) or occasionally
between neighbouring genes co-expressed for functional
reasons, while breakage is known to be endemic in eukaryotes in
subtelomeric regions (Mefford and Trask, 2002; Kellis et al.,
2003; Katinka et al., 2001) and, at least in primates, much
rearrangement seems to occur in pericentromeric regions
(Bailey et al., 2002). Nevertheless, apart from specific exceptional
regions, accounting for perhaps 5% of the genome, the idea
that evolutionary rearrangements can break chromosomes
anywhere in the genome cannot be rejected with current
data. Indeed, the only data bearing directly on this question
that are not of the historical-inference type, namely the location of
breakpoints in (non-sterile) human carriers of translocations,
suggest a uniform distribution along the length of the chromosome,
contrasting with breakpoints in somatic cell (tumour)
genomes, which are non-uniformly concentrated arm-centrally
on chromosomes, or in subtelomeric bands (Sankoff et al.,
2002).
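Under the random hypothesis, breakpoint positions along a chromosome should resemble a sample from a uniform distribution. One generic way to check this, shown here as an illustrative sketch rather than the analysis used by Sankoff et al. (2002), is a one-sample Kolmogorov-Smirnov statistic against Uniform(0, L) with a Monte Carlo p-value. The function names, the number of simulations and the seed are assumptions for illustration.

```python
import random
from typing import Sequence


def ks_uniform_statistic(positions: Sequence[float], chrom_length: float) -> float:
    """Kolmogorov-Smirnov D between breakpoint positions and Uniform(0, L)."""
    xs = sorted(p / chrom_length for p in positions)
    n = len(xs)
    # Largest gap between the empirical CDF and the uniform CDF, checked
    # just after and just before each sorted observation.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def monte_carlo_p_value(positions: Sequence[float], chrom_length: float,
                        n_sim: int = 2000, seed: int = 0) -> float:
    """Fraction of uniform samples of the same size with D >= observed D."""
    rng = random.Random(seed)
    d_obs = ks_uniform_statistic(positions, chrom_length)
    n = len(positions)
    hits = 0
    for _ in range(n_sim):
        sim = [rng.uniform(0, chrom_length) for _ in range(n)]
        if ks_uniform_statistic(sim, chrom_length) >= d_obs:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)
```

Positions clustered in one part of the chromosome (as a subtelomeric hotspot would produce) give a large D and a small p-value, while evenly spread positions give D close to its minimum possible value of 1/(2n).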
Documentation of evolutionary subtelomeric translocational
hotspots and pericentromeric duplication and/or
transpositional hotspots leads, nonetheless, to an alternative hypothesis,
that potential breakpoints are largely restricted to a limited
number (e.g. <500) of very small regions in the genome, and
that this regional susceptibility is conserved over considerable
evolutionary time scales. This position has been argued most
forcefully by Pevzner and Tesler (2003), who advanced the
hypothesis that the observed spaces between syntenic blocks
correspond to fragile breakpoint regions and these are
conserved, at least across the mammals. The main evidence
offered for this claim is that an algorithmic reconstruction
of rearrangement history, based on the current positions of
the syntenic blocks in the two species, requires almost the
same number of rearrangements (mostly inversions and
reciprocal translocations) as the number of blocks, implying that
each breakpoint region contains almost two breakpoints, on
the average (since each inversion or reciprocal translocation
involves two breakpoints). Were the two breakpoints for each
rearrangement situated at random chromosomal sites, on the
other hand, it would be rare that any two points would fall in
the same small region.
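The reuse argument rests on simple arithmetic, which the sketch below makes explicit. The function names are hypothetical and the numbers in the usage are illustrative, not data from this study. Since each inversion or reciprocal translocation uses two breakpoints, r rearrangements spread over n breakpoint regions give 2r/n breakpoints per region. For comparison, if m breakpoints fall uniformly at random on a genome of length G, the probability that a given pair lies within distance w is 1 - (1 - w/G)^2, so the expected number of such close pairs is C(m, 2) times that probability.

```python
import random
from math import comb


def breakpoints_per_region(n_rearrangements: int, n_regions: int) -> float:
    """Each inversion or reciprocal translocation involves two breakpoints."""
    return 2 * n_rearrangements / n_regions


def expected_close_pairs(m: int, window: float, genome_length: float) -> float:
    """Expected pairs among m uniform points on [0, G] within `window`.

    Uses P(|X - Y| <= w) = 1 - (1 - w/G)**2 for 0 <= w <= G.
    """
    q = window / genome_length
    return comb(m, 2) * (1 - (1 - q) ** 2)


def simulate_close_pairs(m: int, window: float, genome_length: float,
                         trials: int = 200, seed: int = 1) -> float:
    """Monte Carlo check of expected_close_pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pts = sorted(rng.uniform(0, genome_length) for _ in range(m))
        j = 0
        for i in range(m):
            # Advance j until pts[j] lies within `window` of pts[i].
            while pts[i] - pts[j] > window:
                j += 1
            total += i - j  # points j..i-1 are all close to pts[i]
    return total / trials
```

With hypothetical values of 500 breakpoints, a 100 kb window and a 3 Gb genome, `expected_close_pairs(500, 1e5, 3e9)` is roughly 8, far fewer coincidences than the roughly one per region implied by two breakpoints per region.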
Pevzner and Tesler (2003) interpret the lack of sustained
human-mouse similarity in the breakpoint regions as
suggestive of frequent rearrangement affecti (...truncated)