DNA conformations and their sequence preferences
Daniel Svozil (1), Jan Kalina (0), Marek Omelka (0), Bohdan Schneider (1)
(0) Jaroslav Hájek Center for Theoretical and Applied Statistics, Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, CZ-186 75 Prague, Czech Republic
(1) Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic and Center for Biomolecules and Complex Molecular Systems, Flemingovo nám. 2, CZ-166 10 Prague
The geometry of the phosphodiester backbone was analyzed for 7739 dinucleotides from 447 selected crystal structures of naked and complexed DNA. Ten torsion angles of a near-dinucleotide unit have been studied by combining Fourier averaging and clustering. Besides the known variants of the A-, B- and Z-DNA forms, we have also identified combined A + B backbone-deformed conformers, e.g. with α/γ switches, and a few conformers with a syn orientation of bases occurring e.g. in G-quadruplex structures. A plethora of A- and B-like conformers show a close relationship between the A- and B-form double helices. A comparison of the populations of the conformers occurring in naked and complexed DNA has revealed a significant broadening of the DNA conformational space in the complexes, but the conformers still remain within the limits defined by the A- and B-forms. Possible sequence preferences, important for sequence-dependent recognition, have been assessed for the main A and B conformers by means of statistical goodness-of-fit tests. The structural properties of the backbone in quadruplexes, junctions and histone-core particles are discussed in further detail.
INTRODUCTION
The apparent simplicity of double-helical DNA, the icon
of molecular biology, is deceiving. While the architecture
of its antiparallel strands remains the same, subtle
conformational variations suffice to guarantee its
recognition by other molecules. The structural variations are
critical especially for reliable recognition between DNA
and proteins, which is the conditio sine qua non in the
essential processes of replication, transcription and DNA
chromatin compaction. Local conformational changes
induced by interactions with other molecules can either
leave the DNA structure unaltered (i.e. in the form of a
straight double helix) or introduce bends and kinks within
the double helix, as in sequence-dependent CAP/DNA
complexes (1) or in DNA coiled around histone-core
proteins (2).
The necessity of understanding DNA variability has
become more urgent as the sequence-specific protein/
DNA recognition required e.g. by transcription factors
seems less likely to follow simple and generally
applicable rules analogous to the rules governing DNA
self-recognition by the complementarity of the Watson-Crick
(WC) paired bases (3). The idea of the general code of
recognition between amino acids and nucleotides (4) has
not been confirmed despite extensive efforts. The lack of
simple rules for general protein/DNA recognition has
been explained by the existence of too many structural
degrees of freedom at the protein/DNA interface (5), and
so far only limited rules of recognition have been
formulated within narrower groups of transcription factors
with certain binding motifs, such as zinc fingers or
helix-turn-helix (6-9).
Ultimately, the variability and plasticity of the local
DNA structure, and thus its ability to recognize other
molecules and be recognized by them, can be attributed to
the properties of the bases and to their sequence-dependent
arrangement. Base-pair and base-step morphology (10,11)
has been widely analyzed to describe sequence-dependent
deformability as observed in the crystal structures of DNA
complexes with sequence-specific proteins (12,13) as well as
in noncomplexed DNA (14). By combining descriptors of
base morphology with constraints imposed by a simple
model of the phosphodiester backbone, slide and shift have
been suggested to describe the key sequence properties of
dinucleotide steps (15). However, the backbone does not
act as a passive link merely holding the bases at their
positions, but its inherent flexibility contributes to, and
limits, the base placement so that the local DNA structure
results from the interplay between optimal base positions
and preferred conformations of the sugar phosphate
backbone. An analysis of the conformational space
populated by the DNA backbone and the correlation
between its conformation and the DNA sequence are
therefore important for fully understanding DNA
recognition.
The structural alphabet of the DNA double-helical A-,
B- and Z-forms has been described in detail earlier (16,17).
Nevertheless, DNA is also known to adopt other forms,
such as triple (18) and quadruple helices (19), junction
(cruciform) structures (20) and parallel helices (21).
However unusual some of these DNA forms may be,
their architecture is, in full analogy to the double helical
DNA, almost completely based on the self-assembly of
two or more DNA strands and does not form complicated
folds analogous to RNA. The availability of some of these
unusual DNA structures in well-refined crystal structures
as well as the growing number and quality of more
conventional DNA crystal structures present a challenge
to undertake an analysis of the DNA conformational
space in much greater detail than was possible a few
years ago (22).
This work presents a comprehensive analysis of the
conformational space of the DNA backbone using a
near-dinucleotide building block as a model. Dinucleotide
conformations have been clustered as the local structural
property without any consideration of the classification of
the overall DNA architecture as, for instance, B- or
A-type double helix. The study has been performed on
almost 8000 dinucleotide units from 447 crystal structures
of DNA, alone or in complexes with other molecules, and
has made use of a slightly modified procedure developed
earlier for an analysis of RNA conformations (23). To
assess the nature of the broadening of the DNA
conformational space upon interacting with other
molecules (mainly with proteins), the classified conformers of
naked DNA have been compared to those of complexed
DNA molecules. In addition, the structural properties of
the backbone have been discussed in selected unusual
structures like quadruplexes and histone-core particles.
Because the possible sequence preferences of various
conformers are important for the sequence-dependent
recognition, they have been assessed by means of rigorous
nonparametric statistical testing within the group of naked
B- and A-form double helices.
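
To illustrate what assessing a sequence preference by a goodness-of-fit test can look like, the following minimal sketch computes a Pearson chi-square statistic for one conformer. It is an illustration under stated assumptions, not the procedure used in this work: the choice of the Pearson statistic, the grouping of dinucleotide steps into four purine (R) / pyrimidine (Y) classes, the function name and all counts are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
# Hypothetical illustration: does one conformer show a sequence preference?
# All counts and the R/Y step classes below are invented for this example.

def chi_square_statistic(observed, expected_probs):
    """Pearson chi-square statistic comparing observed counts with the
    counts expected from reference probabilities."""
    total = sum(observed.values())
    statistic = 0.0
    for step_class, prob in expected_probs.items():
        expected = total * prob
        statistic += (observed.get(step_class, 0) - expected) ** 2 / expected
    return statistic

# Hypothetical counts of step classes found in one conformer.
observed_counts = {"RR": 40, "RY": 20, "YR": 25, "YY": 15}

# Hypothetical reference distribution (e.g. the whole data set).
reference_probs = {"RR": 0.25, "RY": 0.25, "YR": 0.25, "YY": 0.25}

statistic = chi_square_statistic(observed_counts, reference_probs)

# Four classes give 3 degrees of freedom; the 5% critical value of the
# chi-square distribution with 3 degrees of freedom is 7.815.
CRITICAL_VALUE_DF3_5PCT = 7.815
has_preference = statistic > CRITICAL_VALUE_DF3_5PCT

print(f"chi-square = {statistic:.2f}, preference detected: {has_preference}")
```

With these invented counts, the expected count is 25 per class and the statistic exceeds the critical value, so the hypothesis that the conformer follows the reference distribution would be rejected at the 5% level. With real data, sparsely populated classes would call for an exact or permutation-based test rather than the chi-square approximation.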
The selection of structures used for the analysis was
limited to nucleic acid (NA) crystal structures containing
only DNA (thus excluding hybrids with RNA) present in
the Nucleic Acid Database (24) on 19 July 2005. Four
hundred and fifteen structures with crystallographic
resolution better than or equal to 1.9 Å were selected;
this resolution had previously been identified as limiting
t (...truncated)