A DNA Structural Alphabet Distinguishes Structural Features of DNA Bound to Regulatory Proteins and in the Nucleosome Core Particle
Article
Bohdan Schneider 1,*, Paulína Božíková 1, Petr Čech 2, Daniel Svozil 2 and Jiří Černý 1
1 Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, Průmyslová 595, CZ-252 50 Vestec,
Prague West, Czech Republic
2 Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and
Technology Prague, Technická 5, CZ-166 28 Prague, Czech Republic
* Correspondence:
Academic Editors: Linda Bloom and Jörg Bungert
Received: 3 August 2017; Accepted: 13 October 2017; Published: 18 October 2017
Abstract: We analyzed the structural behavior of DNA complexed with regulatory proteins and in the
nucleosome core particle (NCP). The three-dimensional structures of almost 25 thousand dinucleotide
steps from more than 500 sequentially non-redundant crystal structures were classified using the DNA
structural alphabet CANA (Conformational Alphabet of Nucleic Acids), and associations between the
ten CANA letters and the sixteen dinucleotide sequences were investigated. The associations revealed
features that discriminate between specific and non-specific binding of DNA to proteins. Particularly
important is the role of two DNA structural forms, A-DNA and BII-DNA, represented by the CANA
letters AAA and BB2: AAA structures are avoided in non-specific NCP complexes, where the
wrapping of the DNA duplex is explained by the periodic occurrence of BB2 every 10.3 steps. In both
regulatory and NCP complexes, the extent of bending of the local DNA helical axis does not influence
the proportional representation of the CANA letters; namely, the relative incidences of AAA
and BB2 remain constant in bent and straight duplexes.
Keywords: DNA; DNA-protein recognition; transcription factors; regulatory proteins; histone;
nucleosome core particle; molecular structure
1. Introduction
The DNA double helix has been recognized as the icon of molecular biology for more than 60 years [1].
The ability of DNA to convey genetic information via self-recognition by base pairing forms a
paradigm whose rigor is paralleled only by physical laws. In contrast to the “digital” mechanism
of self-recognition of complementary DNA strands, the mutual recognition between DNA and
proteins is not driven by a simple code but by a complex combination of structure, electrostatics,
and solvation, all of which are ultimately, but indirectly, determined by the sequences of the
interacting molecules. Understanding protein–DNA recognition therefore goes beyond the limits
of straightforward complementarity and requires the molecular modeling tools used to describe
analogous protein–protein or protein–small molecule interactions.
Structural features of protein–DNA recognition have attracted considerable interest [2]. It has been
suggested that the three-dimensional structures of both interacting biomolecules are equally important
and necessary for a full understanding of protein–DNA recognition, and that the nucleotide
sequence in immediate contact with the protein explains only a few aspects of the recognition process.
Genes 2017, 8, 278; doi:10.3390/genes8100278
www.mdpi.com/journal/genes
The importance of local DNA structure has also been highlighted from an evolutionary perspective:
substantially more regions of the human genome are under selection pressure to maintain
DNA shape than to maintain the exact nucleotide sequence [3].
One possible approach to comprehending the structural basis of biomolecular recognition is to translate
complicated three-dimensional structures into a linear code using so-called structural alphabets. These
alphabets simplify an ensemble of possible structures of a suitably selected biomolecular segment into a
limited set of building blocks that can be symbolically represented by alphabet letters. The approach has
been used fairly routinely to describe and analyze protein structures since it was first proposed [4,5];
a pentapeptide is often used as the biomolecular segment to formulate the alphabet [6]. The approach
is, however, new in the analysis of DNA structures. The first DNA structural alphabet was formulated
only recently [7,8]; its first version has been applied to the analysis of protein–DNA interactions [9].
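To illustrate how a structural alphabet turns geometry into a linear code, the following sketch assigns each dinucleotide step to its nearest reference conformer by comparing backbone torsion angles. It is a toy nearest-centroid classifier built on stated assumptions: only three torsions (δ, ε, ζ) are used, the centroid values are rough illustrative numbers rather than the published CANA definitions (which rely on many more geometric parameters), and the 60° cutoff and the NAN label for unassigned steps are choices made for this example.

```python
import math
from typing import List, Sequence

# Illustrative reference conformers given as (delta, epsilon, zeta) torsions in degrees.
# Rough, textbook-style values chosen for this sketch only; they are NOT the
# published CANA definitions.
REFERENCE_LETTERS = {
    "AAA": (82.0, -155.0, -70.0),   # A-like step
    "BBB": (138.0, -175.0, -95.0),  # BI-like step (placeholder)
    "BB2": (143.0, -105.0, 175.0),  # BII-like step
}
UNASSIGNED = "NAN"  # label used here for steps that match no reference letter


def angular_difference(a: float, b: float) -> float:
    """Signed difference a - b wrapped into [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def torsion_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance over wrapped torsion-angle differences."""
    return math.sqrt(sum(angular_difference(a, b) ** 2 for a, b in zip(x, y)))


def assign_letter(torsions: Sequence[float], cutoff: float = 60.0) -> str:
    """Nearest reference letter, or UNASSIGNED if none lies within the cutoff."""
    best_letter, best_dist = UNASSIGNED, float("inf")
    for letter, centroid in REFERENCE_LETTERS.items():
        d = torsion_distance(torsions, centroid)
        if d < best_dist:
            best_letter, best_dist = letter, d
    return best_letter if best_dist <= cutoff else UNASSIGNED


def encode(steps: Sequence[Sequence[float]]) -> List[str]:
    """Translate per-step torsion tuples into a linear sequence of letters."""
    return [assign_letter(t) for t in steps]


if __name__ == "__main__":
    toy_steps = [
        (140.0, -170.0, -100.0),  # near the BI-like placeholder
        (145.0, -110.0, 170.0),   # near the BII-like placeholder
        (80.0, -150.0, -75.0),    # near the A-like placeholder
        (300.0, 60.0, 60.0),      # far from every placeholder
    ]
    print(encode(toy_steps))  # ['BBB', 'BB2', 'AAA', 'NAN']
```

Wrapping the differences into [-180, 180) matters because torsions are periodic: 179° and -179° are only 2° apart, and a naive subtraction would report 358°.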
The motivation for this work was to distinguish potentially different structural features of
specifically and non-specifically bound DNA. We examined crystal structures of DNA complexes
with regulatory proteins, mostly transcription factors, and of DNA in the nucleosome core particle (NCP).
These two groups of proteins not only exemplify different modes of interaction with DNA but also
directly compete for binding to the DNA duplex in the cell. Possible differences in how they
influence DNA structural behavior therefore have direct biological consequences: some transcription
factors can bind to nucleosomal DNA, while others can bind only nucleosome-free DNA. For instance,
the minor groove width is constrained in DNA bound in the NCP, thus precluding general
transcription factors from binding to TATA-box sequences, which requires a wide-open minor
groove [10]. On the other hand, DNA bound in the NCP is targeted by a specific group of pioneer factors
that recognize and bind nucleosomal DNA mostly via the major groove, which is already structurally
modified by histone binding [11]. Binding of the p53 protein to nucleosomes has been reported to lead
to nucleosome loss and transcriptional activation in vivo [12]. Direct kinetic competition
between DNA binding to nucleosome-forming histones and to transcription factors has also been
observed to regulate zebrafish genome activation [13].
The structural behavior of DNA in complexes with regulatory proteins and in the NCP was analyzed
here using the Conformational Alphabet of Nucleic Acids (CANA), the first DNA structural alphabet,
developed earlier [7,8] to catalogue possible dinucleotide structures. Associations between the CANA
letters and their dinucleotide sequences displayed different patterns in specifically and non-specifically
bound DNA and thus distinguished these two modes of DNA binding.
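One simple way to quantify an association between letters and dinucleotide sequences is an observed/expected ratio computed from a contingency table of (sequence, letter) pairs; a value above 1 marks a letter that is over-represented for a given sequence. This is a generic sketch and not necessarily the statistic used in this work; the function name and the toy input pairs are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (dinucleotide sequence, structural letter)


def association_ratios(pairs: Iterable[Pair]) -> Dict[Pair, float]:
    """Observed/expected ratio for every (sequence, letter) pair that occurs.

    The expected count assumes sequence and letter are independent:
    E = count(sequence) * count(letter) / N.
    Pairs never observed are omitted (their ratio would be 0).
    """
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    by_seq = Counter(seq for seq, _ in pairs)
    by_letter = Counter(letter for _, letter in pairs)
    return {
        (seq, letter): observed / (by_seq[seq] * by_letter[letter] / n)
        for (seq, letter), observed in joint.items()
    }


if __name__ == "__main__":
    toy_steps = [  # hypothetical assignments, not data from this study
        ("CG", "BB2"), ("CG", "BB2"), ("CG", "BBB"),
        ("AA", "BBB"), ("AA", "BBB"), ("GG", "AAA"),
    ]
    for (seq, letter), ratio in sorted(association_ratios(toy_steps).items()):
        print(f"{seq} {letter}: {ratio:.2f}")
```

With the toy input, CG–BB2 and AA–BBB come out at 2.00, CG–BBB at 0.67, and the single GG–AAA step at 6.00. The last value shows that ratios built on rare sequences or letters are unstable, so counts must be large enough before such ratios are interpreted.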
2. Methods
2.1. Selection of Structures
By querying the Nucleic Acid Database (NDB [14], release of 2017-03-01) for crystal structures with a
resolution of 3.0 Å or better, we selected an ensemble that contained 141 protein–DNA complexes of the
nucleosome core particle and 942 DNA complexes with proteins classified as regulatory.
The final curated, sequentially non-redundant ensemble contains structures with at least one DNA
strand longer than six nucleotides and peptide chains longer than 20 amino acids. The analyzed
structures are identified by their Protein Data Bank (PDB) codes in Supplementary Table S1.
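The selection criteria above can be expressed as a simple filter. The sketch assumes a hypothetical minimal record per structure (the Entry class, its fields, and the T00x identifiers are invented for illustration) and reads "peptide chains longer than 20 amino acids" as requiring at least one such chain. The redundancy key, identical sets of DNA strand sequences with the best-resolved entry kept, is also an assumption; the actual curation of this ensemble may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical minimal record describing one crystal structure."""
    pdb_id: str
    resolution: float                 # in angstroms
    dna_sequences: Tuple[str, ...]    # one string per DNA strand
    peptide_lengths: Tuple[int, ...]  # number of residues per protein chain


MAX_RESOLUTION = 3.0     # "resolution 3.0 Å or better"
MIN_DNA_LENGTH = 7       # "longer than six nucleotides"
MIN_PEPTIDE_LENGTH = 21  # "longer than 20 amino acids"


def passes_filters(e: Entry) -> bool:
    """Resolution, DNA-length and peptide-length criteria."""
    return (
        e.resolution <= MAX_RESOLUTION
        and any(len(s) >= MIN_DNA_LENGTH for s in e.dna_sequences)
        and any(n >= MIN_PEPTIDE_LENGTH for n in e.peptide_lengths)
    )


def select_non_redundant(entries: List[Entry]) -> List[Entry]:
    """Filter entries, then keep the best-resolved one per DNA sequence set."""
    best: Dict[Tuple[str, ...], Entry] = {}
    for e in entries:
        if not passes_filters(e):
            continue
        key = tuple(sorted(e.dna_sequences))  # assumed redundancy key
        if key not in best or e.resolution < best[key].resolution:
            best[key] = e
    return sorted(best.values(), key=lambda e: e.pdb_id)


if __name__ == "__main__":
    toy = [
        Entry("T001", 2.1, ("ACGTACGTAC", "GTACGTACGT"), (120,)),
        Entry("T002", 2.8, ("GTACGTACGT", "ACGTACGTAC"), (118,)),  # redundant with T001
        Entry("T003", 3.4, ("TTTTAAAAGC",), (95,)),                # resolution too low
        Entry("T004", 1.9, ("ACGTA",), (150,)),                    # DNA too short
        Entry("T005", 2.5, ("GGGCCCAATT",), (15,)),                # peptide too short
        Entry("T006", 2.9, ("CCAATTGGCC",), (60, 18)),             # kept
    ]
    print([e.pdb_id for e in select_non_redundant(toy)])  # ['T001', 'T006']
```

Sorting the strand sequences before building the key makes the redundancy check independent of the order in which the strands are listed in a file.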
The ensemble consists of 493 structures of DNA in complex wit (...truncated)