A DNA Structural Alphabet Distinguishes Structural Features of DNA Bound to Regulatory Proteins and in the Nucleosome Core Particle



Bohdan Schneider 1,*, Paulína Božíková 1, Petr Čech 2, Daniel Svozil 2 and Jiří Černý 1

1 Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, Průmyslová 595, CZ-252 50 Vestec, Prague West, Czech Republic
2 Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, CZ-166 28 Prague, Czech Republic
* Corresponding author

Academic Editors: Linda Bloom and Jörg Bungert
Received: 3 August 2017; Accepted: 13 October 2017; Published: 18 October 2017

Abstract: We analyzed the structural behavior of DNA complexed with regulatory proteins and in the nucleosome core particle (NCP). The three-dimensional structures of almost 25 thousand dinucleotide steps from more than 500 sequentially non-redundant crystal structures were classified using the DNA structural alphabet CANA (Conformational Alphabet of Nucleic Acids), and associations between ten CANA letters and sixteen dinucleotide sequences were investigated. The associations showed features that discriminate between specific and non-specific binding of DNA to proteins. Of particular importance is the role of two DNA structural forms, A-DNA and BII-DNA, represented by the CANA letters AAA and BB2: AAA structures are avoided in non-specific NCP complexes, where the wrapping of the DNA duplex is explained by the periodic occurrence of BB2 every 10.3 steps. In both regulatory and NCP complexes, the extent of bending of the DNA local helical axis does not influence the proportional representation of the CANA letters; in particular, the relative incidences of AAA and BB2 remain constant in bent and straight duplexes.
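The kind of letter-by-sequence association described in the abstract can be illustrated with a small contingency table. The following is a minimal Python sketch, not the authors' procedure: it tallies how often each structural letter co-occurs with each dinucleotide and compares the observed count with the count expected if letter and sequence were independent. The function name `association_table`, the catch-all label `other`, and the toy input are hypothetical and chosen only for illustration.

```python
from collections import Counter


def association_table(steps):
    """Tally (letter, dinucleotide) pairs and compare with independence.

    `steps` is an iterable of (cana_letter, dinucleotide) tuples.
    Returns {(letter, dinuc): (observed, expected)}, where expected is
    row_total * column_total / grand_total.
    """
    steps = list(steps)
    pair_counts = Counter(steps)
    letter_counts = Counter(letter for letter, _ in steps)
    dinuc_counts = Counter(dinuc for _, dinuc in steps)
    total = len(steps)

    table = {}
    for letter in letter_counts:
        for dinuc in dinuc_counts:
            observed = pair_counts.get((letter, dinuc), 0)
            expected = letter_counts[letter] * dinuc_counts[dinuc] / total
            table[(letter, dinuc)] = (observed, expected)
    return table


# Hypothetical toy input, NOT data from the paper. "other" is a
# placeholder for all letters except AAA and BB2.
toy_steps = [
    ("AAA", "CG"), ("AAA", "CG"), ("BB2", "CA"), ("BB2", "CA"),
    ("other", "CG"), ("other", "CA"), ("other", "AT"), ("other", "AT"),
]

if __name__ == "__main__":
    for (letter, dinuc), (obs, exp) in sorted(association_table(toy_steps).items()):
        print(f"{letter:>5} {dinuc}: observed={obs}, expected={exp:.2f}")
```

Cells where the observed count clearly exceeds the expected one indicate a letter that is over-represented for that dinucleotide. A real analysis would also need a significance test on far larger counts.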
Keywords: DNA; DNA-protein recognition; transcription factors; regulatory proteins; histone; nucleosome core particle; molecular structure

Genes 2017, 8, 278; doi:10.3390/genes8100278; www.mdpi.com/journal/genes

1. Introduction

The DNA double helix has been recognized as the icon of molecular biology for more than 60 years [1]. The ability of DNA to convey genetic information via self-recognition by base pairing forms a paradigm paralleled in its rigor only by physical laws. In contrast to the “digital” mechanism of self-recognition of complementary DNA strands, the mutual recognition between DNA and proteins is not driven by a simple code but by a complex combination of structure, electrostatics, and solvation, all of which are ultimately but indirectly determined by the sequences of the interacting molecules. Understanding protein–DNA recognition is therefore beyond the limits of straightforward complementarity and requires the tools of molecular modeling used to describe analogous protein–protein or protein–small molecule interactions.

Structural features of protein–DNA recognition have attracted a lot of interest [2]. It has been suggested that the three-dimensional structures of both interacting biomolecules are equally important and necessary for a full understanding of protein–DNA recognition, and that the nucleotide sequence in immediate contact with the protein explains only a few aspects of the recognition process. The importance of local DNA structure has also been highlighted from an evolutionary perspective: substantially more regions of the human genome are under selection pressure to maintain DNA shape than to maintain the exact nucleotide sequence [3].

A possible approach to comprehending the structural basis of biomolecular recognition is to translate complicated three-dimensional structures into a linear code using so-called structural alphabets.
They simplify an ensemble of possible structures of a suitably selected biomolecular segment into a limited set of building blocks that can be represented symbolically by alphabet letters. Since it was first suggested [4,5], the approach has been used fairly routinely to describe and analyze protein structures; a pentapeptide is often used as the biomolecular segment to formulate the alphabet [6]. The approach is, however, new to the analysis of DNA structures. The first DNA structural alphabet was formulated only recently [7,8], and its first version has been applied to the analysis of protein–DNA interactions [9].

The motivation for this work was to distinguish potentially different structural features of specifically and non-specifically bound DNA. We examined crystal structures of DNA complexes with regulatory proteins, mostly transcription factors, and of DNA in the nucleosome core particle (NCP). These two groups of proteins not only exemplify different modes of interaction with DNA, they also compete directly for binding to the DNA duplex in the cell. Possible differences in how they influence DNA structural behavior therefore bear direct biological consequences: some transcription factors can bind to nucleosomal DNA, while others can bind only nucleosome-free DNA. For instance, the minor groove width is constrained in DNA bound in the NCP, which precludes the binding of general transcription factors that bind to the wide-open minor groove of DNA sequences called the TATA box [10]. On the other hand, DNA bound in the NCP is targeted by a specific group of pioneer factors that recognize and bind nucleosomal DNA, mostly through the major groove already structurally modified by histone binding [11]. It has been reported that binding of the p53 protein to nucleosomes leads to nucleosome loss and transcriptional activation in vivo [12].
Direct kinetic competition between DNA binding to nucleosome-forming histones and to transcription factors has also been observed to regulate zebrafish genome activation [13].

The structural behavior of DNA in complexes with regulatory proteins and in the NCP was analyzed here using the Conformational Alphabet of Nucleic Acids (CANA), the first DNA structural alphabet, developed earlier [7,8] to catalogue possible dinucleotide structures. Associations between the CANA letters and their dinucleotide sequences displayed different patterns in specifically and non-specifically bound DNA, and thus distinguished these two modes of DNA binding.

2. Methods

2.1. Selection of Structures

We selected an ensemble of crystal structures containing 141 protein–DNA complexes of the nucleosome core particle and 942 DNA complexes with proteins classified as regulatory by querying the Nucleic Acid Database (NDB [14]), release of 2017-03-01, for structures with a resolution of 3.0 Å or better. The final curated, sequentially non-redundant ensemble contains structures with at least one DNA strand longer than six nucleotides and peptide chains longer than 20 amino acids. The analyzed structures are identified by their Protein Data Bank (PDB) codes in the supplementary Table S1. The ensemble consists of 493 structures of DNA in complex wit (...truncated)