Automatic workflow for the classification of local DNA conformations

Full text (PDF): http://www.biomedcentral.com/content/pdf/1471-2105-14-205.pdf

Petr Čech (1, 4), Jaromír Kukal (3, 4), Jiří Černý (2), Bohdan Schneider (2), Daniel Svozil (1)

1 Laboratory of Informatics and Chemistry, ICT Prague, Technicka 5, Prague 6 166 28, Czech Republic
2 Institute of Biotechnology AS CR, v. v. i., Videnska 1083, Prague 4 142 00, Czech Republic
3 Faculty of Nuclear Sciences and Physical Engineering, CTU Prague, Trojanova 13, Prague 2 122 00, Czech Republic
4 Department of Computing and Control Engineering, ICT Prague, Technicka 5, Prague 6 166 28, Czech Republic

Background: A growing number of crystal and NMR structures reveal a considerable structural polymorphism of DNA, going well beyond the usual image of a double-helical molecule. DNA is highly variable, with dinucleotide steps exhibiting substantial sequence-dependent flexibility. Analyzing the conformational space of the DNA backbone and improving our understanding of its conformational dependencies are therefore important for a full comprehension of DNA structural polymorphism.

Results: A detailed classification of local DNA conformations based on the technique of Fourier averaging was published in our previous work. However, this procedure requires a considerable amount of manual work. To overcome this limitation, we developed an automatic classification method that combines supervised and unsupervised approaches. The proposed workflow consists of the k-nearest neighbors (k-NN) method followed by a nonhierarchical single-pass clustering algorithm. We applied this workflow to analyze 816 X-ray and 664 NMR DNA structures released until February 2013. We identified and annotated six new conformers and assigned four of them to two structurally important DNA families: guanine quadruplexes and Holliday (four-way) junctions.
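The abstract does not specify the features, distance measure, k, or thresholds, so the following is only a minimal sketch of a two-stage workflow of this kind. It assumes each dinucleotide step is a vector of torsion angles in degrees, compared with a circular (wrap-around) distance so that 359° and 1° count as close. The names `knn_classify`, `single_pass_cluster` and `classify_workflow`, the parameters `max_dist`, `min_agreement`, `radius` and `min_size`, and the toy labels are all hypothetical, and the clustering stage is a simple leader-based single-pass variant, not necessarily the authors' algorithm.

```python
import math
from collections import Counter

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def torsion_distance(u, v):
    """Euclidean distance over per-angle circular differences (degrees)."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(u, v)))

def knn_classify(x, references, k=5, max_dist=30.0, min_agreement=0.6):
    """Supervised stage (hypothetical parameters): majority label of the k
    nearest labelled references, or None ("unclassifiable") if the neighbours
    are too far away or disagree too much."""
    scored = sorted((torsion_distance(x, vec), label) for vec, label in references)
    nearest = scored[:k]
    votes = Counter(label for dist, label in nearest if dist <= max_dist)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count / len(nearest) >= min_agreement else None

def single_pass_cluster(points, radius=30.0):
    """Unsupervised stage: leader-based single-pass clustering. Each point joins
    the first cluster whose leader lies within `radius`; otherwise it founds a
    new cluster. Returns clusters as lists of indices into `points`."""
    leaders, clusters = [], []
    for i, p in enumerate(points):
        for c, leader in enumerate(leaders):
            if torsion_distance(p, leader) <= radius:
                clusters[c].append(i)
                break
        else:
            leaders.append(p)
            clusters.append([i])
    return clusters

def classify_workflow(steps, references, k=5, max_dist=30.0,
                      min_agreement=0.6, radius=30.0, min_size=3):
    """k-NN assignment first; the unassigned steps are then clustered, and
    clusters with at least `min_size` members are returned as candidate new
    conformer classes (as indices into `steps`)."""
    labels = [knn_classify(s, references, k, max_dist, min_agreement) for s in steps]
    unassigned = [i for i, lab in enumerate(labels) if lab is None]
    clusters = single_pass_cluster([steps[i] for i in unassigned], radius)
    candidates = [[unassigned[j] for j in c] for c in clusters if len(c) >= min_size]
    return labels, candidates

# Toy example with three angles per step and two made-up reference classes.
refs = [((300, 180, 50), "B-like"), ((305, 185, 55), "B-like"), ((295, 175, 45), "B-like"),
        ((290, 160, 160), "A-like"), ((295, 165, 155), "A-like"), ((285, 155, 165), "A-like")]
steps = [(302, 182, 52), (292, 162, 158),
         (60, 60, 300), (62, 58, 302), (58, 61, 298), (180, 300, 200)]
labels, candidates = classify_workflow(steps, refs, k=3)
print(labels)      # ['B-like', 'A-like', None, None, None, None]
print(candidates)  # [[2, 3, 4]]
```

Rejecting steps whose neighbours are distant or split between classes is what lets the first stage flag data as unclassifiable; the second stage then groups only those rejects, and the `min_size` filter keeps isolated outliers (such as the last toy step) from being proposed as new classes.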
We also compared the populations of the assigned conformers in the X-ray and NMR datasets.

Conclusions: In the present work, we developed a machine learning workflow for the automatic classification of dinucleotide conformations. Dinucleotides with unassigned conformations can either be classified into one of the 24 already known classes or be flagged as unclassifiable. The workflow also permits the identification of new classes among so far unclassifiable data; with it, we identified and annotated six new conformations in the X-ray structures released since our previous analysis. The results illustrate the utility of machine learning approaches in the classification of local DNA conformations.

Background

The antiparallel double-helical structure of DNA and its self-recognition form the basis for the conservation and transfer of genetic information. The model of the canonical B-DNA form proposed by Watson and Crick [1] was later enriched by detailed structural data from single-crystal structures of the biologically prevailing B-form [2] and of the related right-handed A-form [3,4]. In addition, the first DNA single-crystal structure [5] revealed atomic details of a third major form of the DNA double helix, left-handed Z-DNA. Atomic-resolution structures of B-DNA duplexes [6] revealed sequence-dependent structural deviations that provide the specificity required for DNA recognition by proteins and drugs [7]. The association of DNA with proteins is known to induce a local deformation of the B-form toward the A-form [8-13] in various protein-DNA complexes, such as those of high mobility group (HMG) proteins [14], the trp repressor/operator complex [15], TATA box binding protein [16-18], HIV-1 reverse transcriptase [19], various DNA polymerases [20-23], a zinc finger protein [24], the hyperthermophile Sac7d protein [25], and EcoRV endonuclease [26-28].
Along the transition pathway between the B- and A-forms [29], various intermediate B-to-A conformations have been identified [9,30-32]. Several analyses have suggested that conformational substates of the DNA backbone are important for protein binding to the minor groove [13,33,34]. Besides the A-, B- and Z-forms, DNA can also adopt other biologically relevant structures, such as single-stranded hairpins [35], triple helices [36], three- and four-way junctions [37,38], four-stranded G-quadruplexes [39] or parallel helices [40]. Their existence indicates that DNA structure is much more polymorphic than the deceptive simplicity of the canonical B-DNA duplex might suggest.

The base morphology in a DNA double helix is commonly described [12,41-46] by parameters giving the relative position of the bases within a base pair (e.g., propeller twist or stagger) and of the base pairs within a base step (e.g., rise or twist) [47]. The same parameters can also be used for other unusual DNA structures such as triple helices [48-50], G-quadruplexes [51] or three- and four-way junctions [52,53]. For the last two groups, specific additional parameters such as G-quartet planarity [54] or the angle between the junction arms [55] have also been defined. Another set of quantitative measures that can be used to characterize the secondary structure of DNA consists of the backbone torsion angles α, β, γ, δ, ε and ζ, together with the glycosidic torsion χ [56]. Although the relationship between phosphodiester backbone states and local distortions of the DNA double helix was described in the 1980s and 1990s [57,58], several early analyses regarded the backbone as a passive link holding the bases in place [7,59,60]. It is now clear, however, that the backbone must be considered an active, dynamic element in defining the conformational properties of double-helical DNA [34,61-69].
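Each of the torsions α to ζ and χ named above is an ordinary dihedral angle defined by four bonded atoms. The sketch below computes one from Cartesian coordinates, assuming atoms are given as (x, y, z) tuples; the atom quadruples in `TORSION_ATOMS` follow the standard nucleic-acid definitions, while the function and variable names are our own and the "(i-1)"/"(i+1)" labels are only an illustrative notation for atoms of neighbouring nucleotides.

```python
import math

# Standard atom quadruples for the nucleic-acid torsions. "(i-1)" and "(i+1)"
# mark atoms of the preceding and following nucleotide (illustrative notation).
TORSION_ATOMS = {
    "alpha": ("O3'(i-1)", "P", "O5'", "C5'"),
    "beta": ("P", "O5'", "C5'", "C4'"),
    "gamma": ("O5'", "C5'", "C4'", "C3'"),
    "delta": ("C5'", "C4'", "C3'", "O3'"),
    "epsilon": ("C4'", "C3'", "O3'", "P(i+1)"),
    "zeta": ("C3'", "O3'", "P(i+1)", "O5'(i+1)"),
    "chi_purine": ("O4'", "C1'", "N9", "C4"),
    "chi_pyrimidine": ("O4'", "C1'", "N1", "C2"),
}

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, reported in the 0..360 range.

    The sign is positive for clockwise rotation when looking along the central
    p1->p2 bond (IUPAC convention)."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x)) % 360.0

# Toy geometry: central bond along z, last atom rotated 90 degrees about it.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))  # 90.0
```

Using `atan2` on the two projected components keeps the sign and avoids the precision loss of an `acos`-based formula near 0° and 180°; reporting angles in 0 to 360° matches the range customary for nucleic-acid torsions.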
The main roles of the backbone are to restrict the conformational space available for the placement of bases and to sterically couple adjacent base steps [61]. The overall conformational flexibility of DNA thus results from the interplay between the optimal base positions and the preferred conformations of the sugar-phosphate backbone. The increasing number and quality of DNA structures have led to several detailed analyses of the conformational space of the DNA backbone. Most of these studies have been based on crystal structures [32,70-73], but structures determined by various solution NMR techniques have also contributed significantly to our understanding of the biology of nucleic acids [74-76]. NMR methods have been successfully applied to study the dynamics of the DNA phosphodiester backbone in solution [77-82]; NMR studies also provide evidence for BII states in solution and help to unravel the role of the phosphorus atom in the BI↔BII transition [68,83-87]. To uncover a potential role of the sugar-phosphate backbone in DNA structural polymorphism, we analyzed a set of carefully selected double-helical structures of naked and protein-bound DNA solved at high resolution (1.9 Å) [32] (...truncated)