DSSR: an integrated software tool for dissecting the spatial structure of RNA
Nucleic Acids Research
Xiang-Jun Lu 2
Harmen J. Bussemaker 1 2
Wilma K. Olson 0
0 Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
1 Department of Systems Biology, Columbia University, New York, NY 10032, USA
2 Department of Biological Sciences, Columbia University, New York, NY 10027, USA
Insight into the three-dimensional architecture of RNA is essential for understanding its cellular functions. However, even the classic transfer RNA structure contains features that are overlooked by existing bioinformatics tools. Here we present DSSR (Dissecting the Spatial Structure of RNA), an integrated and automated tool for analyzing and annotating RNA tertiary structures. The software identifies canonical and noncanonical base pairs, including those with modified nucleotides, in any tautomeric or protonation state. DSSR detects higher-order coplanar base associations, termed multiplets. It finds arrays of stacked pairs, classifies them by base-pair identity and backbone connectivity, and distinguishes a stem of covalently connected canonical pairs from a helix of stacked pairs of arbitrary type/linkage. DSSR identifies coaxial stacking of multiple stems within a single helix and lists isolated canonical pairs that lie outside of a stem. The program characterizes 'closed' loops of various types (hairpin, bulge, internal, and junction loops) and pseudoknots of arbitrary complexity. Notably, DSSR employs isolated pairs and the ends of stems, whether pseudoknotted or not, to define junction loops. This inclusive definition offers a new perspective on the spatial organization of RNA. Tests on all nucleic acid structures in the Protein Data Bank confirm the efficiency and robustness of the software, and applications to representative RNA molecules illustrate its unique features. DSSR and related materials are freely available at http://x3dna.org/.
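The distinction between a stem and an isolated canonical pair can be illustrated with a minimal Python sketch. It assumes the canonical pairs are already known as index pairs (i, j) with i < j, and it treats (i, j) and (i + 1, j - 1) as stacked, covalently connected neighbors purely from sequence adjacency; DSSR itself derives stacking and backbone connectivity from the 3D coordinates. The function name `group_stems` is hypothetical and not part of DSSR.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (i, j): indices of two paired nucleotides, i < j


def group_stems(pairs: List[Pair]) -> Tuple[List[List[Pair]], List[Pair]]:
    """Split canonical pairs into stems (runs of >= 2 pairs) and isolated pairs.

    Simplifying assumption: (i, j) and (i + 1, j - 1) are taken to be stacked
    and covalently connected on both strands purely from their indices.
    """
    remaining = set(pairs)
    stems: List[List[Pair]] = []
    isolated: List[Pair] = []
    for start in sorted(pairs):  # ascending i, so each run starts at its outermost pair
        if start not in remaining:
            continue  # already absorbed into an earlier run
        remaining.discard(start)
        run = [start]
        i, j = start
        while (i + 1, j - 1) in remaining:
            i, j = i + 1, j - 1
            remaining.discard((i, j))
            run.append((i, j))
        if len(run) >= 2:
            stems.append(run)
        else:
            isolated.append(start)
    return stems, isolated


if __name__ == "__main__":
    # Toy pair list, not taken from any real structure.
    toy_pairs = [(1, 20), (2, 19), (3, 18), (6, 15), (8, 13), (9, 12)]
    stems, isolated = group_stems(toy_pairs)
    print(stems)     # [[(1, 20), (2, 19), (3, 18)], [(8, 13), (9, 12)]]
    print(isolated)  # [(6, 15)]
```

In the toy input, (6, 15) has no partner (7, 14) to stack on, so it is reported as an isolated canonical pair, while the other pairs form two stems.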
INTRODUCTION
The three-dimensional (3D) folding of RNA shows striking parallels to that of proteins. At the same time, RNA differs from proteins in its more flexible backbone and the wide variety of observed base-pairing motifs (1). In massive assemblies such as the ribosome, RNA displays a bewildering complexity that overwhelms our ability to comprehend its organization (2). Even small RNA molecules (such as tRNA, riboswitches, and ribozymes) can fold into complex tertiary structures. Deciphering the information provided by the growing library of solved RNA structures and relating this information to biological function constitute two of the challenges of modern structural biology. For example, the design of RNA-based nanostructures relies on well-characterized small structural motifs (3).
Discoveries of new RNA folds and functions have stimulated interest in the development of technologies that can make sense of the complex spatial arrangements of these molecules. Fundamental RNA structural features are currently characterized by a plethora of computer programs and databases dedicated to the identification of paired bases (4–8), A-form double helices (6,7), loops of various types (including multi-branched junction loops) (9–11), and pseudoknots (12,13). Use of one program often requires the output of another. For example, pseudoknot detection (12,14) requires a listing of canonical base pairs. Moreover, some of the programs do not consider modified nucleotides (4,8), and others ignore pseudoknots when finding junction loops (10,11).
The analysis of RNA 3D structure presents challenges not usually encountered in the characterization of DNA and protein structures, including: (i) a large number of chemically modified nucleotides; (ii) the presence of both canonical (Watson-Crick or G–U wobble) and noncanonical base pairs; (iii) the coaxial stacking and higher-order hydrogen-bonded, coplanar associations (multiplets) of base pairs; (iv) the formation of pseudoknots; (v) the heterogeneity of loops, including junction loops; (vi) a mix of structural motifs; and (vii) the RNA-specific interactions of the 2′-hydroxyl group. DSSR (Dissecting the Spatial Structure of RNA) is a computational tool that resolves all of these issues in a single self-contained program. The software consolidates, refines, and extends the functionality of the 3DNA suite of programs (6,7) for RNA structural analysis (15). DSSR builds upon our extensive experience in supporting 3DNA, our growing knowledge of RNA structures, and refined programming skills.
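Challenges (i) and (ii) interact: a modified nucleotide must first be traced to its parent base before one can ask whether a pair involving it is canonical. The Python sketch below shows only that identity step, under two stated assumptions. First, modified residues are mapped to parent bases through a small hand-written lookup table of PDB residue names (`PARENT_BASE`, hypothetical and incomplete), whereas DSSR recognizes nucleotides from atom names and base planarity. Second, a pair is labeled by base identity alone, whereas DSSR also applies geometric criteria in standard reference frames, since a G–C pair, for example, can adopt a noncanonical geometry.

```python
# Hypothetical, deliberately incomplete map from PDB residue names to parent
# bases; a few modified nucleotides are included for illustration only.
PARENT_BASE = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "PSU": "U",  # pseudouridine
    "H2U": "U",  # dihydrouridine
    "5MU": "U",  # 5-methyluridine (ribothymidine)
    "5MC": "C",  # 5-methylcytidine
    "OMC": "C",  # 2'-O-methylcytidine
    "2MG": "G",  # N2-methylguanosine
    "OMG": "G",  # 2'-O-methylguanosine
    "7MG": "G",  # 7-methylguanosine
}

WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
WOBBLE = frozenset("GU")


def pair_class_by_identity(res1: str, res2: str) -> str:
    """Classify a base pair by base identity only.

    Returns "WC" (A-U or G-C), "wobble" (G-U), "noncanonical" for any other
    combination, or "unknown" if a residue name is not in PARENT_BASE.
    Identity is necessary but not sufficient for a canonical pair: the real
    decision also requires the pairing geometry.
    """
    b1 = PARENT_BASE.get(res1.strip().upper())
    b2 = PARENT_BASE.get(res2.strip().upper())
    if b1 is None or b2 is None:
        return "unknown"
    bases = frozenset((b1, b2))
    if bases in WATSON_CRICK:
        return "WC"
    if bases == WOBBLE:
        return "wobble"
    return "noncanonical"


if __name__ == "__main__":
    print(pair_class_by_identity("G", "C"))    # WC
    print(pair_class_by_identity("PSU", "A"))  # WC
    print(pair_class_by_identity("OMG", "U"))  # wobble
    print(pair_class_by_identity("A", "G"))    # noncanonical
```

A residue-name table is easy to read but breaks on any name it does not list, which is one reason a tool covering the whole Protein Data Bank benefits from recognizing nucleotides by their atoms rather than by name.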
The key features of DSSR are illustrated in Figure 1 and
include: (i) recognition of nucleotides, both standard and
modified, based on atom names and base planarity; (ii)
detection of hydrogen-bonded base pairs regardless of
tautomeric or protonation state, using embedded standard
reference frames and simple geometric criteria; (iii)
identif (...truncated)