RAIN: RNA–protein Association and Interaction Networks
Database, 2017, 1–9
doi: 10.1093/database/baw167
Original article
Alexander Junge1,2,†, Jan C. Refsgaard3,†, Christian Garde1,4,†,
Xiaoyong Pan1,2,3, Alberto Santos3, Ferhat Alkan1,2, Christian Anthon1,2,
Christian von Mering5, Christopher T. Workman1,4, Lars Juhl Jensen1,3,*
and Jan Gorodkin1,2,*
1Center for Non-coding RNA in Technology and Health, University of Copenhagen,
Groennegaardsvej 3, DK-1870 Frederiksberg C, Denmark, 2Department of Veterinary Clinical and
Animal Sciences, University of Copenhagen, Groennegaardsvej 3, DK-1870 Frederiksberg C, Denmark,
3Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, University
of Copenhagen, Building: 06-2-26, Blegdamsvej 3B, DK-2200 Copenhagen N, Denmark, 4Center for
Biological Sequence Analysis, Technical University of Denmark, Kemitorvet, Building 208, DK-2800
Lyngby, Denmark, 5Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics,
University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
*Corresponding author: Tel: +45 3533 3578; Fax: +45 3533 3042
Correspondence may also be addressed to Jan Gorodkin. Tel: +45 35 32 50 25 Email:
Present address: Christian Garde, The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health
and Medical Sciences, University of Copenhagen, Building 6.6 Blegdamsvej 3B, 2200 Copenhagen N, Copenhagen, Denmark
†These authors contributed equally to this work.
Citation details: Junge,A., Refsgaard,J.C., Garde,C. et al. RAIN: RNA–protein association and interaction networks.
Database (2016) Vol. 2016: article ID baw167; doi:10.1093/database/baw167
Revised 18 November 2016; Accepted 5 December 2016
Abstract
Protein association networks can be inferred from a range of resources including
experimental data, literature mining and computational predictions. These types of evidence are emerging for non-coding RNAs (ncRNAs) as well. However, integration of ncRNAs
into protein association networks is challenging due to data heterogeneity. Here, we present
a database of ncRNA–RNA and ncRNA–protein interactions and its integration with the
STRING database of protein–protein interactions. These ncRNA associations cover four organisms and have been established from curated examples, experimental data, interaction
predictions and automatic literature mining. RAIN uses an integrative scoring scheme to assign a confidence score to each interaction. We demonstrate that RAIN outperforms the
underlying microRNA-target predictions in inferring ncRNA interactions. RAIN can be operated through an easily accessible web interface and all interaction data can be downloaded.
Database URL: http://rth.dk/resources/rain
© The Author(s) 2017. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/),
which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact
Introduction
set of sources and covers four organisms: human (Homo
sapiens), mouse (Mus musculus), rat (Rattus norvegicus)
and baker’s yeast (Saccharomyces cerevisiae). RAIN scores
the reliability of each interaction using a scoring scheme
based on a comparison to a curated set of interactions.
Finally, it integrates ncRNA–RNA and ncRNA–protein associations with the protein–protein associations contained in the
STRING database. This enables researchers to explore
complex interaction networks in the powerful, yet intuitive
interactive STRING user interface.
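As an illustration of scoring by comparison to a curated set, the following minimal Python sketch bins raw scores from one source and reports, for each bin, the fraction of scored pairs that also occur in a gold standard. This section does not specify how RAIN performs the comparison, so the function name `calibrate`, the binning by fixed thresholds and the toy identifiers are all assumptions made for illustration only.

```python
from bisect import bisect_right


def calibrate(scored_pairs, gold_standard, bin_edges):
    """Per raw-score bin, return the fraction of pairs found in the gold standard.

    scored_pairs  : iterable of ((a, b), raw_score)
    gold_standard : set of frozenset({a, b}) curated interactions
    bin_edges     : ascending raw-score thresholds separating the bins
    Bins without any scored pair yield None.
    """
    n_bins = len(bin_edges) + 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for (a, b), raw in scored_pairs:
        i = bisect_right(bin_edges, raw)  # index of the bin holding this score
        totals[i] += 1
        # frozenset makes the lookup independent of the order of the partners
        if frozenset((a, b)) in gold_standard:
            hits[i] += 1
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical toy data; the names are placeholders, not RAIN identifiers.
gold = {frozenset(("miR-A", "GENE1")), frozenset(("miR-B", "GENE2"))}
pairs = [
    (("miR-A", "GENE1"), 0.9),
    (("miR-B", "GENE2"), 0.8),
    (("miR-A", "GENE3"), 0.7),
    (("GENE2", "miR-B"), 0.3),
    (("miR-C", "GENE4"), 0.2),
    (("miR-C", "GENE5"), 0.1),
]
print(calibrate(pairs, gold, bin_edges=[0.5]))
```

In this toy example, pairs with a raw score of at least 0.5 are recovered in the curated set more often than lower-scoring pairs, which is the kind of relationship a calibrated confidence score would capture.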
Materials and Methods
Sources of evidence
We established four channels of evidence to support the
interactions found in RAIN (Figure 1): (i) curated knowledge, (ii) experimental evidence, (iii) miRNA target predictions and (iv) automated literature mining.
Each of the four evidence channels is generated by integrating a number of underlying resources.
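To make the idea of integrating evidence channels concrete, the sketch below combines per-channel confidence scores under the assumption that the channels are independent, i.e. the combined score is one minus the probability that no channel supports the interaction. This is a common way to merge probabilistic evidence; whether RAIN uses exactly this formula is not stated in this section, and the sketch omits any correction for a prior. The function name and the channel keys are hypothetical.

```python
def combine_channels(channel_scores):
    """Combine per-channel confidence scores, assuming independent channels.

    channel_scores : dict mapping a channel name to a score in [0, 1]
    Returns 1 - prod(1 - s) over all channels (0.0 for an empty dict).
    """
    p_no_support = 1.0
    for channel, score in channel_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {channel!r} must lie in [0, 1]")
        p_no_support *= 1.0 - score
    return 1.0 - p_no_support


# Hypothetical scores for a single interaction.
example = {
    "curated": 0.0,
    "experiments": 0.5,
    "predictions": 0.5,
    "textmining": 0.0,
}
print(combine_channels(example))  # 0.75
```

Under this assumption, two moderately confident channels yield a higher combined score than either alone, while a channel with no evidence (score 0) leaves the result unchanged.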
(i) Curated knowledge. This comprises 867 human molecular interactions that are well established in the scientific literature and/or listed in expert curated databases.
The interactions were collected for nine classes of
ncRNAs, namely microRNA (miRNA) (3), ribosomal
RNA (rRNA) (10), transfer RNA (tRNA) (11), signal recognition particle RNA (SRP RNA) (12), Vault RNA (13–
15), Y RNA (16–18), Telomerase RNA (19), small nucleolar RNA (snoRNA) (20) and spliceosomal RNA (U1, U2,
U4, U4atac, U6, U6atac, U11, U12) (20). For further
Figure 1. Flow chart illustrating the development of the RAIN database,
ranging from establishing scoring schemes for the individual sources of
evidence, through integration of resources to evidence channels, to finally defining functional molecular networks.
The study of protein-coding genes and the accumulation of
data from expression studies and other complementary
methods have helped researchers to generate protein association networks compiled in resources such as the
STRING database (1). Using a probabilistic scoring
scheme, STRING assigns a score to each physical interaction and functional association (henceforth referred to as
interactions). The recent version 10 holds interactions for
>2000 organisms.
However, interaction networks containing only proteins
and their interactions remain incomplete until other important molecular interactions have been included. For this
reason, we have focused on complementing protein interaction networks with non-coding RNAs (ncRNAs)—a
large class of genes comprising 16 000 long and 10 000
short ncRNAs in human [GENCODE version 24 (2)].
Integration of these interactions allows for an analysis of
the complex functional interplay of ncRNA–RNA and
ncRNA–protein interactions. Data on such interactions,
complemented by co-expression and literature mining, are
currently emerging (3–5). This has led to the creation of
databases storing ncRNA interactions, such as miRTarBase
(6) and TarBase (7), which contain microRNA (miRNA)–target interactions. NPInter (5), RAID (8) and StarBase (9)
are examples of databases collecting interactions between
ncRNAs and proteins.
The analysis of ncRNA interactions is challenged by
issues related to data heterogeneity, such as varying quality
as well as the usage of different identifiers and interaction
scoring schemes. The STRING database, used by thousands of researchers daily, has addressed these challenges
for proteins through the use of unified identifiers and calibrated scoring schemes (1). A resource similar to STRING
is not available for ncRNAs and their interactions.
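The identifier problem can be illustrated with a short Python sketch that maps source-specific names onto one identifier per molecule and groups the scores reported by different sources under the same unordered pair. The alias table, the identifiers, the source names and the function `unify_interactions` are hypothetical and serve only as an example; they do not reflect the actual RAIN or STRING identifier mapping.

```python
def unify_interactions(records, alias_to_id):
    """Group source-specific interaction records under unified identifiers.

    records     : iterable of (source, name_a, name_b, raw_score)
    alias_to_id : dict mapping a lower-cased alias to a unified identifier
    Returns (unified, unmapped), where unified maps frozenset({id_a, id_b})
    to a dict {source: raw_score} and unmapped lists records with an
    unknown name.
    """
    unified = {}
    unmapped = []
    for source, name_a, name_b, raw in records:
        id_a = alias_to_id.get(name_a.lower())
        id_b = alias_to_id.get(name_b.lower())
        if id_a is None or id_b is None:
            unmapped.append((source, name_a, name_b))
            continue
        # Raw scores stay separate per source: they are on different scales
        # and are not comparable before calibration.
        unified.setdefault(frozenset((id_a, id_b)), {})[source] = raw
    return unified, unmapped


# Hypothetical aliases and records.
aliases = {"mir-x": "RNA:0001", "mirx": "RNA:0001",
           "prot1": "PROT:0001", "p1": "PROT:0001"}
records = [
    ("dbA", "miR-X", "PROT1", 0.8),
    ("dbB", "P1", "mirX", 120.0),
    ("dbB", "unknown", "P1", 5.0),
]
unified, unmapped = unify_interactions(records, aliases)
print(unified)
print(unmapped)
```

Here the two sources report the same pair under different names and on different score scales, which is why unified identifiers alone are not sufficient and the raw scores still need to be calibrated before they can be compared.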
Similar to protein interactions, ncRNA interactions are
supported by diverse sources of evidence such as expert
curation, experiments, text mining and predictions. In
order to compare these so (...truncated)