RNAmotifs: prediction of multivalent RNA motifs that control alternative splicing
Cereda et al. Genome Biology
Matteo Cereda 0 1
Uberto Pozzoli 1
Gregor Rot 0
Peter Juvan 0
Anthony Schweitzer
Tyson Clark
Jernej Ule 0
0 MRC Laboratory of Molecular Biology , Hills Road, Cambridge CB2 0QH , UK
1 Scientific Institute IRCCS E. Medea , Via don L. Monza 20, 23842 Bosisio Parini, (LC) , Italy
RNA-binding proteins (RBPs) regulate splicing according to position-dependent principles, which can be exploited for analysis of regulatory motifs. Here we present RNAmotifs, a method that evaluates the sequence around differentially regulated alternative exons to identify clusters of short and degenerate sequences, referred to as multivalent RNA motifs. We show that diverse RBPs share basic positional principles, but differ in their propensity to enhance or repress exon inclusion. We assess exons differentially spliced between brain and heart, identifying known and new regulatory motifs, and predict the expression pattern of RBPs that bind these motifs. RNAmotifs is available at https://bitbucket.org/rogrro/rna_motifs.
Background
The majority of human genes produce multiple mRNA
isoforms via the process of alternative splicing [1].
Alternative splicing is regulated mainly by RNA-binding
proteins (RBPs), which often act according to positional
principles defined by an RNA splicing map to enhance
or repress exon inclusion [2,3]. These RBPs play key
roles in development and evolution, and mutations
perturbing protein-RNA interactions can lead to a variety of
diseases [4,5]. Therefore, to infer the splicing regulatory
programs and identify new disease-causing mutations,
algorithms are required that can assess the genomic
sequence at the differentially regulated exons to predict
the RNA motifs bound by these RBPs.
Great progress has been made over the past decade in
inferring the programs of splicing regulation [1]. However,
it is not yet clear which positional principles of splicing
regulation are shared between different RBPs. The sites of
protein-RNA interactions have been defined by different
crosslinking and immunoprecipitation (CLIP) methods
(HITS-CLIP, PAR-CLIP or iCLIP), but the differences
between these methods preclude precise comparisons
between the RNA maps that were derived for the different
RBPs [3]. Moreover, crosslinking-based methods are
affected by mild sequence biases [6]; thus, it is important
to develop methods that can derive the regulatory motifs
independently of the CLIP data. Therefore, a new
computational method is required to derive RNA maps solely
from the analysis of gene expression data.
Past studies that predicted splicing regulatory motifs from
analysis of differentially regulated exons searched for
continuous motifs, and most often identified UGCAUG
as the most frequent motif [7-15]. This sequence is
recognized by RNA binding protein, fox-1 homologs 1 and 2
(RBFOX1 and RBFOX2), splicing regulators that recognize
three nucleotides via the canonical RNA binding surface
and an additional four nucleotides via the loops of a
quasi-RRM (qRRM) domain [16]. However, RBFOX proteins are
exceptional in their ability to recognize a long continuous
motif, and most other splicing regulators recognize motifs
that are only three or four nucleotides long [17,18].
Studies of neuro-oncological ventral antigen 1 and 2
(NOVA1 and NOVA2), here collectively referred to as
NOVA proteins, demonstrated that three or more short
RNA motifs that are clustered closely together on the
pre-mRNA are required for NOVA proteins to mediate
splicing regulation [2]. Here we will refer to these motifs
as 'multivalent RNA motifs', since they enable RBPs to
achieve high-affinity binding by cooperative interactions
between multiple RNA-binding domains and the clustered
short RNA motifs [17,18]. Past computational methods
for analysis of multivalent RNA motifs have focused on
the known RNA motifs [19], or have predicted motifs
based on the CLIP studies of protein-RNA interactions
[17,18]. However, a method for de novo identification of
multivalent RNA motifs by analysis of the regulated exons
is not yet available.
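To make the notion of a multivalent RNA motif concrete, the following minimal Python sketch tests whether a sequence contains three or more occurrences of a short degenerate tetramer clustered closely together. It is an illustration only and not the RNAmotifs implementation: the function names, the 30-nucleotide window, the restriction of the IUPAC table to R, Y, S and W, and the example sequence are all assumptions made here, and YCAY is used as the tetramer commonly reported for NOVA proteins.

```python
import re

# Assumed IUPAC subset for degenerate tetramers (illustrative, not from the paper).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]"}


def motif_positions(seq, motif):
    """Return start positions of all matches of an IUPAC motif, overlaps included."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + "".join(IUPAC[c] for c in motif) + ")")
    return [m.start() for m in pattern.finditer(seq)]


def has_cluster(seq, motif, min_sites=3, window=30):
    """True if at least min_sites motif starts lie less than window nt apart.

    min_sites=3 follows the 'three or more' motifs described in the text;
    window=30 is a hypothetical value chosen for illustration.
    """
    pos = motif_positions(seq, motif)
    for i in range(len(pos) - min_sites + 1):
        # Distance between the first and last start of min_sites consecutive sites.
        if pos[i + min_sites - 1] - pos[i] < window:
            return True
    return False


if __name__ == "__main__":
    example = "GGUCAUAAUCAUGGCCAUAA"  # made-up sequence
    print(motif_positions(example, "YCAY"))
    print(has_cluster(example, "YCAY"))
```

A search for a single continuous motif would treat each YCAY occurrence independently; the cluster test instead asks whether several short sites co-occur within a short stretch of the pre-mRNA, which is the property that the text attributes to multivalent RNA motifs.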
Here, we present RNAmotifs, a method that identifies
clusters of short non-degenerate (ND) or degenerate (DG)
tetramers that are enriched at specific positions around
the enhanced and silenced exons. The method correctly
identified the multivalent RNA motifs bound by NOVA,
PTBP1, heterogeneous nuclear ribonucleoprotein C
(hnRNP C), TARDBP, and TIA1 and TIAL1 cytotoxic
granule-associated RNA binding proteins (here
collectively referred to as TIA proteins). Moreover, RNAmotifs
determines the RNA splicing map, which enabled us to
compare the positional principles of different RBPs.
Finally, we analyzed the exons that are differentially spliced
between brain and heart, identifying new candidate motifs
responsible for tissue-specific splicing regulation. Notably,
we demonstrate that the positional enrichment
informatio (...truncated)