SeedVicious: Analysis of microRNA target and near-target sites
SeedVicious: Analysis of microRNA target and near-target sites
Antonio Marco 0 1
0 School of Biological Sciences, University of Essex , Colchester , United Kingdom
1 Editor: Thomas Preiss, John Curtin School of Medical Research , AUSTRALIA
Here I describe seedVicious, a versatile microRNA target site prediction software that can be easily fitted into annotation pipelines and run over custom datasets. SeedVicious finds microRNA canonical sites plus other, less efficient, target sites. Among other novel features, seedVicious can compute evolutionary gains/losses of target sites using maximum parsimony, and also detect near-target sites, which have one nucleotide different from a canonical site. Near-target sites are important to study population variation in microRNA regulation. Some analyses suggest that near-target sites may also be functional sites, although there is no conclusive evidence for that, and they may actually be target alleles segregating in a population. SeedVicious does not aim to outperform but to complement existing microRNA prediction tools. For instance, the precision of TargetScan is almost doubled (from 11% to ~20%) when we filter predictions by the distance between target sites using this program. Interestingly, two adjacent canonical target sites are more likely to be present in bona fide target transcripts than pairs of target sites at slightly longer distances. The software is written in Perl and runs on 64-bit Unix computers (Linux and MacOS X). Users with no computing experience can also run the program in a dedicated web-server by uploading custom data, or browse pre-computed predictions. SeedVicious and its associated web-server and database (SeedBank) are distributed under the GPL/GNU license.
Data Availability Statement: Software and
webbased server are available at http://seedvicious.
essex.ac.uk and at FigShare https://figshare.com/
articles/seedVicious/6106073/1. All code and
resources provided are Open Source and released
under the Creative Commons BY license. A web
version is available, which allows users to run the
program using our server. Precomputed targets for
selected species can also be browsed from our
SeedBank database. The web server and the
database are accessed via http://seedvicious.essex.
Animal microRNAs target gene transcripts by partial sequence complementarity [
are multiple microRNA target prediction tools that use different strategies. Most prediction
programs look at the existence of seed sequences, six-nucleotide-long sequences in the
transcripts that are complementary to nucleotides 2 to 7 in the microRNA [
]. Depending on
additional nucleotide pairings these sites can be canonical or marginal. Many programs exploit
additional features, mainly evolutionary conservation and RNA folding features [
Although these programs have an increased accuracy, they lose power as many real targets
remain undetected. Often, microRNA target predictions are available only on selected datasets.
Additionally, stand-alone programs are not always ready for use on custom data, or they are
not available as web forms where users can analyse their own data without the need of
specialised computing skills. Here I describe a new microRNA target prediction software that does
Funding: This work was supported by a grant from
the Wellcome Trust to Antonio Marco [grant
number 200585/Z/16/Z]; funders website: https://
wellcome.ac.uk/. The funder had no role in study
design, data collection and analysis, decision to
publish, or preparation of the manuscript.
Competing interests: The authors have declared
that no competing interests exist.
not aim to replace but to complement the existing tool-kit, and allow the high-throughput
analysis of custom microRNA/transcript data as well as the exploration of additional features
not covered by other programs.
The first population genetics studies on microRNA targets sites already described selective
pressures on seed sequences [
]. In a recent study I showed that, in Drosophila populations,
there is selection against microRNA target sites . To study selection on non-target sites that
may become targets, I defined `one-mutant neighbors' as six-nucleotide-long sequences that
have one nucleotide different to a seed sequence. Here I define a broader term: near-target
sites (Fig 1), which have one nucleotide different to a putative microRNA target site, and are
not targets themselves. Detection of near-target sites are important for population genetics
and evolutionary studies, and will allow us to explore the selective pressures on gene
transcripts. Additionally, reference genome sequences, in which microRNA target sites are usually
predicted, do not capture the true diversity of a species and may miss bona fide target sites.
Scanning for near-target sites may reveal targets that would be otherwise ignored. SeedVicious
allows the exploration of near-target sites.
Methods and implementation
SeedVicious initially scans custom sequences to detect canonical microRNA target sites as
described in [
]. Optionally, marginal sites can be reported (Fig 1). This strategy produces a
large number of false positives, and other prediction tools use evolutionary conservation to
narrow down the number of potential target sites [
]. However, evolutionary conservation is
not necessarily associated with functionality of the target site . SeedVicious allows an
alternative (and complementary) way of finding putative target sites. The complete list of canonical
(and marginal) sites can be ranked according to selected features. First, seedVicious computes
the free energy of the microRNA:transcript duplex using the RNAeval program in the Vienna
]. The lower the energy, the more stable is the RNA:RNA pair, which can be
interpreted as a more efficient target site. Second, seedVicious can also report the minimum
distance between pairs of target sites for the same microRNA (the position of a target, as reported
by seedVicious, is the position of the nucleotide in the transcript that is paired with the 5' first
nucleotide of the microRNA).
Fig 1. MicroRNA target and near-target sites. Canonical and marginal sites are described in Bartel, 2009 [
Neartarget sites can be defined for all type of sites. The figure also shows that each seed (sixmer) 18 possible near-target sites
as defined by Marco, 2015 [
2 / 9
SeedVicious also permits the inference of gains and losses of microRNA target sites by first
predicting individual target sites at all transcripts in a given alignment, and then fitting a
maximum parsimony (MP) model to a tree provided by the user, following Dollo's criteria [
The MP reconstruction of ancestral states is computed with the 'dollop' program from the
Phylip package [
]. We first used this method to study the evolution of post-transcriptional
regulation in a gene family in Drosophila [
], and as far as I'm aware, this is the only
microRNA target prediction program that currently implements this type of analysis. A number of
additional analyses can be performed with seedVicious. It can compare different 3'UTRs and
detect either common microRNAs targeting pairs of transcripts or common target sites in an
alignment. A major feature of seedVicious is the detection of near-target sites which are
detected for both canonical and marginal target sites.
The main program can be run from the command line using Perl 5 or above, and the
required modules and external binary files (compiled in a 64-bit Unix computer) are included
in the distributed version. Input sequence files should be in FASTA format, and tree files in
NEWICK format. Input files can be compressed in Gzip format. A fully referenced User Guide
is available form the package or as Supplementary Information to this paper, and it includes
For the analysis presented below, all microRNAs were retrieved from miRBase release 21
], and all transcript sequences from ENSEMBL Genes 89 via Biomart [
targets were retrieved from targetscan.org, release 7.1 [
]. Predictions were compared to the
experimentally validated targets in miRTarBase 6.0 [
], for both weak and strong interactions.
Precision was calculated as the ratio TP/(TP+FP), where TP (True Positives) was the number
of predicted targets that were experimentally validated, and FP (False Positives) was the
number of predicted targets not validated in miRTarBase. To map near-target sites to polymorphic
nucleotides the genomic coordinates of the 3'UTR of a TP53 transcript (NM_000546.5) were
retrieved from UCSC Table Browser for the human genome assembly hg38 [
]. Using this
information, the coordinates of the predicted near-target sites (see Results for a detailed
description) were then converted into genomic coordinates. UCSC Table Browser was also
used to identify all SNPs mapped to the 3'UTR of the transcript. Following this protocol,
polymorphic nucleotides can be easily mapped to microRNA near-target sites.
The program is available for download at http://seedvicious.essex.ac.uk/download.html and
at FigShare https://figshare.com/articles/seedVicious/6106073/1. A web version is available,
which allows users to run the program using our server. Precomputed targets for selected
species can also be browsed from our SeedBank database. The web server and the database are
accessed via http://seedvicious.essex.ac.uk
Results and discussion
I describe in this section a few examples that illustrate different ways of analyzing target sites
with seedVicious. First, I computed with seedVicious the minimum distance between identical
canonical target sites for all human microRNAs in 3'UTRs (see Methods and Implementation).
This distance can be used as a filtering criteria for potential targets, based on a recent work that
suggests that neighboring sites for the same microRNA cooperate during lateral diffusion of the
Ago-miRNA complex during target recognition (see [
] and references therein). The
average minimum distance between pairs of target sites was smaller for strong (751.8) and weak
(773.5) validated interactions than in all set of predicted interactions (888.1). The differences
are visible when plotting the cumulative distribution of minimum distances (Fig 2A). The
observed differences were statistically significant when performing a one-tailed
Smirnov test between validated and predicted targets (strong versus predicted: p << 0.0001;
3 / 9
Fig 2. Multiple uses of seedVicious. A) Cumulative distribution of the number of transcripts with at least two target sites with respect to the minimum distance
between pairs of targets. The graph compares the distribution of all transcript (black), transcript/microRNA pairs with experimentally validated interactions (blue) and
those with strongly validated interactions (red) according to miRTarBase. B) Precision of microRNA target prediction for different cut-offs of minimum distance
between pairs of targets. The peak corresponds to a distance of 6, that is, two contiguous target sites. C) Precision of microRNA target prediction for multiple targets
sites for the same microRNA. D) Evolutionary turnover of microRNA canonical target sites for let-7 in lin-14 in roundworm species using maximum parsimony.
weak versus predicted p << 0.0001). Minimum distances were also significantly smaller in
strong interactions than in weak interactions (p = 0.00045). This indicates that the closest two
target sites are in a transcript, the more likely that the microRNA will regulate the targeted
transcript. As a matter of fact, two adjacent target sites (6 nt distance) significantly increases the
precision when detecting potential target interactions (Fig 2B).
4 / 9
Fig 3. Distance between canonical target sites. Distance between canonical target sites as defined by seedVicious (A), and the
partial overlap of targets that are 6 nucleotides away (B).
A distance of 6 implies that the seed sequence of two canonical target sites are adjacent, so
that positions 7 and 8 of the canonical left target overlaps with the positions 1 and 2 of the
right target (Fig 3). Among the microRNAs with the highest precision values at 6 nucleotides
of distance we find four whose seed sequence is a dinucleotide repetitive motif. For instance,
the seed sequence of miR-8485 is ACACACA so it will have multiple (and adjacent) target
sites in UG-rich UTRs. Despite its low complexity, there are 371 genes predicted to be targeted
by this microRNA, of which 234 interactions are also found in miRTarBase (mostly from
high-throughput PAR-CLIP and HITS-CLIP experiments). That yields a markedly high
precision value (63%) for this particular microRNA. Finding adjacent target sites for microRNAs
with repeated dinucleotides is not surprising. However, the fact that these interactions are
often found in miRTarBase is striking. An interesting possibility is that microRNA/RISC
complexes bounce forth and back between clustered target sites in a one-dimensional diffusion
5 / 9
process (see [
]), which is particularly visible for some microRNAs targeting repeated
regions. In other words, multiple target sites near to each other does not imply that several
microRNAs are targeting the transcript at the same time, but that a single microRNA/RISC
complex may be diffusing between sites.
By using a compilation of experimentally validated microRNA/transcript interactions (see
Methods and Implementation) I estimated the precision of target prediction based on the
minimum distance between canonical sites. If we predict as microRNA targets transcripts with at
least 2 canonical sites, the precision of the prediction is 0.041. This value significantly increases
(to 0.087) if we only consider pairs of canonical sites next to each other (distance of 6
nucleotides). Precision values in microRNA target prediction are generally low, in particular when
one uses a small dataset of validated targets (as in this case), but it is useful to compare the
predictive power of different strategies. Using this same dataset, the precision of TargetScan
(using conserved targets only) is 0.11. Strikingly, if we combine TargetScan and pairs of targets
at 10 or less nucleotides in distance, the precision goes up to 0.195. In conclusion, combining
some of the features of seedVicious to other programs can be use to increase the precision in
microRNA target prediction.
Pairs of target sites can also be for different microRNAs. Indeed, seedVicious also allows
the exploration of clustered target sites for multiple microRNAs. In Fig 4 I plot the proportion
of pairs of targets at a given distance for all pairs of microRNAs in the human genome. For
distances of up to about 20 nucleotides, pairs of target sites are more likely to be for the same
microRNA than to different microRNAs. However, this pattern is probably a consequence of
sequence bias composition: if a segment of a 3'UTR is rich in a given motif, it is more likely
that microRNAs with the same seed sequence (same family) will target at nearby places. For
instance, RNA-binding proteins (RBPs) often bind to clustered motifs (see [
] and references
Fig 4. Pairs of target sites for the same and different microRNAs. Proportion of pairs of target sites (y-axis) at a
given distance (x-axis) for the same microRNAs (solid black line) and for two different microRNAs (dashed gray line).
6 / 9
In previous studies, the occurrence of two or more seed sequences in transcripts has been
associated to a higher chance of predicting a bona fide targeting interaction [
]. Here I
compared the proportion of predicted targets that are validated by experiments (precision) and the
number of canonical target sites. The graph in Fig 2C shows that, indeed, as the number of
target sites increase, the proportion of validated targets also increases, although there is a global
maximum around 50. It will be interesting to explore whether distinct microRNAs have a
different optimum number of target sites for an efficient repression of the target transcript.
The analysis of targets and near-targets is of particular interest to analyse populations with
segregating alleles in target sites [
]. For instance, it has been shown that amplifications of
the microRNAs mir-17, mir-18 and mir-19 from the mir-17~mir-92 cluster lead to tumor
development (reviewed in ). That is, this cluster is an oncogenic loci. I scanned the 3'UTR
sequence of TP53 with seedVicious to find near-target sites of the oncogenic microRNAs of the
cluster and I found 14 sites. Comparing these sites with polymorphic nucleotides (see Methods)
I detected one specific target site for miR-18a-5p (dbSNP accession number rs17881366P) that
it is a non-target in the reference genome, but a small yet significant fraction of the human
population (0.6%) has a target allele. In populations of African ancestry the target allele is detected
in over 2% of the individuals sequenced. Whether a target site for a oncogenic microRNA in a
tumour-suppresor gene segregation in African populations have a significance from the
epidemiological point of view cannot be inferred, and this result remains largely speculative.
However, it demonstrates that this methodology can be adapted to explore the impact of microRNA
target sites segregating in human populations. Indeed, current work in our lab reveals that
selective pressures against this type of substitutions in cancer-associated genes is prevalent (Hatlen
and Marco, unpublished results).
The transcript from the gene lin-14 in Caenorhabditis elegans has seven targets sites for the
microRNA lin-4 [
]. However, other prediction programs detect only three (TargetScan) or
none, probably due to the stringent criteria. Using seedVicious I scanned the lin-14 3' UTR
] for canonical target and canonical near-target sites. Three canonical sites were detected,
the same that are reported in TargetScan. Additionally, five near-target sites were reported,
four of them corresponding to the other four sites originally described. One of the near-target
sites was not previously described and may be a good candidate to further explore. The
detailed analysis as well as the input files are available with the package and fully described in
the User Guide (S1 File). This example illustrates the potential of studying near-target sites,
not only in evolutionary studies, but in the detection of potential functional targets. However,
these near-target sites may actually be target sites segregating in a population. The exploration
of segregating microRNA target sites in populations is a promising area of research. Further
examples are described in the software manual.
The canonical target sites for lin-4-5p in lin-14 are highly conserved. Actually, in five
species of worm studies all sites remain the same. However, let-7-5p, which also targets lin-14, has
a more dynamic evolution. Using seedVicious I predicted the canonical target sites in five
worm species and to reconstruct the gains and losses using maximum parsimony. Results are
shown in Fig 2D, where we observe that there have been 4 target sites gained among the
studied species whilst Caenorhabditis briggsae lost its three ancestral target sites.
SeedVicious is a valuable addition to the toolkit of microRNA biologists. It works on custom
datasets and can be easily fitted into annotation pipelines. It provides a range of analysis that
can be combined with other programs to generate robust microRNA target predictions. The
analysis of near-target sites is also useful to study population dynamics at target sites, and the
7 / 9
inbuilt maximum parsimony functionality is valuable to study the evolution of
microRNAmediated regulation at a large scale.
cols and recipes.
S1 File. User guide and installation instructions for seedVicious software. It includes
protoThis software would not have been made publicly available without the encouragement and
support of my colleagues, in particular Mohab Helmy and Sam Griffiths-Jones. I am also
grateful to M. Helmy and Andrea Hatlen for discussion, to Roman Cheplyaka for feedback on the
program, to Stuart Newman for his invaluable help setting up the web-server, and to the
various anonymous reviewers (some more anonymous than others) for helpful comments and
suggestions. This work was supported by the Wellcome Trust [grant number 200585/Z/16/Z].
Conceptualization: Antonio Marco.
Data curation: Antonio Marco.
Formal analysis: Antonio Marco.
Funding acquisition: Antonio Marco.
Investigation: Antonio Marco.
Methodology: Antonio Marco.
Project administration: Antonio Marco.
Resources: Antonio Marco.
Software: Antonio Marco.
Supervision: Antonio Marco.
Validation: Antonio Marco.
Visualization: Antonio Marco.
Writing ± original draft: Antonio Marco.
Writing ± review & editing: Antonio Marco.
8 / 9
1. Bartel DP . MicroRNAs: target recognition and regulatory functions . Cell . 2009 ; 136 : 215 ± 233 . https:// doi.org/10.1016/j.cell. 2009 . 01 .002 PMID: 19167326
2. Agarwal V , Bell GW , Nam J-W , Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs . eLife. 2015 ; 4 . https://doi.org/10.7554/eLife.05005 PMID: 26267216
3. Maragkakis M , Alexiou P , Papadopoulos G , Reczko M , Dalamagas T , Giannopoulos G , et al. Accurate microRNA target prediction correlates with protein repression levels . BMC Bioinformatics . 2009 ; 10 : 295 . https://doi.org/10.1186/ 1471 -2105-10-295 PMID: 19765283
4. Chen K , Rajewsky N. Natural selection on human microRNA binding sites inferred from SNP data . Nat Genet . 2006 ; 38 : 1452 ± 1456 . https://doi.org/10.1038/ng1910 PMID: 17072316
5. Saunders MA , Liang H , Li W-H . Human polymorphism at microRNAs and microRNA target sites . Proc Natl Acad Sci U S A . 2007 ; 104 : 3300 ± 3305 . https://doi.org/10.1073/pnas.0611347104 PMID: 17360642
6. Marco A . Selection Against Maternal microRNA Target Sites in Maternal Transcripts . G3 GenesGenomesGenetics . 2015 ; 5 : 2199 ± 2207 . https://doi.org/10.1534/g3. 115 .019497 PMID: 26306531
7. Paraskevopoulou MD , Georgakilas G , Kostoulas N , Vlachos IS , Vergoulis T , Reczko M , et al. DIANAmicroT web server v5 . 0: service integration into miRNA functional analysis workflows . Nucleic Acids Res . 2013 ; 41 : W169±W173 . https://doi.org/10.1093/nar/gkt393 PMID: 23680784
8. PinzoÂn N , Li B , Martinez L , Sergeeva A , Presumey J , Apparailly F , et al. The number of biologically relevant microRNA targets has been largely overestimated . Genome Res . 2016 ;
9. Hofacker IL . RNA secondary structure analysis using the Vienna RNA package . Curr Protoc Bioinforma Ed Board Andreas Baxevanis Al. 2009;Chapter 12 : Unit12 .2. https://doi.org/10.1002/0471250953. bi1202s26 PMID: 19496057
10. Felsenstein J. Inferring Phylogenies . Sinauer Associates; 2003 .
11. Felsenstein J. PHYLIP ( Phylogeny Inference Package) version 3.6. Distributed by the author . Department of Genome Sciences, University of Washington, Seattle.; 2005 .
12. Clifton BD , Librado P , Yeh S-D , Solares ES , Real DA , Jayasekera SU , et al. Rapid Functional and Sequence Differentiation of a Tandemly Repeated Species-Specific Multigene Family in Drosophila . Mol Biol Evol . 2017 ; 34 : 51 ± 65 . https://doi.org/10.1093/molbev/msw212 PMID: 27702774
13. Kozomara A , Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data . Nucleic Acids Res . 2014 ; 42 : D68 ± 73 . https://doi.org/10.1093/nar/gkt1181 PMID: 24275495
14. Kinsella RJ , KaÈhaÈri A , Haider S , Zamora J , Proctor G , Spudich G , et al. Ensembl BioMarts: a hub for data retrieval across taxonomic space . Database J Biol Databases Curation . 2011 ; 2011 : bar030. https://doi.org/10.1093/database/bar030 PMID: 21785142
15. Chou C-H , Chang N-W , Shrestha S , Hsu S-D , Lin Y-L , Lee W-H , et al. miRTarBase 2016 : updates to the experimentally validated miRNA-target interactions database . Nucleic Acids Res . 2016 ; 44 : D239± D247 . https://doi.org/10.1093/nar/gkv1258 PMID: 26590260
16. Karolchik D , Hinrichs AS , Furey TS , Roskin KM , Sugnet CW , Haussler D , et al. The UCSC Table Browser data retrieval tool . Nucleic Acids Res . 2004 ; 32 : D493 ± 496 . https://doi.org/10.1093/nar/ gkh103 PMID: 14681465
17. Chandradoss SD , Schirle NT , Szczepaniak M , MacRae IJ , Joo C. A Dynamic Search Process Underlies MicroRNA Targeting. Cell . 2015 ; 162 : 96 ± 107 . https://doi.org/10.1016/j.cell. 2015 . 06 .032 PMID: 26140593
18. Denzler R , McGeary SE , Title AC , Agarwal V , Bartel DP , Stoffel M. Impact of MicroRNA Levels, TargetSite Complementarity, and Cooperativity on Competing Endogenous RNA-Regulated Gene Expression . Mol Cell . 2016 ; 64 : 565 ± 579 . https://doi.org/10.1016/j.molcel. 2016 . 09 .027 PMID: 27871486
19. Zhang C , Lee K-Y , Swanson MS , Darnell RB . Prediction of clustered RNA-binding protein motif sites in the mammalian genome . Nucleic Acids Res . 2013 ; 41 : 6793 ± 6807 . https://doi.org/10.1093/nar/gkt421 PMID: 23685613
20. Alexiou P , Maragkakis M , Papadopoulos GL , Reczko M , Hatzigeorgiou AG . Lost in translation: an assessment and perspective for computational microRNA target identification . Bioinformatics . 2009 ; 25 : 3049 ± 3055 . https://doi.org/10.1093/bioinformatics/btp565 PMID: 19789267
21. Concepcion CP , Bonetti C , Ventura A. The miR-17-92 family of microRNA clusters in development and disease . Cancer J Sudbury Mass . 2012 ; 18 : 262 ± 267 . https://doi.org/10.1097/PPO. 0b013e318258b60a PMID: 22647363
22. He L , Hannon GJ . MicroRNAs: small RNAs with a big role in gene regulation . Nat Rev Genet . 2004 ; 5 : 522 ± 531 . https://doi.org/10.1038/nrg1379 PMID: 15211354
23. Jan CH , Friedman RC , Ruby JG , Bartel DP . Formation, regulation and evolution of Caenorhabditis elegans 3[prime]UTRs . Nature . 2010 ; advance online publication . https://doi.org/10.1038/nature09616 PMID: 21085120