Joint analysis of expression profiles from multiple cancers improves the identification of microRNA–gene interactions
Advance Access publication June
Xiaowei Chen 2
Frank J. Slack 1
Hongyu Zhao 0 2 3
Associate Editor: Ivo Hofacker
0 Department of Biostatistics, Yale School of Public Health
1 Department of Molecular, Cellular and Developmental Biology, Yale University
2 Program in Computational Biology and Bioinformatics, Yale University
3 Department of Genetics, Yale School of Medicine, New Haven, CT 06511, USA
Motivation: MicroRNAs (miRNAs) play a crucial role in tumorigenesis and development through their effects on target genes. The characterization of miRNA–gene interactions will lead to a better understanding of cancer mechanisms. Many computational methods have been developed to infer miRNA targets with or without expression data. Because expression datasets are in general limited in size, most existing methods concatenate datasets from multiple studies into one aggregated dataset to increase sample size and power. However, such simple aggregation mostly identifies miRNA–gene interactions that are common across datasets, whereas interactions specific to individual datasets may be missed. Recent releases of The Cancer Genome Atlas data provide paired expression profiling of miRNAs and genes in multiple tumors with sufficiently large sample sizes. To study both common and cancer-specific interactions, it is desirable to develop a method that can jointly analyze multiple cancers without combining all the data into one single dataset.
Results: We developed a novel statistical method to jointly analyze expression profiles from multiple cancers to identify miRNA–gene interactions that are common across cancers as well as those specific to certain cancers. The benefit of this joint analysis approach is demonstrated by both simulation studies and real data analysis of The Cancer Genome Atlas datasets. Compared with simple aggregate analysis or single sample analysis, our method can effectively use the shared information among different but related cancers to improve the identification of miRNA–gene interactions. Another useful property of our method is that it can estimate similarity among cancers through their shared miRNA–gene interactions.
Availability and implementation: The program, MCMG, implemented in R is available at http://bioinformatics.med.yale.edu/group/.
Contact:
© The Author 2013. Published by Oxford University Press.
All rights reserved. For Permissions, please e-mail:
*To whom correspondence should be addressed.
1 INTRODUCTION
MicroRNAs (miRNAs) (~22 nt) are important small non-coding
RNAs that regulate gene expression by repressing the
translation of, or degrading, target genes through complementary
base pairing to the 3′ untranslated regions (3′ UTRs) of genes
(Bartel, 2004). They are involved in many cancer-related
processes, such as cell growth and differentiation, through regulating
their target gene expression (Esquela-Kerscher and Slack, 2006).
Considering the importance of miRNAs in cancers and that they
regulate a large number of genes, deciphering miRNA and gene
interactions at the genome level can lead to a better
understanding of tumorigenesis and development. In recent years, many
computational approaches have been developed to predict
miRNA targets. Sequence-based prediction algorithms build on
specific binding rules, including sequence complementarity,
secondary structure, energy, conservation and site accessibility, to
predict miRNA–gene interactions. Some representative methods
include TargetScanS/TargetScan (Lewis et al., 2003, 2005),
miRanda (Enright et al., 2003) and PicTar (Krek et al., 2005).
Although these methods provide a list of potential target genes
for each miRNA, they suffer from a relatively high
false-positive rate because of the complex nature of miRNA–gene
interactions (Sethupathy et al., 2006). In addition, the predictions
are static and may not capture those interactions that are
specific to certain diseases or conditions.
To improve the specificity of sequence-based predictions and to
identify condition-specific interactions, efforts have been made to
incorporate expression profiles into the study of miRNA regulatory
mechanisms. The basic principle of these methods is that the
expression of genes regulated by a miRNA should be negatively
correlated with the expression of that miRNA. These methods include those
based on simple correlation analysis (Liu et al., 2010; Van der
Auwera et al., 2010), simple/regularized regression models (Kim
et al., 2009; Lu et al., 2011; Muniategui et al., 2012a) and
Bayesian inference (Huang et al., 2007; Su et al., 2011).
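As a minimal sketch of the negative-correlation principle described above, the following Python example computes the Pearson correlation between one miRNA and several genes and keeps the genes whose correlation falls below a cutoff. The expression values, the gene names, the function names and the cutoff of -0.5 are all hypothetical choices made for illustration; this is not the MCMG method or any of the cited methods.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def candidate_targets(mirna, genes, cutoff=-0.5):
    """Return {gene: r} for genes whose correlation with the miRNA is <= cutoff.

    The cutoff is an arbitrary illustrative threshold, not a recommended value.
    """
    scores = {name: pearson(mirna, expr) for name, expr in genes.items()}
    return {name: r for name, r in scores.items() if r <= cutoff}

# Toy expression values across six tumor samples (hypothetical numbers).
mirna = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
genes = {
    "geneA": [9.1, 8.2, 6.9, 6.1, 5.0, 3.8],  # falls as the miRNA rises
    "geneB": [2.0, 2.9, 4.1, 5.2, 5.8, 7.1],  # rises with the miRNA
    "geneC": [5.0, 4.0, 6.0, 5.0, 4.0, 6.0],  # no clear trend
}

hits = candidate_targets(mirna, genes)
print(sorted(hits))  # prints ['geneA']
```

Only the gene whose expression decreases as the miRNA increases is retained. In practice a fixed cutoff like this would likely be replaced by a significance test with multiple-testing correction, and a negative correlation alone does not establish direct regulation.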
Pearson correlation, in the category of simple correlation
analysis, is the most straightforward way to study miRNA–gene
interactions. However, the simplicity of this method usually
leads to a relatively high false-positive rate. Lasso regression
(Lu et al., 2011; Muniategui et al., 2012a) in the category of
reg (...truncated)