Identification of long non-coding RNAs and RNA binding proteins in breast cancer subtypes
www.nature.com/scientificreports
Claudia Cava1*, Alexandros Armaos2,3, Benjamin Lang2,4, Gian G. Tartaglia2,3,5 &
Isabella Castiglioni6
Breast cancer (BC) is a heterogeneous disease classified into four main subtypes with different clinical
outcomes, such as patient survival, prognosis, and relapse. Current genetic tests for the differential
diagnosis of BC subtypes have shown poor reproducibility. Therefore, an early and correct diagnosis
of molecular subtypes is one of the challenges in the clinic. In the present study, we identified
differentially expressed genes, long non-coding RNAs, and RNA-binding proteins (RBPs) for each BC
subtype from a public dataset by applying bioinformatics algorithms. In addition, we investigated
their interactions and proposed interacting biomarkers as a potential signature specific to each
BC subtype. We found a network of only 2 RBPs (RBM20 and PCDH20) and 2 genes (HOXB3 and
RASSF7) for luminal A, a network of 21 RBPs and 53 genes for luminal B, a HER2-specific network
of 14 RBPs and 30 genes, and a network of 54 RBPs and 302 genes for basal BC. We validated the
signatures on an independent dataset by evaluating how well their expression levels classify the
different molecular subtypes with a machine learning approach. Overall, we achieved good
classification performance, with an accuracy > 0.80. In addition, we found some interesting novel
prognostic biomarkers, such as RASSF7 for luminal A, DCTPP1 for luminal B, DHRS11, KLC3, NAGS,
and TMEM98 for HER2, and ABHD14A and ADSSL1 for basal. These findings could provide preliminary
evidence for identifying putative new prognostic biomarkers and therapeutic targets for individual breast
cancer subtypes.
Abbreviations
BC Breast cancer
lncRNAs Long non-coding RNAs
RBPs RNA-binding proteins
TCGA The Cancer Genome Atlas
NS Normal samples
DEGs Differentially expressed genes
Breast cancer (BC) is one of the most common cancers worldwide and is estimated to be the most frequent
cancer among women (25% of all new cancers recorded)1. The heterogeneity of BC reduces the specificity of
biological features (e.g., histological grade and hormone receptor status) that are usually used for the diagnosis and prognosis of BC and to guide therapy2,3. The classification of biological BC subtypes is based on
techniques such as immunohistochemistry and gene expression profiling4.
1Institute of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Via F. Cervi 93,
20090 Segrate-Milan, Milan, Italy. 2Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and
Technology, C/ Dr. Aiguader 88, 08003 Barcelona, Spain. 3RNA System Biology Lab, Department of Neuroscience
and Brain Technologies, Istituto Italiano Di Tecnologia (IIT), Via Morego 30, 16163 Genoa, Italy. 4Department of
Structural Biology and Center for Data Driven Discovery (C3D), St. Jude Children’s Research Hospital, Memphis,
TN 38105, USA. 5Sapienza University of Rome, Piazzale Aldo Moro 5, 00185 Rome, Italy. 6Department of Physics
“Giuseppe Occhialini”, University of Milan-Bicocca, Piazza dell’Ateneo Nuovo 1, 20126 Milan, Italy. *email:
Scientific Reports | (2022) 12:693 | https://doi.org/10.1038/s41598-021-04664-z
In 2011, the St. Gallen International Breast Cancer Conference reported a molecular subtype approach to
guide the therapy of BC based on immunohistochemical markers: estrogen receptor (ER), progesterone receptor
(PR), and human epidermal growth factor receptor 2 (HER2)4. In addition to the detection of these standard
biomarkers, St. Gallen in 2013 included the evaluation of Ki-67, a marker of cell proliferation5. Luminal A is
defined by ER positive and/or PR positive and Ki-67 < 14%, and luminal B by ER positive and/or PR positive
and Ki-67 ≥ 14%. ER-negative, PR-negative, and HER2-positive tumors are classified as the HER2+ subtype6.
Triple-negative BC (TNBC) is characterized by ER-negative, PR-negative, and HER2-negative status6.
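As a minimal sketch, the immunohistochemical rules quoted above can be written as a small decision function. The function name `ihc_subtype`, its parameters, and the handling of a missing Ki-67 value are illustrative assumptions and are not taken from the cited guidelines. HER2 status is ignored for hormone-receptor-positive tumors only because the rules as stated here do not mention it.

```python
from typing import Optional

# Ki-67 threshold (percent of positive cells) quoted in the text.
KI67_CUTOFF_PCT = 14.0


def ihc_subtype(er_pos: bool, pr_pos: bool, her2_pos: bool,
                ki67_pct: Optional[float] = None) -> str:
    """Assign a surrogate BC subtype from ER, PR, HER2 and Ki-67 status.

    Hypothetical helper that mirrors the rules stated in the text:
      - ER+ and/or PR+, Ki-67 <  14%  -> luminal A
      - ER+ and/or PR+, Ki-67 >= 14%  -> luminal B
      - ER-, PR-, HER2+               -> HER2+
      - ER-, PR-, HER2-               -> TNBC
    """
    hormone_receptor_pos = er_pos or pr_pos
    if hormone_receptor_pos:
        if ki67_pct is None:
            # Assumption: without Ki-67 the A/B split cannot be made.
            return "luminal (Ki-67 unknown)"
        return "luminal A" if ki67_pct < KI67_CUTOFF_PCT else "luminal B"
    return "HER2+" if her2_pos else "TNBC"


if __name__ == "__main__":
    # Example tumors: (ER, PR, HER2, Ki-67 %)
    for tumor in [(True, False, False, 10.0),
                  (False, True, False, 14.0),
                  (False, False, True, None),
                  (False, False, False, None)]:
        print(tumor, "->", ihc_subtype(*tumor))
```

The 14% cutoff is kept as a named constant so that a different Ki-67 threshold can be substituted without touching the decision logic.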
The development of gene expression profiling with microarrays demonstrated that classification based on
gene expression reflects the differences between BC subtypes at the molecular level3. The pioneering study of
Perou et al. in 2000 reported that BC could be classified into four intrinsic molecular subtypes by gene expression
profiling: luminal A, luminal B, HER2-enriched (HER2), and basal7,8. Gene expression classification uses the
term basal BC for the TNBC of immunohistochemistry. However, previous studies reported a concordance of 80% between TNBC and basal BC9. Unlike the TNBC subtype, basal BC is characterized by the
expression of other proteins (e.g., cytokeratins 5, 6, and 17)10.
BC molecular subtypes can be detected by different genetic tests, each with a different gene signature (e.g., PAM50,
MammaPrint, and Oncotype DX). Several studies applied to publicly available gene expression datasets demonstrated poor reproducibility among these tests. This can be explained by the differences between the gene
signatures used by the tests11,12. These observations have driven research toward the discovery of new
biomarkers for BC subtype characterization.
Luminal A is the most common BC subtype, with a more favorable prognosis and slower progression13.
The luminal B subtype is characterized by an intermediate prognosis compared with luminal A and HER2 BC
and by an increased expression of genes associated with growth receptor signaling14. HER2 BC frequently tends
to metastasize to the brain, liver, and lung. In addition, the overexpression of HER2 is implicated in cell
proliferation, inhibition of apoptosis, and cell spreading15. The basal BC subtype has a worse prognosis than the
other subtypes and high cell proliferation. Non-luminal tumors form metastases in distant organs more frequently than luminal tumors but, surprisingly, the luminal A and basal subtypes develop regional lymph node
metastases less often16,17. Luminal A tumors are well differentiated, whereas luminal B, HER2, and basal tumors are
poorly differentiated17.
Previous studies reported that the evolution from normal breast cell types to BC subtypes derives from mutations or genetic rearrangements in stem cells and progenitor cells giving rise to a heterogeneous population of
cells18.
New, more accurate methods are needed to increase prognostic value, to personalize the most appropriate
treatment for patients with BC, and to investigate the molecular mechanisms responsible for BC subtype differentiation. In recent years, long non-coding RNAs (lncRNAs) and RNA-binding proteins (RBPs) have emerged as
key regulators of post-transcriptional events, and they are dysregulated in many human solid cancers, including
BC19,20.
LncRNAs, longer than 200 nucleotides in length, belong to a large class of noncoding RNAs and are (...truncated)