Microproteins: from behind the scenes to the spotlight
Genome Instability & Disease
https://doi.org/10.1007/s42764-021-00040-3
REVIEW ARTICLE
Meiqian Jiang¹ · Huiqiang Lou¹ · Wenya Hou¹
Received: 1 April 2021 / Revised: 26 April 2021 / Accepted: 4 May 2021
© Shenzhen University School of Medicine; Fondazione Istituto FIRC di Oncologia Molecolare 2021
Abstract
Microproteins, polypeptides of fewer than 200 amino acids, were long regarded as non-functional. This “dark matter” of the proteome went largely unrecognized until new bioinformatic, biochemical and proteomic techniques were developed. The rapidly growing number of identified microproteins and their emerging roles in diverse biological processes are attracting increasing attention. Here, we summarize recent progress in this field from four main aspects: (1) a brief history and the various definitions of microproteins; (2) three main strategies for identifying microproteins; (3) the multiple regulatory mechanisms of microproteins; and (4) the biological functions of microproteins, with a particular focus on genome stability maintenance and their relevance to human diseases. Finally, we discuss some open questions and the therapeutic opportunities hidden in these small newcomers.
Keywords Microproteins · Small open reading frames · Genome stability · DNA repair · DNA replication · Cancer
Abbreviations
aa Amino acids
alt-ORFs Alternative ORFs
bHLH Basic helix-loop-helix
CASIMO1 Cancer-associated small integral membrane open reading frame 1
circRNA Circular RNA
cNHEJ Canonical non-homologous end joining
CYREN Cell cycle regulator of NHEJ
dNDPs Deoxyribonucleoside diphosphates
dNTPs Deoxyribonucleoside triphosphates
DTL Denticleless
DSB DNA double-strand break
DYNLL1 Dynein light chain LC8-type 1
EPHX1 Epoxide hydrolase 1
ER Endoplasmic reticulum
ERLIC Electrostatic repulsion-hydrophilic interaction chromatography
EST Expressed sequence tags
FBXW7 F-box and WD repeat domain containing 7
HR Homologous recombination
* Wenya Hou
¹ Guangdong Key Laboratory for Genome Stability & Disease Prevention, Shenzhen University School of Medicine, Shenzhen 518060, China
Id Inhibitor of DNA binding and/or differentiation
LINC-PINT Long intergenic non-protein-coding RNA, p53-induced transcript
MDPs Mitochondria-derived peptides
miPs MicroProteins
MRI Modulator of retroviral infection
MS Mass spectrometry
MWCO Molecular weight cut-off
ncRNA Non-coding RNA
NDPs Nucleoside diphosphates
NGS Next-generation sequencing
NORFs Nonannotated ORFs
NOPs 3’ Neo open reading frame peptides
PAF1c Polymerase-associated factor 1 complex
PCNA Proliferating cell nuclear antigen
pRb Retinoblastoma protein
RNR Ribonucleotide reductase
RPF Ribosome-protected fragment
SEP smORF-encoded polypeptide
SHPRH SNF2 histone linker PHD RING helicase
sORFs/smORFs Small open reading frames
SR Sarcoplasmic reticulum
TARP Translating ribosome affinity purification
TFs Transcription factors
uORF Upstream ORF
uMKKS uORF of the McKusick-Kaufman syndrome gene
Introduction
In this review, we define “microproteins” as functional proteins of 2 to 200 amino acids (aa), which are usually encoded by small open reading frames (sORFs/smORFs) (Basrai et al., 1997; Magnani et al., 2014; Yang et al., 2011). Researchers in this field now agree that microproteins are small in size and numerous in variety, and that their functions have so far been underestimated, for several reasons. First, microproteins were routinely discarded during automatic gene annotation, because smORFs of fewer than 100 codons were considered untranslatable and “meaningless” in eukaryotic genomes (Basrai et al., 1997; Chugunova et al., 2018; Frith et al., 2006). Second, eukaryotic mRNAs are usually monocistronic, encoding a single protein, whereas bacterial transcripts are often polycistronic, encoding several proteins. Because it was widely accepted that each eukaryotic mRNA is translated into one canonical protein, alternative transcription and translation were overlooked.
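To make the two annotation biases described above concrete, the following minimal Python sketch scans a transcript for AUG-initiated ORFs and applies a simplified “legacy” filter: ORFs shorter than 100 codons are dropped and, under the monocistronic assumption, only the longest remaining ORF is kept. The function names (find_orfs, legacy_annotation, is_microprotein_orf), the toy sequence and the exact filtering rules are hypothetical illustrations, not the annotation pipeline of any cited study.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(rna, start_codons=("AUG",)):
    """List forward-strand ORFs as (start, end, n_aa) tuples.

    `start` is the 0-based index of the start codon, `end` is the index just
    past the stop codon, and `n_aa` is the number of encoded residues (stop
    codon excluded). For each stop, only the most upstream in-frame start
    after the previous stop is reported.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA or RNA input
    orfs = []
    for frame in range(3):
        open_start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if open_start is None and codon in start_codons:
                open_start = i
            elif open_start is not None and codon in STOP_CODONS:
                orfs.append((open_start, i + 3, (i - open_start) // 3))
                open_start = None
    return sorted(orfs)


def legacy_annotation(orfs, min_codons=100):
    """Hypothetical filter mimicking the two historical biases.

    Assumption: the cutoff counts sense codons only (stop codon excluded).
    """
    long_enough = [orf for orf in orfs if orf[2] >= min_codons]
    # Monocistronic assumption: keep only the single longest ORF per transcript.
    return [max(long_enough, key=lambda orf: orf[2])] if long_enough else []


def is_microprotein_orf(orf, min_aa=2, max_aa=200):
    """Size criterion used in this review: 2 to 200 amino acids."""
    return min_aa <= orf[2] <= max_aa


# Toy transcript: a 30-aa ORF followed, in the same frame, by a 150-aa ORF.
transcript = "AUG" + "GCU" * 29 + "UAA" + "AUG" + "GCU" * 149 + "UAG"
orfs = find_orfs(transcript)
print(orfs)                                         # [(0, 93, 30), (93, 546, 150)]
print(legacy_annotation(orfs))                      # [(93, 546, 150)]
print([o for o in orfs if is_microprotein_orf(o)])  # both ORFs
```

In this toy transcript, both ORFs fall within the 2 to 200 aa size range used in this review, yet the simplified legacy filter retains only the 150-aa ORF, so the 30-aa candidate would never be annotated.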
The pioneering systematic studies of microproteins began in yeast in 1997 (Basrai et al., 1997). Later, the Boeke group reported a functional analysis of smORFs in the yeast genome and identified several microproteins that respond to different stresses (Kastenmayer et al., 2006). Savard et al. (2006) then found that the mille-pattes (mlpt) mRNA is polycistronic and encodes four small peptides, three of which share a conserved motif; depletion of mlpt impaired embryo segmentation in Tribolium. Subsequently, two groups independently reported that the Drosophila homolog of mlpt, tarsal-less/polished rice (tal/pri), has similar biological functions (Galindo et al., 2007; Kondo et al., 2007). These pioneering studies kicked off a microprotein “gold rush”, especially in higher eukaryotes.
With the development of techniques including bioinformatic methods, mass spectrometry and the more recently developed ribosome profiling (Andrews & Rothnagel, 2014; Couso & Patraquim, 2017; Ingolia et al., 2009; Makarewich & Olson, 2017; Orr et al., 2020; Peeters & Menschaert, 2020; Plaza et al., 2017; Pueyo et al., 2016; Schlesinger & Elsasser, 2021; Sousa & Farkas, 2018; Xiao et al., 2018), thousands of translated smORFs and alternative ORFs (alt-ORFs) have been identified to date. Surprisingly, microproteins are encoded not only by mRNAs but also by rRNAs and other RNAs previously considered non-coding (non-coding RNAs, ncRNAs). Their sources include alternative transcripts, ORFs initiating at non-AUG start codons, upstream ORFs (uORFs) located in the 5’ UTR of mRNAs, and circular RNAs (circRNAs) (Table 1).
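The ORF classes listed above can be illustrated with a minimal Python sketch, assuming a plain RNA string and a known main CDS position. It extends an ORF from every candidate start codon to the first in-frame stop and labels each ORF by its position relative to the annotated CDS. The near-cognate codon set (CUG, GUG, UUG, ACG), the function names (find_orfs, classify_orf) and the toy transcript are illustrative assumptions. Sequence scanning of this kind only predicts candidates; evidence of translation comes from approaches such as ribosome profiling and mass spectrometry, and circRNAs would additionally require scanning across the back-splice junction, which this sketch omits.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
# Illustrative (assumed) subset of near-cognate codons reported to support
# non-AUG initiation; extend or restrict as needed.
NEAR_COGNATE = ("CUG", "GUG", "UUG", "ACG")


def find_orfs(rna, start_codons):
    """Extend an ORF from every candidate start codon to its first in-frame stop.

    Returns (start, end, start_codon) tuples; `end` is the index just past the
    stop codon. Starts with no downstream in-frame stop are ignored.
    """
    orfs = []
    for i in range(len(rna) - 2):
        if rna[i:i + 3] not in start_codons:
            continue
        for j in range(i + 3, len(rna) - 2, 3):
            if rna[j:j + 3] in STOP_CODONS:
                orfs.append((i, j + 3, rna[i:i + 3]))
                break
    return orfs


def classify_orf(start, end, cds_start, cds_end):
    """Label an ORF by its position relative to the annotated main CDS."""
    if (start, end) == (cds_start, cds_end):
        return "main CDS"
    if end <= cds_start:
        return "uORF"  # entirely within the 5' UTR
    if start < cds_start:
        # Starts upstream, in frame with the CDS and sharing its stop codon
        if end == cds_end and (cds_start - start) % 3 == 0:
            return "N-terminal extension"
        return "overlapping uORF"
    if start >= cds_end:
        return "downstream ORF"  # within the 3' UTR
    return "internal alt-ORF"  # starts inside the CDS region


# Hypothetical toy transcript: 5' UTR with an AUG uORF and a CUG uORF, then the CDS.
utr5 = "CC" + "AUG" + "CCC" * 3 + "UAA" + "CC" + "CUG" + "CCC" * 2 + "UAG" + "CC"
cds = "AUG" + "CCC" * 5 + "UAA"
transcript = utr5 + cds
cds_start, cds_end = len(utr5), len(transcript)

for starts in (("AUG",), ("AUG",) + NEAR_COGNATE):
    print("start codons:", ", ".join(starts))
    for start, end, codon in find_orfs(transcript, starts):
        print(f"  {codon} {start}-{end}: {classify_orf(start, end, cds_start, cds_end)}")
```

With AUG as the only start codon, the scan finds the AUG-initiated uORF and the main CDS; adding the near-cognate codons also recovers the CUG-initiated uORF in the 5’ UTR, which an AUG-only search would miss.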
Because this field is emerging and evolving rapidly, there are many different definitions, classifications and nomenclatures for “smORF-encoded proteins” (Table 1). They have been named, for example, according to (1) their small size (2 to 200 aa): “microprotein” (Makarewich, 2020; Rathore et al., 2018), “small protein” (Basrai et al., 1997; Oyama et al., 2004; Storz et al., 2014), “micropeptide” (Chen et al., 2021; Makarewich & Olson, 2017; Vitorino et al., 2021) and “miniproteins” (Crook et al., 2020); (2) their origin: “SEP” (smORF-encoded polypeptide) (Pueyo et al., 2016; Saghatelian & Couso, 2015; Yin et al., 2019) or “MDPs” (mitochondria-derived peptides) (Kim et al., 2017); (3) their microRNA-like function of binding and inhibiting target proteins as negative regulators at the protein level: “microProtein, miP” (Bhati et al., 2018; Staudt & Wenkel, 2011) and “inhibitor of DNA binding, Id proteins” (Roschger & Cabrele, 2017); (4) a unique structure or function: “cyclotide”, for instance, is named after its head-to-tail cyclized backbone (Camarero, 2017; Craik & Du, 2017); and (5) individual names, such as “CASIMO1”, “CYREN” and “DYNLL1” (Arnoult et al., 2017; He et al., 2018; Polycarpou-Schwarz et al., 2018).
Identification of microproteins
Bioinformatic prediction
Bioinformatic prediction is often (...truncated)