Microproteins: from behind the scenes to the spotlight

Genome Instability & Disease, May 2021

Microproteins, proteins of fewer than 200 amino acids, were long regarded as non-functional. This “dark matter” of the proteome went unrecognized until new bioinformatic, biochemical and proteomic techniques were developed. The fast-growing number of identified microproteins and their emerging roles in various biological processes are attracting increasing attention. Here, we summarize recent progress in this field from four main aspects: (1) a brief history and the various definitions of microproteins; (2) three main strategies for identifying microproteins; (3) the multiple regulatory mechanisms of microproteins; and (4) the biological functions of microproteins, with a particular focus on genome stability maintenance and their relevance to human diseases. Finally, we discuss some open questions and the therapeutic opportunities hidden in these small newcomers.

REVIEW ARTICLE
https://doi.org/10.1007/s42764-021-00040-3

Meiqian Jiang1 · Huiqiang Lou1 · Wenya Hou1

Received: 1 April 2021 / Revised: 26 April 2021 / Accepted: 4 May 2021
© Shenzhen University School of Medicine; Fondazione Istituto FIRC di Oncologia Molecolare 2021
Keywords
Microproteins · Small open reading frames · Genome stability · DNA repair · DNA replication · Cancer

Abbreviations
aa: Amino acids
alt-ORFs: Alternative-ORFs
bHLH: Basic helix-loop-helix
CASIMO1: Cancer-associated small integral membrane open reading frame 1
circRNA: Circular RNA
cNHEJ: Canonical non-homologous end joining
CYREN: Cell cycle regulator of NHEJ
dNDPs: Deoxynucleotide diphosphates
dNTPs: Deoxynucleotide triphosphates
DTL: Denticleless
DSB: DNA double-strand break
DYNLL1: Dynein light chain LC8-type 1
EPHX1: Epoxide hydrolase 1
ER: Endoplasmic reticulum
ERLIC: Electrostatic repulsion hydrophilic interaction chromatography
EST: Expressed sequence tags
FBXW7: F-box and WD repeat domain containing 7
HR: Homologous recombination
Id: Inhibitor of DNA binding and/or differentiation
LINC-PINT: The long intergenic non-protein-coding RNA p53-induced transcript
MDPs: Mitochondria-derived peptides
miPs: MicroProteins
MRI: Modulator of retroviral infection
MS: Mass spectrometry
MWCO: Molecular weight cut-off
ncRNA: Non-coding RNA
NDPs: Nucleotide diphosphates
NGS: Next-generation sequencing
NORFs: Nonannotated ORFs
NOPs: 3’ Neo open reading frame peptides
PAF1c: Polymerase-associated factor complex
PCNA: Proliferating cell nuclear antigen
pRb: The retinoblastoma protein
RNR: Ribonucleotide reductase
RPF: Ribosome protected fragment
SEP: smORF-encoded polypeptide
SHPRH: SNF2 histone linker PHD RING helicase
sORFs/smORFs: Small open reading frames
SR: Sarcoplasmic reticulum
TARP: Translating ribosome affinity purification
TFs: Transcription factors
uORF: Upstream ORF
uMKKS: uORF of the McKusick-Kaufman syndrome gene

* Wenya Hou
1 Guangdong Key Laboratory for Genome Stability & Disease Prevention, Shenzhen University School of Medicine, Shenzhen 518060, China

Introduction

In this review, we define “microproteins” as functional proteins of 2 to 200 amino acids (aa), which are usually encoded by small open reading frames (sORFs/smORFs) (Basrai et al., 1997; Magnani et al., 2014; Yang et al., 2011). Researchers in this field now agree that microproteins are small, numerous and diverse, and that their functions have so far been underestimated for several reasons. First, microproteins were routinely discarded during automatic gene annotation because smORFs shorter than 100 codons were considered untranslatable and “meaningless” in eukaryotic genomes (Basrai et al., 1997; Chugunova et al., 2018; Frith et al., 2006). Second, eukaryotic mRNAs are typically mono-cistronic, each encoding a single protein, whereas bacterial transcripts are often poly-cistronic and encode several proteins. It was widely accepted that, in eukaryotic cells, canonical proteins are translated from mRNA; alternative transcription and translation were therefore overlooked.

Pioneering systematic studies of microproteins began in yeast in 1997 (Basrai et al., 1997). Subsequently, the Boeke group reported a functional analysis of smORFs in the yeast genome and identified several microproteins that respond to different stresses (Kastenmayer et al., 2006). Savard et al. (2006) then found that the mlpt mRNA is poly-cistronic and encodes four small peptides. Three of them share a conserved motif, and depletion of mlpt impaired embryo segmentation in Tribolium. Two groups then independently reported that tal/pri, the homologous gene of mlpt in Drosophila, has similar biological functions (Galindo et al., 2007; Kondo et al., 2007). These pioneering studies kicked off a microprotein gold rush, especially in higher eukaryotes.
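As a rough illustration of the length cutoff described in the Introduction, the following Python sketch scans a DNA sequence for ATG-initiated ORFs and separates those below a minimum codon count from those at or above it. It is a toy under stated assumptions, not the annotation pipeline used in any of the cited studies: the function names (find_orfs, split_by_cutoff), the forward-strand-only scan, the restriction to ATG starts, and the convention of counting codons without the stop codon are all choices made here for illustration. Only the 100-codon threshold comes from the text.

```python
# Toy sketch (hypothetical helper names): how a minimum-length filter
# hides small ORFs. Forward strand only, ATG starts only.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq):
    """Yield (start, end, n_codons) for each ATG...stop ORF.

    start/end are 0-based, end-exclusive nucleotide positions
    (end includes the stop codon). n_codons counts the coding
    codons from ATG up to, but not including, the stop codon
    (an assumption of this sketch).
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                yield start, i + 3, (i - start) // 3
                start = None  # look for the next ORF in this frame


def split_by_cutoff(seq, min_codons=100):
    """Return (kept, discarded) ORF lists for a given codon cutoff."""
    kept, discarded = [], []
    for orf in find_orfs(seq):
        (kept if orf[2] >= min_codons else discarded).append(orf)
    return kept, discarded


if __name__ == "__main__":
    # Synthetic sequence: one 11-codon smORF followed by one 121-codon ORF.
    small = "ATG" + "GCT" * 10 + "TAA"
    large = "ATG" + "GCT" * 120 + "TGA"
    kept, discarded = split_by_cutoff(small + large)
    print("kept:", kept)
    print("discarded:", discarded)
```

With the default cutoff, the short ORF ends up in the discarded list even though it is a perfectly valid reading frame, which is the blind spot the text describes. The review also notes that microproteins can arise from non-AUG start codons, which this sketch deliberately ignores.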
With the development of techniques including bioinformatic methods, mass spectrometry and the more recent ribosome profiling (Andrews & Rothnagel, 2014; Couso & Patraquim, 2017; Ingolia et al., 2009; Makarewich & Olson, 2017; Orr et al., 2020; Peeters & Menschaert, 2020; Plaza et al., 2017; Pueyo et al., 2016; Schlesinger & Elsasser, 2021; Sousa & Farkas, 2018; Xiao et al., 2018), thousands of translated smORFs or alternative-ORFs (alt-ORFs) have been identified in genomes thus far. Surprisingly, microproteins are encoded not only by mRNAs but also by rRNA and other RNAs previously considered non-coding (non-coding RNA, ncRNA), including alternative transcripts and transcripts initiated at non-AUG start codons, upstream ORFs (uORFs) located in the 5’ UTR of mRNAs, and circular RNAs (circRNAs) (Table 1).

Because the field is emerging and evolving rapidly, many different definitions, classifications and nomenclatures of “smORF-coded proteins” are in use (Table 1). For example, names have been based on: (1) small size, from 2 to 200 aa: “microprotein” (Makarewich, 2020; Rathore et al., 2018), “small protein” (Basrai et al., 1997; Oyama et al., 2004; Storz et al., 2014), “micropeptide” (Chen et al., 2021; Makarewich & Olson, 2017; Vitorino et al., 2021), “miniproteins” (Crook et al., 2020); (2) original source: “SEP” (smORF-encoded polypeptide) (Pueyo et al., 2016; Saghatelian & Couso, 2015; Yin et al., 2019) or “MDPs” (mitochondria-derived peptides) (Kim et al., 2017); (3) “microRNA-like” function, in which they bind and inhibit target proteins as negative regulators at the protein level: “microProtein, miP” (Bhati et al., 2018; Staudt & Wenkel, 2011), “inhibitor of DNA binding, Id proteins” (Roschger & Cabrele, 2017); (4) unique structure or function: “cyclotide”, named after its head-to-tail cyclized backbone (Camarero, 2017; Craik & Du, 2017); (5) individual names such as “CASIMO1”, “CYREN” and “DYNLL1” (Arnoult et al., 2017; He et al., 2018; Polycarpou-Schwarz et al., 2018).

Identification of microproteins

Bioinformatic prediction

Bioinformatic prediction is often (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007/s42764-021-00040-3.pdf
Article home page: https://link.springer.com/article/10.1007/s42764-021-00040-3

Meiqian Jiang, Huiqiang Lou, Wenya Hou. Microproteins: from behind the scenes to the spotlight, Genome Instability & Disease, 2021, pp. 1-15, DOI: 10.1007/s42764-021-00040-3