Predicting essential proteins by integrating orthology, gene expressions, and PPI networks

PLOS ONE, Nov 2019

Identifying essential proteins is important for understanding the minimal requirements of cellular life and for finding human disease genes and potential drug targets. Experimental methods for identifying essential proteins are often costly, time-consuming, and laborious. Many computational methods for this task have been proposed based on the topological properties of protein-protein interaction networks (PINs). However, most of these methods have limited prediction accuracy because PINs are noisy and incomplete and because protein essentiality may relate to multiple biological factors. In this work, we propose a new centrality measure, OGN, which integrates orthologous information, gene expression data, and PINs. OGN determines a protein’s essentiality by capturing its co-clustering and co-expression properties, as well as its conservation during evolution. The performance of OGN was tested on Saccharomyces cerevisiae. Compared with several published centrality measures, OGN achieves higher prediction accuracy both when used alone and when used in an ensemble.

https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0195410&type=printable


Xue Zhang, Wangxin Xiao, Xihao Hu

1 School of Medicine, Tufts University, Boston, MA, United States of America
2 School of Computer and Software Engineering, Huaiyin Institute of Technology, Huai'an, Jiangsu, China
3 Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, United States
4 Harvard T.H. Chan School of Public Health, Boston, MA, United States of America

Editor: Irene Sendiña-Nadal, Universidad Rey Juan Carlos, SPAIN

Data Availability Statement: All data used in this study are third party and freely accessible from public databases. Protein-protein interaction data are available from the BioGRID database at http://thebiogrid.org/download.php. Essential gene data from the Saccharomyces genome deletion consortium are available at http://www-sequence.stanford.edu/group/yeast_deletion_project/deletions3.html. Essential gene data from the DEG database are available at http://tubic.tju.edu.cn/deg/. Essential gene data from the SGD database are available at http://www.yeastgenome.org/. Gene expression data [24] were downloaded from Gene Expression Omnibus (series accession GSE3431) at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3431. Orthologous data were downloaded from InParanoid at http://inparanoid.sbc.su.se/cgi-bin/index.cgi.

Funding: This work was funded by National Natural Science Foundation of China, No. 61402423, XZ; National Natural Science Foundation of China, No. 51678282, WX; National Natural Science Foundation of China, No. 51378243, WX; Guizhou Provincial Science and Technology Fund with grant No. [2015]2135, XZ. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Introduction

Essential proteins are cellular functional molecules that are indispensable to the survival or reproduction of a living organism. Essential protein identification is crucial for understanding the minimal requirements of basic cell functions and for identifying human disease genes [1] and new drug targets [2]. Experimental methods for the discovery of essential proteins are often time-consuming, laborious, and costly. Computational methods can help to rank genes based on publicly available biological resources and so greatly reduce the experimental cost of finding a novel gene target.

With the accumulation of high-throughput experimental data, it is now possible to predict protein essentiality at the network level. Many researchers have explored the correlations between network topological features and protein essentiality, and found that proteins that are highly connected to other proteins in a PIN are more likely to be essential than those with few connections. This so-called centrality-lethality rule [3] has been observed in several species, such as Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster [4-5]. Many centrality measures have been proposed to capture the correlations between network topological properties and protein essentiality, including degree centrality (DC) [5], betweenness centrality (BC) [6], closeness centrality (CC) [7], eigenvector centrality (EC) [8], and subgraph centrality (SC) [9].

Since the existing PINs for many species are incomplete and very noisy, identifying essential proteins solely from network topology is still very challenging. In addition, protein essentiality is expected to be affected by multiple biological factors, while network topological properties capture only some of its characteristics. Most centrality measures that are based only on PINs could be sensitive to the noise in each PIN, even though they have been found to correlate with the essentiality of protei (...truncated)
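
To make the topology-based measures named in the introduction concrete, the following is a minimal sketch of two of them, degree centrality (DC) and closeness centrality (CC), computed on a toy interaction network and used to rank proteins. It is not the OGN measure and not the authors' code. The edge list, the protein names P1 to P6, and the helper names build_adjacency, degree_centrality, closeness_centrality, and top_k are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy interaction network (undirected). The protein names
# are placeholders, not data from the paper.
EDGES = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"), ("P5", "P6"),
]


def build_adjacency(edges):
    """Return a dict mapping each protein to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """DC: the number of interaction partners of each protein."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def closeness_centrality(adj):
    """CC: (n - 1) divided by the sum of shortest-path distances.

    Assumes the network is connected; distances come from a
    breadth-first search started at each protein.
    """
    n = len(adj)
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (n - 1) / total if total > 0 else 0.0
    return scores


def top_k(scores, k):
    """Rank proteins by descending score (ties broken by name), keep k."""
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


adj = build_adjacency(EDGES)
dc = degree_centrality(adj)
cc = closeness_centrality(adj)
print("Top 2 by DC:", top_k(dc, 2))
print("Top 2 by CC:", top_k(cc, 2))
```

In a prediction setting of the kind the introduction describes, the top-ranked proteins would be taken as candidate essential proteins and compared against a known essential set. Because both scores depend only on the edges, a single missing or spurious interaction can change the ranking, which is the sensitivity to noise that the introduction raises.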



Xue Zhang, Wangxin Xiao, Xihao Hu. Predicting essential proteins by integrating orthology, gene expressions, and PPI networks, PLOS ONE, 2018, Volume 13, Issue 4, DOI: 10.1371/journal.pone.0195410