SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity

PLOS ONE, Aug 2016

Knowledge of protein function is important for biological, medical and therapeutic studies, but the functions of many proteins are still unknown. Improved functional prediction methods are needed. Our SVM-Prot web-server employs a machine learning method to predict protein functional families from protein sequences irrespective of similarity, complementing similarity-based and other methods in predicting diverse classes of proteins, including distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we have made major improvements to SVM-Prot: (1) coverage expanded from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance due to more enriched training datasets and a greater variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that are similar in sequence to a query protein, and (5) a newly added batch submission option for classifying multiple proteins. Moreover, two more machine learning approaches, K nearest neighbor and probabilistic neural networks, have been added to facilitate collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.
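The idea of classifying a protein from sequence-derived descriptors rather than from alignment can be illustrated with a minimal sketch. It assumes amino acid composition as the only descriptor and a plain K nearest neighbor vote as the classifier. The function names, toy sequences and family labels are hypothetical, and the sketch does not reproduce the descriptor set, training data or models used by SVM-Prot.

```python
from collections import Counter
import math

# The 20 standard amino acids, in a fixed order so that every
# sequence maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20-dimensional amino acid composition of a sequence.

    Non-standard symbols (e.g. X) are ignored. This is an illustrative
    descriptor only, not the SVM-Prot descriptor set.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_predict(query, training, k=3):
    """Label a query by majority vote among its k nearest training vectors.

    `training` is a list of (sequence, label) pairs. Distance is Euclidean
    distance between composition vectors, so no alignment is performed.
    """
    q = composition(query)
    ranked = sorted(training, key=lambda item: math.dist(q, composition(item[0])))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy sequences with made-up labels (hypothetical, for illustration).
TRAINING = [
    ("KKRRKKRGKKRR", "positive-rich"),
    ("RKKRRKGRKKRK", "positive-rich"),
    ("KRKRKKRRAKKR", "positive-rich"),
    ("LLVVIILLAVIL", "hydrophobic-rich"),
    ("VILLIVAVLLIV", "hydrophobic-rich"),
    ("ILVLLVIAVLIL", "hydrophobic-rich"),
]

if __name__ == "__main__":
    print(knn_predict("RRKKGRKKRRKK", TRAINING))
    print(knn_predict("LIVLAVILLVIL", TRAINING))
```

Because the query is compared with training proteins in descriptor space, two sequences can receive the same label without being alignable to each other, which is the sense in which such a classifier works irrespective of sequence similarity.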


Ying Hong Li (1, 2), Jing Yu Xu (1, 2), Lin Tao (1, 2), Xiao Feng Li (1, 2), Shuang Li (1, 2), Xian Zeng (2), Shang Ying Chen (2), Peng Zhang (2), Chu Qin (2), Cheng Zhang (2), Zhe Chen (0, 2), Feng Zhu (1, 2), Yu Zong Chen (2)

(0) Zhejiang Key Laboratory of Gastro-intestinal Pathophysiology, Zhejiang Hospital of Traditional Chinese Medicine, Zhejiang Chinese Medical University, Hangzhou, P. R. China
(1) Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
(2) Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
(4) School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China

Editor: Bin Liu, Harbin Institute of Technology Shenzhen Graduate School, CHINA

OPEN ACCESS

Data Availability Statement: All relevant data are within the paper and its Supporting Information files.

Funding: FZ is supported by grants from the Fundamental Research Funds for the Central Universities (CDJZR14468801, CDJKXB14011, 2015CDJXY); Ministry of Science and Technology, 863 Hi-Tech Program (2007AA02Z160); Key Special Project Grant 2009ZX09501-004 China; and Singapore Academic Research Fund grant R-148000-208-112. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Introduction

Knowledge of protein function is essential for studying biological processes [1], understanding disease mechanisms [2], and exploring novel therapeutic targets [3,4]. Apart from experimental methods, a number of in-silico approaches have been developed and extensively used for protein function prediction. These methods include sequence similarity [5], sequence clustering [6], evolutionary analysis [7], gene fusion [8], protein interaction [9], protein remote homology detection [10,11], protein functional family classification based on sequence-derived [12,13] or domain [1] features, and integrated approaches that combine multiple methods, algorithms and/or data sources for enhanced functional predictions [5,14–16].

A protein functional family is a group of proteins with a specific type of molecular function (e.g. proteases [17]) or binding activity (e.g. RNA-binding [18]), or that are involved in a specific biological process defined by the Gene Ontology [19] (e.g. DNA repair [20]). Moreover, models of protein function prediction have been constructed for more broadly-defined functional families such as transmembrane [21], virulent [22] and secretory [23] proteins, and a large-scale community-based critical assessment of protein function annotation (CAFA) revealed that improvements to current protein function prediction tools are urgently needed [24].

Despite the development and extensive exploration of these methods, there is still a huge gap between proteins with and without functional characterization. Continuous efforts are therefore needed to develop new methods and improve existing ones. These efforts have been made possible by the rapidly expanding knowledge of protein sequence [25], structural [26], functional [19] and other [27–30] data. The uncharacterized proteins comprise a substantial perc (...truncated)



Ying Hong Li, Jing Yu Xu, Lin Tao, Xiao Feng Li, Shuang Li, Xian Zeng, Shang Ying Chen, Peng Zhang, Chu Qin, Cheng Zhang, Zhe Chen, Feng Zhu, Yu Zong Chen. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity, PLOS ONE, 2016, Volume 11, Issue 8, DOI: 10.1371/journal.pone.0155290