The PANTHER database of protein families, subfamilies, functions and pathways

Nucleic Acids Research, Jan 2005

PANTHER is a large collection of protein families that have been subdivided into functionally related subfamilies, using human expertise. These subfamilies model the divergence of specific functions within protein families, allowing more accurate association with function (ontology terms and pathways), as well as inference of amino acids important for functional specificity. Hidden Markov models (HMMs) are built for each family and subfamily for classifying additional protein sequences. The latest version, 5.0, contains 6683 protein families, divided into 31 705 subfamilies, covering ∼90% of mammalian protein-coding genes. PANTHER 5.0 includes a number of significant improvements over previous versions, most notably (i) representation of pathways (primarily signaling pathways) and association with subfamilies and individual protein sequences; (ii) an improved methodology for defining the PANTHER families and subfamilies, and for building the HMMs; (iii) resources for scoring sequences against PANTHER HMMs both over the web and locally; and (iv) a number of new web resources to facilitate analysis of large gene lists, including data generated from high-throughput expression experiments. Efforts are underway to add PANTHER to the InterPro suite of databases, and to make PANTHER consistent with the PIRSF database. PANTHER is now publicly available without restriction at http://panther.appliedbiosystems.com.



Huaiyu Mi [1], Betty Lazareva-Ulitsky [1], Rozina Loo [1], Anish Kejariwal [1], Jody Vandergriff [1], Steven Rabkin [1], Nan Guo [1], Anushya Muruganujan [1], Olivier Doremieux [1], Michael J. Campbell [1], Hiroaki Kitano [0, 1], Paul D. Thomas [1]

[0] The Systems Biology Institute and ERATO-SORST Kitano Symbiotic Systems Project/Japan Science and Technology Agency, Suite 6A, M31, 6-31-15 Jingumae, Shibuya, Tokyo 150-0001, Japan
[1] Computational Biology, Applied Biosystems, 850 Lincoln Center Drive, Foster City, CA 94404, USA

The philosophy, as well as the basic methodology, behind the PANTHER database has been described previously (1,2); therefore, we focus here on the recent improvements to the database and to the functionality available on the website. In brief, there are two main parts to PANTHER: PANTHER/LIB, a library of protein families and subfamilies; and PANTHER/X, a set of ontology terms describing protein function. The database’s main advantage is in the curator-defined grouping of protein sequences into functional subfamilies, allowing more detailed and accurate association with the ontology terms, and now biological pathways. Each family and subfamily is represented by a phylogenetic tree of ‘training sequences’, and a hidden Markov model (HMM) that represents these sequences as a statistical model. The HMM library can be searched to classify new sequences, or to provide a score to predict the likely functional consequence of a mutation (1). PANTHER is quite comprehensive for the annotation of protein sequences encoded by metazoan genomes: ∼90% of mammalian protein-coding genes, and nearly two-thirds of Drosophila genes, are hit by a PANTHER HMM.

The PANTHER database has recently been expanded to include associations between protein sequences and the biological pathways they participate in. Like the molecular function and biological process ontology terms, these pathways are associated with individual protein sequences, and when possible with PANTHER subfamily HMMs, by expert curators.

We have also improved the methodology used to define protein families and subfamilies. These improvements are mainly in two areas: global clustering of protein sequence space to allow definition of family boundaries, and new algorithms that make use of ontology terms to provide a guide for curators to define both families and subfamilies.
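As an illustration of how searching an HMM library can classify a new sequence, the sketch below assumes that a sequence has already been scored against every family and subfamily HMM, and simply picks the best-scoring model. This is a toy under stated assumptions, not the PANTHER scoring procedure: the identifiers (`FAM0001`, `FAM0001:SF1`), the `family:subfamily` naming convention, the score values and the `min_score` cutoff are all hypothetical.

```python
from typing import Dict, Optional


def classify(scores: Dict[str, float], min_score: float = 0.0) -> Optional[str]:
    """Return the ID of the best-scoring HMM, or None if no score exceeds min_score.

    `scores` maps a (hypothetical) HMM identifier to the score the query
    sequence obtained against that model; higher is assumed to be better.
    """
    best_id: Optional[str] = None
    best_score = min_score
    for hmm_id, score in scores.items():
        if score > best_score:
            best_id, best_score = hmm_id, score
    return best_id


def family_of(hmm_id: str) -> str:
    """Strip an optional ':SFn' suffix to recover the parent family ID."""
    return hmm_id.split(":", 1)[0]


# Hypothetical scores for one query sequence against four models.
example_scores = {
    "FAM0001": 50.0,      # family-level HMM
    "FAM0001:SF1": 80.0,  # subfamily HMM within FAM0001
    "FAM0001:SF2": 30.0,  # another subfamily of the same family
    "FAM0002": 10.0,      # unrelated family
}

best_hit = classify(example_scores)          # "FAM0001:SF1"
best_family = family_of(best_hit)            # "FAM0001"
```

When a subfamily model scores best, the sequence inherits the more specific subfamily annotation; when only the family model scores above the cutoff, the sketch falls back to the family-level assignment.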
There are also a number of significant improvements to the website. Perhaps most importantly for users, the site is now free of the previous restrictions on its use (3). In addition, HMMs can be downloaded, and/or searched interactively using a protein sequence as a query. Pathways can be interactively browsed and queried. Gene lists (e.g. from mRNA expression data) can be uploaded to the site and analyzed relative to molecular functions, biological processes and pathways.

STATISTICS FOR PANTHER 5.0

PANTHER/LIB (library of protein family and subfamily HMMs), version 5.0 contains 256 413 training sequences, grouped into 6683 families. These families were then divided further into 31 705 subfamilies. PANTHER HMMs have been used to annotate the protein-coding genes annotated in the human, mouse, rat and Drosophila melanogaster genomes. The fractions of these genes that were given a functional annotation by PANTHER 5.0 are shown in Table 1.

PANTHER WEBSITE FUNCTIONALITY

Several resources are now available at the PANTHER website.

Interactive

(i) Ontology term browser. The PANTHER Prowler (1) is designed for browsing ontology ter (...truncated)



Huaiyu Mi, Betty Lazareva-Ulitsky, Rozina Loo, Anish Kejariwal, Jody Vandergriff, Steven Rabkin, Nan Guo, Anushya Muruganujan, Olivier Doremieux, Michael J. Campbell, Hiroaki Kitano, Paul D. Thomas. The PANTHER database of protein families, subfamilies, functions and pathways, Nucleic Acids Research, 2005, pp. D284-D288, 33/suppl 1, DOI: 10.1093/nar/gki078