TreeFam: 2008 Update

Nucleic Acids Research, Jan 2008

TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14 351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new Perl API lets users easily extract data from the TreeFam MySQL database.



Published online 1 December 2007
Nucleic Acids Research, 2008, Vol. 36, Database issue D735–D740
doi:10.1093/nar/gkm1005

Jue Ruan¹, Heng Li², Zhongzhong Chen¹, Avril Coghlan², Lachlan James M. Coin³, Yiran Guo¹, Jean-Karim Hériché², Yafeng Hu¹, Karsten Kristiansen⁴, Ruiqiang Li¹,⁴, Tao Liu¹, Alan Moses², Junjie Qin¹, Søren Vang⁵, Albert J. Vilella⁶, Abel Ureta-Vidal⁶, Lars Bolund¹,⁷, Jun Wang¹,⁴,⁷ and Richard Durbin²,*

¹ Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China
² Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA
³ Department of Epidemiology & Public Health, Imperial College, St Mary’s Campus, Norfolk Place, London W2 1PG, UK
⁴ Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M
⁵ Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark
⁶ EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK
⁷ Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
INTRODUCTION

Biologists studying a gene in one model organism often wish to transfer functional information between species. To do this, it is essential to know how the gene is related to other genes in a family. Using a phylogenetic tree, it is possible to infer orthologues (related genes in different species that diverged at the time of a speciation event) and paralogues (related genes that originated via a duplication event within a species) (1). In his original definition of orthology, Fitch defined orthologues in terms of a phylogenetic tree of a gene family (1). It has now been well established that analysis of phylogenetic trees is a very accurate way to determine orthology (2,3), which led us to develop the TreeFam database and accompanying website in 2005 (4).

TreeFam aims to be a curated database of phylogenetic trees of all animal gene families, focusing on gene sets from animals with completely sequenced genomes. In TreeFam, orthologues and paralogues are inferred from the phylogenetic tree of a gene family. Tree-based inference of orthologues is more robust to rate differences than BLAST-based orthologue inference, which has been used in other databases such as InParanoid (5), KOGs (6), HomoloGene (7) and OrthoMCL-DB (8). Furthermore, tree-based results can be easily visualized and for some purposes are more informative, since gene losses and duplications can be inferred and dated on a tree.

In addition to the databases mentioned above, many other databases provide animal gene families on the genome-wide scale, such as PANTHER (9), Phylofacts (10), PhIGs (11) and SYSTERS (12). They usually display the phylogenetic trees, but most do not computationally infer orthologues from the gene trees. Like TreeFam, a few databases explicitly predict orthologues based on phylogenetic trees. These include HOGENOM (13) and PhylomeDB (14).
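The tree-based definitions above can be illustrated with a minimal sketch. It assumes a gene tree whose internal nodes have already been labelled as speciation or duplication events; the `Node` class, the `classify_pairs` function and the gene names are hypothetical and are not part of TreeFam or its API. The sketch does not reproduce TreeFam's own inference algorithm. It only shows the general rule that a pair of genes is orthologous when their most recent common ancestor node is a speciation, and paralogous when it is a duplication. Under this rule, pairs from different species that are separated by a duplication are also reported as paralogues.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Node:
    """A gene-tree node. Leaves carry a gene name; internal nodes carry an event."""
    name: str = ""
    event: Optional[str] = None  # "speciation" or "duplication" (assumed pre-labelled)
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Collect the gene names at the leaves below a node."""
    if not node.children:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(leaves(child))
    return names


def classify_pairs(root: Node) -> Dict[FrozenSet[str], str]:
    """Label every pair of genes by the event at their most recent common ancestor."""
    relations: Dict[FrozenSet[str], str] = {}

    def visit(node: Node) -> None:
        if not node.children:
            return
        label = "orthologue" if node.event == "speciation" else "paralogue"
        groups = [leaves(child) for child in node.children]
        # Genes that fall under different children of this node have this
        # node as their most recent common ancestor.
        for left, right in combinations(groups, 2):
            for a in left:
                for b in right:
                    relations[frozenset((a, b))] = label
        for child in node.children:
            visit(child)

    visit(root)
    return relations


# Hypothetical family: one ancient duplication (copies A and B),
# followed by a human/mouse speciation in each copy.
tree = Node(event="duplication", children=[
    Node(event="speciation", children=[Node("human_A"), Node("mouse_A")]),
    Node(event="speciation", children=[Node("human_B"), Node("mouse_B")]),
])

for pair, label in sorted(classify_pairs(tree).items(), key=lambda kv: sorted(kv[0])):
    print(" / ".join(sorted(pair)), "->", label)
```

In this toy family, human_A and mouse_A are orthologues because they separate at a speciation node, whereas human_A and human_B are paralogues because they separate at the duplication at the root.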
While HOGENOM allows users to calculate the orthologues on the fly with a program that connects to their database, PhylomeDB presents orthologues as directly searchable results. Furthermore, Ensembl now collaborates with TreeFam, and uses the same tree-building and orthologue inference algorithms (15). It is clear that the tree-based methods are theoretically attractive, but building accurate gene trees remains a major challenge.

In this update, we have expanded TreeFam to include 25 fully sequenced animal genomes and four outgroup genomes. Furthermore, we have made many software improvements since the first release of TreeFam. These include (i) new algorithms for phylogenetic inference, (ii) a more user-friendly website and (iii) a Perl interface (API) to the publicly available database. Together with the new features, TreeFam is an even more useful resource for identifying orthologues and paralogues in animal species and for studying evolution of animal gene families.

*To whom correspondence should be addressed. Tel: +44 (0) 1223 834244; Fax: +44 (0) 1223 494919; Email:
Correspondence may also be addressed to Jun Wang. Tel: +86 (0) 10 804 81664; Fax: +86 (0) 10 804 98676; Email:
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
© 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Received September 14, 2007; Revised October 21, 2007; Accepted October 23, 2007

MATERIALS AND METHODS

Seventeen new species have been added since TreeFam v1 (4).
TreeFam v4 contains predicted protein sequences from the fully sequenced genomes of 25 animal species: human, chimpanzee, macaque, mouse, rat, cow, dog, opossum, chicken, frog, two pufferfish (Takifugu and Tetraodon), zebrafish, medaka, stickleback, sea squirts (Ciona intestinalis and C. savignyi), two fruit-flies (Drosophila melanogaster and D. pseudoobscura), two mosquitoes (Aedes aegypti and Anopheles gambiae), the flatworm Schistosoma mansoni, and the nematodes Caenorhabditis elegans, C. briggsae and C. remanei. In addition, four outgroup genomes are included: baker’s yeast, fission yeast, rice and thale cress (Arabidopsis).

The C. briggsae and C. remanei proteins were downloaded from WormBase (16), D. pseudoobscura proteins from FlyBase (17), fission yeast and flatworm proteins from GeneDB (18), thale cress proteins from TIGR (19), rice proteins from the Beijing Genomics Institute (20) and the remaining sequences from Ensembl (15). In addition to these species, TreeFam includes UniProt (21) proteins from animal species whose genomes have not been fully sequenced. For TreeFam v4, all sequences were downloaded in October 2006.

Overall strategy

TreeFam is a two-part database: a first part consisting of automatically generated trees (TreeFam-B) and a second part that (...truncated)


This is a preview of a remote PDF: https://nar.oxfordjournals.org/content/36/suppl_1/D735.full.pdf
Article home page: http://nar.oxfordjournals.org/content/36/suppl_1/D735.abstract

Jue Ruan, Heng Li, Zhongzhong Chen, Avril Coghlan, Lachlan James M. Coin, Yiran Guo, Jean-Karim Hériché, Yafeng Hu, Karsten Kristiansen, Ruiqiang Li, Tao Liu, Alan Moses, Junjie Qin, Søren Vang, Albert J. Vilella, Abel Ureta-Vidal, Lars Bolund, Jun Wang, Richard Durbin. TreeFam: 2008 Update, Nucleic Acids Research, 2008, pp. D735-D740, 36/suppl 1, DOI: 10.1093/nar/gkm1005