PhyloFinder: An intelligent search engine for phylogenetic tree databases

BMC Evolutionary Biology, Mar 2008

Bioinformatic tools are needed to store and access the rapidly growing phylogenetic data. These tools should enable users to identify existing phylogenetic trees containing a specified taxon or set of taxa and to compare a specified phylogenetic hypothesis to existing phylogenetic trees. PhyloFinder is an intelligent search engine for phylogenetic databases that we have implemented using trees from TreeBASE. It enables taxonomic queries, in which it identifies trees in the database containing the exact name of the query taxon and/or any synonymous taxon names, and it provides spelling suggestions for the query when there is no match. Additionally, PhyloFinder can identify trees containing descendants or direct ancestors of the query taxon. PhyloFinder also performs phylogenetic queries, in which it identifies trees that contain the query tree or topologies that are similar to the query tree. PhyloFinder can enhance the utility of any tree database by providing tools for both taxonomic and phylogenetic queries as well as visualization tools that highlight the query results and provide links to NCBI and TBMap. An implementation of PhyloFinder using trees from TreeBASE is available from the web client application found in the availability and requirements section.

Software, Open Access

Duhong Chen*1, J Gordon Burleigh2, Mukul S Bansal1 and David Fernández-Baca1

Address: 1Department of Computer Science, Iowa State University, Ames, IA 50011, USA and 2NESCent, Durham, NC 27705, USA

* Corresponding author

Received: 10 September 2007
Accepted: 21 March 2008
Published: 21 March 2008

BMC Evolutionary Biology 2008, 8:90 doi:10.1186/1471-2148-8-90
This article is available from: http://www.biomedcentral.com/1471-2148/8/90

© 2008 Chen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

The rapidly expanding wealth of phylogenetic information from across the tree of life offers unprecedented opportunities for large-scale evolutionary studies and for examining an array of biological questions in a phylogenetic context [1]. However, much of the published phylogenetic data is not easily accessible. Therefore, the storage and efficient retrieval of phylogenetic data are important challenges for bioinformatics [1-5].

TreeBASE is the largest relational database of published phylogenetic information. It stores more than 4,400 trees that contain over 75,000 taxa, the data matrices used to infer the trees, and additional meta-data, such as bibliographic information and details of the phylogenetic analyses [6,7]. Though TreeBASE is a valuable repository for phylogenetic data, it is often difficult to identify and access relevant phylogenetic data from within TreeBASE. In this paper, we present PhyloFinder, a new phylogenetic tree search engine that greatly expands upon the current search features in TreeBASE and thus can enhance the utility of TreeBASE, or any phylogenetic database.

To utilize the existing phylogenetic data effectively, we need tools that can quickly identify phylogenetic trees containing a specified taxon or set of taxa and that can compare a specified phylogenetic hypothesis to existing phylogenetic trees.

The complexity of taxonomy presents a first major challenge for identifying and accessing phylogenetic data [3,4,6,7].
Taxonomic names used in stored phylogenetic trees are often based on various inconsistent taxonomies [6]. Furthermore, taxonomic classifications and names frequently change, and these changes may not be reflected in database trees. Consequently, repositories such as TreeBASE contain many species that are represented by multiple equivalent names. Taxonomic queries are further complicated by misspellings or unique subspecies designations in stored trees, both of which are common in TreeBASE [6]. Many of these taxonomic issues have been addressed by TBMap, a database that maps names of taxa found in TreeBASE to other taxonomic databases and clusters equivalent taxonomic names [6]. However, TBMap is not incorporated in TreeBASE or in any other phylogenetic search engine.

The hierarchical nature of taxonomic classifications presents further challenges for accessing phylogenetic data. The leaves in stored phylogenetic trees may represent different taxonomic levels, such as families, genera, species, or subspecies. It should be possible for a tree database query to identify trees containing not only the specific taxon name used in the query, but also trees containing descendants or ancestors of the query taxon [3,4]. For example, a query using the plant family name "Pinaceae" ideally would identify not only trees that contain the exact name "Pinaceae" but also trees containing Pinaceae genera such as "Pinus" or "Abies" or species such as "Pinus thunbergii" or "Abies alba". It also would be useful to identify trees containing direct ancestors (the internal nodes on the path from the root of a taxonomy tree to the query taxon) of the query taxon. Thus, a query on the species name "Pinus thunbergii" would identify trees that contain the genus name "Pinus" or the family name "Pinaceae" as leaves.
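
The taxonomic query behaviour described above (synonym matching, spelling suggestions, and descendant or ancestor lookups) can be illustrated with a small Python sketch. It is not PhyloFinder's implementation: the toy taxonomy, the synonym map, the tree IDs, and the function names (canonical, ancestors, descendants, find_trees, suggest) are all assumptions made for illustration, and the spelling suggestions rely on Python's standard difflib rather than on whatever method PhyloFinder uses.

```python
from difflib import get_close_matches

# Toy taxonomy stored as child -> parent links (illustrative only).
PARENT = {
    "Pinus thunbergii": "Pinus",
    "Pinus": "Pinaceae",
    "Abies alba": "Abies",
    "Abies": "Pinaceae",
}

# Hypothetical synonym map in the spirit of TBMap: variant -> canonical name.
SYNONYMS = {"Pinus thunbergiana": "Pinus thunbergii"}

# Hypothetical stored trees, reduced to their sets of leaf labels.
TREES = {
    "T1": {"Pinus thunbergii", "Abies alba"},
    "T2": {"Pinus", "Abies"},
    "T3": {"Pinaceae", "Cupressaceae"},
    "T4": {"Pinus thunbergiana", "Ginkgo biloba"},
    "T5": {"Ginkgo biloba", "Cycas revoluta"},
}


def canonical(name):
    """Map a name to the representative of its synonym cluster."""
    return SYNONYMS.get(name, name)


def ancestors(taxon):
    """Direct ancestors: nodes on the path from the taxon's parent to the root."""
    path = []
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def descendants(taxon):
    """Every taxon that has `taxon` somewhere on its path to the root."""
    return {t for t in PARENT if taxon in ancestors(t)}


def find_trees(query, mode="exact"):
    """Return IDs of trees with a leaf matching the query under the given mode."""
    target = canonical(query)
    if mode == "exact":
        wanted = {target}
    elif mode == "descendants":
        # The exact name counts too, as in the "Pinaceae" example.
        wanted = {target} | descendants(target)
    elif mode == "ancestors":
        wanted = set(ancestors(target))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(
        tree_id
        for tree_id, leaves in TREES.items()
        if wanted & {canonical(leaf) for leaf in leaves}
    )


def suggest(name, cutoff=0.8):
    """Offer close spellings among known names when a query has no match."""
    known = sorted(
        set(PARENT) | set(PARENT.values()) | set(SYNONYMS)
        | set().union(*TREES.values())
    )
    return get_close_matches(name, known, n=3, cutoff=cutoff)


print(find_trees("Pinaceae", mode="descendants"))
print(find_trees("Pinus thunbergii", mode="ancestors"))
print(find_trees("Pinus thunbergiana"))
print(suggest("Pinus thunbergi"))
```

In this sketch a leaf label is canonicalised before comparison, so a tree that stores a synonym is found by a query on either name. A production system would likely index the taxonomy rather than walk parent links for every query; the linear scan here only keeps the example short.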
Currently, TreeBASE does not directly utilize information from taxonomic classifications to allow the user to find trees containing ancestors or descendants of the query taxon [3,4]. Instead, the user can find all the taxa matching a partial-name taxon query. For example, querying "Pinus@" or even "Pinu@" in TreeBASE will identify all trees containing "Pinus" in their species name. However, querying using "Pinaceae@" will not identify trees with "Pinus" or "Abies" species, because they do not contain "Pinaceae" in the species name. Alternatively, the user can identify trees with related taxa through "tree surfing", in which the user identifies neighboring trees (trees with shared taxa) of a specified tree(s). Tree surfing can be time-consuming, and it is difficult if not impossible for the user to determine if s/he has found all the trees containing the relevant taxa.

Another important feature of an effective phylogenetic search engine is the ability to make phylogenetic queries, in which the user can assess a specified tree by comparing it to the trees in the database [3,5]. Tree mining queries must first be able to identify all trees that contain or agree with a query tree, or the trees in the database in which the quer (...truncated)
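
The limitation of partial-name queries noted in this section ("Pinus@" finds Pinus species, while "Pinaceae@" does not) can be reproduced with a short Python sketch. It assumes that a trailing "@" acts as an "any suffix" wildcard, which is an interpretation of the examples in the text rather than a documented TreeBASE rule; the function name partial_name_query and the list of leaf names are hypothetical.

```python
def partial_name_query(pattern, leaf_names):
    """Match leaf names against a pattern whose trailing '@' means 'any suffix'.

    The '@' semantics are an assumption based on the examples in the text.
    """
    if pattern.endswith("@"):
        prefix = pattern[:-1]
        return [name for name in leaf_names if name.startswith(prefix)]
    return [name for name in leaf_names if name == pattern]


# Hypothetical leaf labels taken from stored trees.
LEAF_NAMES = ["Pinus thunbergii", "Pinus strobus", "Abies alba"]

print(partial_name_query("Pinus@", LEAF_NAMES))     # both Pinus species
print(partial_name_query("Pinu@", LEAF_NAMES))      # the same two names
print(partial_name_query("Pinaceae@", LEAF_NAMES))  # nothing: no label starts with "Pinaceae"
```

Because the match is purely textual, the family name never reaches its member genera or species; bridging that gap requires the taxonomic hierarchy itself, not string matching.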


This is a preview of a remote PDF: https://bmcevolbiol.biomedcentral.com/track/pdf/10.1186/1471-2148-8-90
Article home page: https://bmcevolbiol.biomedcentral.com/articles/10.1186/1471-2148-8-90

Duhong Chen, J Gordon Burleigh, Mukul S Bansal, David Fernández-Baca. PhyloFinder: An intelligent search engine for phylogenetic tree databases, BMC Evolutionary Biology, 2008, pp. 90, Volume 8, Issue 1, DOI: 10.1186/1471-2148-8-90