Treehouse: a user-friendly application to obtain subtrees from large phylogenies
Steenwyk and Rokas, BMC Research Notes (2019) 12:541
https://doi.org/10.1186/s13104-019-4577-5
RESEARCH NOTE, Open Access
Jacob L. Steenwyk and Antonis Rokas*
Abstract
Objective: Phylogenetic trees that contain hundreds to thousands of taxa are now routinely generated. Retrieving
the relationships among a subset of taxa in these large phylogenies can be a challenging or time-consuming task.
Addressing this challenge requires the development of tools that facilitate the easy retrieval of subtrees from any
user-specified set of taxa in a given phylogeny.
Results: We developed treehouse, an open-source tool that enables the retrieval of any subtree from a given large phylogeny. Using a three-step workflow, treehouse allows users to obtain a subtree from any phylogeny.
Treehouse can help researchers to explore the relationships among any set of taxa from across the tree of life. Treehouse is implemented as a shiny application in the R programming language. Treehouse software and usage instructions are publicly available at https://github.com/JLSteenwyk/treehouse.
Keywords: Phylogenomics, Phylogenetics, Big data, Tree, Tree pruning, Shiny, Graphical user interface
Introduction
Evolutionary biology relies on understanding the phylogenetic relationships among the genes, traits, and organisms under investigation. However, large phylogenies that contain hundreds of taxa are becoming increasingly inaccessible to researchers interested in the relationships of just a few representatives. For example, some phylogenies are so large that taxon names are challenging or impossible to display and are often omitted [1–4]. Similarly, many internal branches are very short, and the constraints of displaying a large tree on a letter-sized page make tracing the relationships among a subset of taxa challenging and unnecessarily time-consuming. These issues will become more frequent as the number of taxa included in phylogenies of genes, metagenomes, genomes, and other data continues to rise rapidly.
To address these issues, we introduce treehouse, a user-friendly application with minimal dependencies that facilitates the retrieval of subtrees for any user-specified set of taxa in a given phylogeny. Our simple three-step workflow allows users to obtain subtrees from a curated and growing database of large-scale phylogenetic trees from across the tree of life. Users may also obtain subtrees from their own phylogenies, which can facilitate data exploration and interdisciplinary collaboration. For easy integration into existing project workflows, subtrees obtained from treehouse can be downloaded as a newick file or a PDF file, either of which retains branch length information. Treehouse enables beginner and expert evolutionary biologists alike to reap the benefits of large-scale phylogenetic projects and to use them to test evolutionary hypotheses.
*Correspondence:
Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
Main text
Materials and methods
Data acquisition
Treehouse contains a database of 20 representative large phylogenies from across the tree of life (Table 1).
Table 1 Curated phylogenies currently available in treehouse’s database
Highest level of taxonomic organization | Taxon or taxa represented | Number of taxa | Reference
Animals | Birds | 198 taxa | [5]
Animals | Birds | 48 taxa | [6]
Animals | Insects | 144 taxa | [7]
Animals | Mammals | 37 taxa | [8]
Animals | Mammals | 36 taxa | [9]
Animals | Metazoans | 36 taxa | [10]
Animals | Metazoans | 70 taxa | [11]
Animals | Vertebrates | 58 taxa | [12]
Animals | Worms | 100 taxa | [13]
Fungi | Aspergillus and Penicillium | 81 taxa | [14]
Fungi | Cryptococcus neoformans | 387 strains | [15]
Fungi | Fungi | 214 taxa | [16]
Fungi | Agaricomycetes | 5284 taxa | [2]
Fungi | Saccharomyces cerevisiae | 1011 strains | [1]
Fungi | Saccharomycotina | 86 taxa | [17]
Fungi | Saccharomycotina | 332 taxa | [4]
Plants | Caryophyllales | 95 taxa | [18]
Plants | Flowering plants | 45 taxa | [19]
Plants | Land plants | 103 taxa | [20]
Tree of life | Tree of life | 3083 taxa | [3]

© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Description of the software
Using treehouse requires the R packages phytools, version 0.6-60 [21], and shiny, version 1.2.0 (https://shiny.rstudio.com/). Dependencies of phytools include maps, version 3.3.0 (https://cran.r-project.org/web/packages/maps/index.html), and ape, version 5.3 [22]. To present each phylogeny as depicted by its original authors, phylogenies in treehouse’s database are rooted. The taxa used to root each phylogeny were inferred from figures in the original publication or, for phylogenies presented without taxon names, from personal communications with the authors. Phylogenies are rooted using the root() function from ape, which is loaded with phytools. Given the list of taxa provided by the user, treehouse uses the setdiff() function to determine which taxa to remove from the phylogeny, namely those present in the phylogeny but absent from the user's list. This list is then passed to the drop.tip() function, also from ape, which prunes those taxa from the phylogeny. To write the resulting phylogeny to a newick-formatted text file or render it as a vector-graphics PDF file, we use ape’s write.tree() and plot.phylo() functions, respectively. To create a user-friendly and intuitive user interface, we used shiny.
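The pruning logic described above can be illustrated outside R. The following is a minimal, hypothetical Python analogue using only the standard library. It is not treehouse's implementation and does not use phytools or ape. It makes several assumptions:

- The input tree is a simple newick string with no whitespace, quoted labels, or comments.
- The tree is already rooted.
- Unknown taxon names are treated as an error.

The sketch computes the taxa to drop with a set difference, playing the role of setdiff(). It then removes those taxa in the spirit of drop.tip(), merging the branch of any internal node left with a single child into that child. The names parse_newick, prune, to_newick, and subtree are illustrative only.

```python
# Hypothetical sketch: a Python analogue of treehouse's pruning steps
# (setdiff() to find taxa to drop, drop.tip() to remove them). Not the
# authors' R implementation. Handles simple newick only: no whitespace,
# quoted labels, or comments.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float | None = None  # branch length to the parent, if given
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a newick string such as '((A:1,B:2):0.5,C:3);' into Nodes."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                node.children.append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                pos += 1  # consume ")"
                break
        start = pos  # optional label (tip name or internal support value)
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node.name = text[start:pos]
        if pos < len(text) and text[pos] == ":":  # optional branch length
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node.length = float(text[start:pos])
        return node

    return parse_node()


def tip_names(node: Node) -> list[str]:
    """Return tip labels in left-to-right order."""
    if not node.children:
        return [node.name]
    return [tip for child in node.children for tip in tip_names(child)]


def prune(node: Node, drop: set[str]) -> Node | None:
    """Return a pruned copy of `node` without tips in `drop` (None if empty)."""
    if not node.children:
        return None if node.name in drop else Node(node.name, node.length)
    kept = [p for p in (prune(c, drop) for c in node.children) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # Collapse the now single-child node: its branch is merged into the
        # child's so that distances among retained tips are unchanged.
        child = kept[0]
        if node.length is not None:
            child.length = (child.length or 0.0) + node.length
        return child
    return Node(node.name, node.length, kept)


def to_newick(node: Node) -> str:
    """Serialize Nodes back to newick, lengths to 10 significant digits."""
    def fmt(n: Node) -> str:
        out = "(" + ",".join(fmt(c) for c in n.children) + ")" if n.children else ""
        out += n.name
        if n.length is not None:
            out += f":{n.length:.10g}"
        return out

    return fmt(node) + ";"


def subtree(newick: str, wanted: list[str]) -> str:
    """Return the newick subtree spanning `wanted` (names must match exactly)."""
    tree = parse_newick(newick)
    tips = tip_names(tree)
    missing = set(wanted) - set(tips)
    if missing:  # assumption: unknown names are an error, not silently ignored
        raise ValueError(f"taxa not in phylogeny: {sorted(missing)}")
    to_drop = set(tips) - set(wanted)  # plays the role of R's setdiff()
    pruned = prune(tree, to_drop)
    if pruned is None:
        raise ValueError("no taxa selected")
    pruned.length = None  # discard any branch left above the new root
    return to_newick(pruned)
```

For example, subtree("((A:1,B:2):0.5,(C:1,D:1):0.5);", ["A", "C", "D"]) returns "(A:1.5,(C:1,D:1):0.5);". B is removed, and A absorbs the 0.5 branch of its former parent, so the distance from A to C or D is the same as in the full tree.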
Results
A three‑step workflow to obtain subtrees
Treehouse has a simple user interface (Fig. 1B) that guides the user through an intuitive three-step workflow (Fig. 1A).
1. Tree selection
A user can choose among five tabs located at the top of the user interface (Fig. 1Ba): userTree, Animals, Fungi, Plants, and Tree of Life. When using phylogenies from the treehouse database, a user selects the desired phylogeny from a dropdown menu (Fig. 1Bi; left). In userTree, a user uploads a newick-formatted phylogeny from their local computer (Fig. 1Bi; right).
2. Selection of taxa
Next, the user uploads a text file that lists the taxa of interest in a single column, one taxon per line (Fig. 1Bii). Each taxon name must exactly match a taxon name in the full phylogeny.
3. Subtree output
By clicking the ‘Update’ button, the user launches
treehouse subtree retrieval. The subtree is (...truncated)