RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
Alexandros Stamatakis 1,2
Associate Editor: Jonathan Wren
1 Department of Informatics, Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
2 Scientific Computing Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg
Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community.
Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available.
Availability and implementation: The code is available under GNU GPL at https://github.com/stamatak/standard-RAxML.
Contact:
Supplementary information: Supplementary data are available at Bioinformatics online.
Phylogenetics
1 INTRODUCTION
RAxML (Randomized Axelerated Maximum Likelihood) is a
popular program for phylogenetic analysis of large datasets
under maximum likelihood. Its major strength is a fast maximum
likelihood tree search algorithm that returns trees with good
likelihood scores. Since the last RAxML paper (Stamatakis,
2006), it has been continuously maintained and extended to
accommodate ever-larger input datasets and to serve
the needs of the user community. In the following, I will present
some of the most notable new features and extensions of RAxML.
2 NEW FEATURES
Bootstrapping and support values
RAxML offers four different ways to obtain bootstrap support.
It implements the standard non-parametric bootstrap and also
the so-called rapid bootstrap (Stamatakis et al., 2008), which is a
standard bootstrap search that relies on algorithmic shortcuts
and approximations to speed up the search process.
It also offers an option to calculate the so-called SH-like
support values (Guindon et al., 2010). I recently implemented
a method that allows for computing RELL (Resampling
Estimated Log Likelihoods) bootstrap support as described by
Minh et al. (2013).
Apart from this, RAxML also offers a so-called bootstopping
option (Pattengale et al., 2010). When this option is used,
RAxML will automatically determine how many bootstrap
replicates are required to obtain stable support values.
Models and data types
Apart from DNA and protein data, RAxML now also supports
binary, multi-state morphological and RNA secondary structure
data. It can correct for ascertainment bias (Lewis, 2001) for all of
the above data types. This might be useful not only for
morphological data matrices that only contain variable sites but also for
alignments of SNPs.
The set of available protein substitution models has been
significantly extended and now comprises a general time reversible
(GTR) model, as well as the computationally more complex
LG4M and LG4X models (Le et al., 2012). RAxML can also
automatically determine the best-scoring protein substitution
model.
Finally, a new option for conducting a maximum likelihood
estimate of the base frequencies has become available.
Parallel versions
RAxML offers a fine-grain parallelization of the likelihood
function for multi-core systems via the PThreads-based version and a
coarse-grain parallelization of independent tree searches via MPI
(Message Passing Interface). It also supports combined
coarse-grain/fine-grain parallelism via the hybrid MPI/PThreads version (Pfeiffer
and Stamatakis, 2010).
Note that, for extremely large analyses on supercomputers,
using the dedicated sister program ExaML [Exascale Maximum
Likelihood (Stamatakis and Aberer, 2013)] is recommended.
Post-analysis of trees
RAxML offers a plethora of post-analysis functions for sets of
trees. Apart from standard statistical significance tests, it offers
efficient (and partially parallelized) operations for computing
Robinson-Foulds distances, as well as extended majority rule,
majority rule and strict consensus trees (Aberer et al., 2010).
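To make the quantities named above concrete, the following is a minimal Python sketch of the Robinson-Foulds distance and of the bipartitions retained by majority rule and strict consensus. It is an illustration only and not RAxML's implementation: the nested-tuple tree representation and the function names (leaf_sets, splits, rf_distance, consensus_splits) are assumptions made for this example, and the sketch makes no attempt at the efficiency or parallelism of the RAxML code. Extended majority rule consensus is not covered.

```python
from collections import Counter


def leaf_sets(tree, out):
    """Return the leaf set below `tree`; append every internal node's leaf set to `out`."""
    if isinstance(tree, str):  # a leaf is just a taxon name
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Return (taxon set, set of non-trivial bipartitions) of a tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so that rooting does not affect the result.
    """
    clades = []
    taxa = leaf_sets(tree, clades)
    ref = min(taxa)
    result = set()
    for clade in clades:
        side = taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:  # skip trivial splits
            result.add(side)
    return taxa, result


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of bipartitions found in exactly one tree."""
    taxa_a, splits_a = splits(tree_a)
    taxa_b, splits_b = splits(tree_b)
    if taxa_a != taxa_b:
        raise ValueError("trees must contain the same taxa")
    return len(splits_a ^ splits_b)


def consensus_splits(trees, strict=False):
    """Bipartitions present in more than half of the trees, or in all of them if strict."""
    counts = Counter()
    for tree in trees:
        counts.update(splits(tree)[1])
    n = len(trees)
    if strict:
        return {s for s, c in counts.items() if c == n}
    return {s for s, c in counts.items() if c / n > 0.5}


t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
print(rf_distance(t1, t2))
print(consensus_splits([t1, t1, t2]))
```

In this toy example the two trees share the bipartition separating D and E from the remaining taxa and differ in one bipartition each, so only the shared bipartition survives in the strict consensus of the three input trees.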
Beyond this, it implements a method for identifying the
so-called rogue taxa (Pattengale et al., 2011), and I recently
implemented options for calculating the TC (Tree Certainty) and IC
(Internode Certainty) measures as introduced by Salichos and
Rokas (2013).
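As a rough illustration of the two measures, the sketch below computes IC for a single internode from the number of trees supporting its bipartition and the number supporting the most prevalent conflicting bipartition, and TC as the sum of IC over all internodes. This follows the definition of Salichos and Rokas (2013) as I understand it; the function names and the input format are assumptions made for this example, and the sketch does not reproduce RAxML's options or output.

```python
import math


def internode_certainty(support, conflict):
    """IC of one internode from two bipartition counts.

    `support` is the number of trees containing the bipartition, `conflict`
    the number containing its most prevalent conflicting bipartition.
    IC = log2(2) + P1 * log2(P1) + P2 * log2(P2), with P_i the relative
    frequencies of the two bipartitions.
    """
    total = support + conflict
    if total == 0:
        raise ValueError("at least one count must be positive")
    ic = 1.0  # log2(2)
    for count in (support, conflict):
        p = count / total
        if p > 0:  # the term p * log2(p) tends to 0 as p tends to 0
            ic += p * math.log2(p)
    return ic


def tree_certainty(count_pairs):
    """TC: sum of IC over all internodes, given (support, conflict) pairs."""
    return sum(internode_certainty(s, c) for s, c in count_pairs)


print(internode_certainty(100, 0))   # no conflict at all
print(internode_certainty(50, 50))   # two equally frequent bipartitions
print(tree_certainty([(100, 0), (50, 50), (75, 25)]))
```

Under this definition, IC approaches 1 when the conflicting bipartition is absent and 0 when the two bipartitions are equally frequent.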
Finally, there is the new plausibility checker option (Dao et al.,
2013) that allows computing the RF distances between a huge
phylogeny w (...truncated)