RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies

Bioinformatics, May 2014

Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the ever-growing input datasets and to serve the needs of the user community.

Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available.

Availability and implementation: The code is available under GNU GPL at https://github.com/stamatak/standard-RAxML.

Contact: alexandros.stamatakis@h-its.org

Supplementary information: Supplementary data are available at Bioinformatics online.



Alexandros Stamatakis

Department of Informatics, Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
Scientific Computing Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg

Associate Editor: Jonathan Wren

Phylogenetics

1 INTRODUCTION

RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analysis of large datasets under maximum likelihood. Its major strength is a fast maximum likelihood tree search algorithm that returns trees with good likelihood scores. Since the last RAxML paper (Stamatakis, 2006), it has been continuously maintained and extended to accommodate the ever-growing input datasets and to serve the needs of the user community.
In the following, I will present some of the most notable new features and extensions of RAxML.

NEW FEATURES

Bootstrapping and support values

RAxML offers four different ways to obtain bootstrap support. It implements the standard non-parametric bootstrap and also the so-called rapid bootstrap (Stamatakis et al., 2008), which is a standard bootstrap search that relies on algorithmic shortcuts and approximations to speed up the search process. It also offers an option to calculate the so-called SH-like support values (Guindon et al., 2010). I recently implemented a method that allows for computing RELL (Resampling Estimated Log Likelihoods) bootstrap support as described by Minh et al. (2013). Apart from this, RAxML also offers a so-called bootstopping option (Pattengale et al., 2010). When this option is used, RAxML will automatically determine how many bootstrap replicates are required to obtain stable support values.

Models and data types

Apart from DNA and protein data, RAxML now also supports binary, multi-state morphological and RNA secondary structure data. It can correct for ascertainment bias (Lewis, 2001) for all of the above data types. This might be useful not only for morphological data matrices that only contain variable sites but also for alignments of SNPs. The number of available protein substitution models has been significantly extended and comprises a general time reversible (GTR) model, as well as the computationally more complex LG4M and LG4X models (Le et al., 2012). RAxML can also automatically determine the best-scoring protein substitution model. Finally, a new option for conducting a maximum likelihood estimate of the base frequencies has become available.

Parallel versions

RAxML offers a fine-grain parallelization of the likelihood function for multi-core systems via the PThreads-based version and a coarse-grain parallelization of independent tree searches via MPI (Message Passing Interface). It also supports coarse-grain/fine-grain parallelism via the hybrid MPI/PThreads version (Pfeiffer and Stamatakis, 2010). Note that, for extremely large analyses on supercomputers, using the dedicated sister program ExaML [Exascale Maximum Likelihood (Stamatakis and Aberer, 2013)] is recommended.

Post-analysis of trees

RAxML offers a plethora of post-analysis functions for sets of trees. Apart from standard statistical significance tests, it offers efficient (and partially parallelized) operations for computing Robinson-Foulds distances, as well as extended majority rule, majority rule and strict consensus trees (Aberer et al., 2010). Beyond this, it implements a method for identifying the so-called rogue taxa (Pattengale et al., 2011), and I recently implemented options for calculating the TC (Tree Certainty) and IC (Internode Certainty) measures as introduced by Salichos and Rokas (2013). Finally, there is the new plausibility checker option (Dao et al., 2013) that allows computing the RF distances between a huge phylogeny w (...truncated)
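
To illustrate the Robinson-Foulds (RF) distance mentioned in the post-analysis section, here is a minimal Python sketch. It is not RAxML's implementation. The nested-tuple tree encoding and the function names (leaf_set, bipartitions, rf_distance) are assumptions made only for this example. The sketch collects the non-trivial bipartitions (splits) that each tree induces on the taxon set and counts the splits found in exactly one of the two trees.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding for this sketch: a leaf is a taxon name (str),
# an inner node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial splits of a tree, viewed as unrooted."""
    all_taxa = leaf_set(tree)
    anchor = min(all_taxa)  # fixed taxon used to pick one side of each split
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # A split is non-trivial if both sides hold at least two taxa.
        if 2 <= len(below) <= len(all_taxa) - 2:
            # Store the side that does not contain the anchor taxon, so the
            # same split is represented identically in every tree.
            side = all_taxa - below if anchor in below else below
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(a: Tree, b: Tree) -> int:
    """Unnormalized RF distance: number of splits present in only one tree."""
    if leaf_set(a) != leaf_set(b):
        raise ValueError("both trees must contain the same taxa")
    return len(bipartitions(a) ^ bipartitions(b))


if __name__ == "__main__":
    t1 = (("A", "B"), ("C", "D"), "E")
    t2 = (("A", "C"), ("B", "D"), "E")
    t3 = (("A", "B"), "C", ("D", "E"))
    print(rf_distance(t1, t2), rf_distance(t1, t3))
```

Because each split is stored as the side that excludes a fixed anchor taxon, a rooted and an unrooted encoding of the same topology yield the same split set. This sketch does quadratic work in set operations and is meant only to show the definition; it makes no attempt at the efficiency or parallelization described above.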


This is a preview of a remote PDF: https://bioinformatics.oxfordjournals.org/content/30/9/1312.full.pdf

Alexandros Stamatakis. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies, Bioinformatics, 2014, pp. 1312-1313, 30/9, DOI: 10.1093/bioinformatics/btu033