FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods
Zierke and Bakos BMC Bioinformatics 2010, 11:184
http://www.biomedcentral.com/1471-2105/11/184
Open Access
RESEARCH ARTICLE
Stephanie Zierke† and Jason D Bakos*†
Abstract
Background: Maximum likelihood (ML)-based phylogenetic inference has become a popular method for estimating the
evolutionary relationships among species based on genomic sequence data. This method is used in applications such
as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel
computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between
iterations. As such, it offers high potential for exploiting parallelism using micro-architectural techniques. In this
paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array
(FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory
attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art
multi-core processors.
Results: We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that
our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10× speedup relative to software running on
a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply
pipelining the likelihood computations, performing multiple floating-point operations in parallel, and using a
natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture.
Conclusions: Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by
the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power
consumption as compared to many-core processors and Graphics Processor Units (GPUs) [1].
Background
The problem of phylogenetic inference is to construct a
phylogeny that most closely resembles the actual relative
evolutionary history of a set of species. The species,
each represented by nucleotide sequences, amino acid
sequences, or gene orderings, are referred to as taxa.
One of the challenges in phylogenetic inference is the
size of the tree space. The number of possible unrooted
phylogenetic trees for n taxa is:
$\prod_{i=3}^{n} (2i - 5)$ [2]
In many cases, performing an exhaustive search to find
the optimal tree is computationally intractable, so heuristics are often used.
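To give a sense of how quickly the tree space grows, the product over i from 3 to n of (2i − 5) can be evaluated directly. The following is a minimal sketch; the function name `num_unrooted_trees` is our own and does not come from MrBayes or any other tool.

```python
def num_unrooted_trees(n: int) -> int:
    """Count unrooted binary trees on n labeled taxa.

    Evaluates the product over i = 3..n of (2i - 5).
    For n < 3 the product is empty, so the count is 1.
    """
    count = 1
    for i in range(3, n + 1):
        count *= 2 * i - 5
    return count


if __name__ == "__main__":
    # Print the count for a few small taxon sets.
    for n in (4, 5, 10, 20):
        print(n, num_unrooted_trees(n))
```

Even at 10 taxa the count exceeds two million, which is why exhaustive search is impractical for realistic datasets.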
* Correspondence:
1 Department of Computer Science and Engineering, University of South
Carolina, Columbia, SC, USA
† Contributed equally
Another challenge in phylogenetic inference is determining the accuracy of a given tree. Maximum likelihood
(ML) and Bayesian inference methods typically employ
Felsenstein's pruning algorithm to compute the Phylogenetic Likelihood Function (PLF) in order to determine
the statistical likelihood score for a tree [3,4].
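The core step of the pruning algorithm, combining the conditional likelihood vectors of two child nodes into the vector of their parent, can be sketched as follows. This is an illustration only, under stated assumptions: it uses the Jukes-Cantor substitution model and a single rate category for simplicity, whereas MrBayes supports more general models, and the names `jc69_matrix` and `plf_node` are hypothetical.

```python
import math


def jc69_matrix(t):
    """4x4 Jukes-Cantor transition matrix for branch length t.

    Assumption: t is measured in expected substitutions per site.
    """
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e   # probability of ending in the same state
    diff = 0.25 - 0.25 * e   # probability of ending in a given other state
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def plf_node(left, right, p_left, p_right):
    """Compute the parent's conditional likelihoods from two children.

    left, right: one 4-element vector per site (states A, C, G, T).
    p_left, p_right: transition matrices for the two child branches.
    Each site is independent of the others, so this loop has no
    dependencies between iterations.
    """
    parent = []
    for l_site, r_site in zip(left, right):
        site = []
        for i in range(4):
            sum_l = sum(p_left[i][j] * l_site[j] for j in range(4))
            sum_r = sum(p_right[i][j] * r_site[j] for j in range(4))
            site.append(sum_l * sum_r)
        parent.append(site)
    return parent
```

A tip that is observed as `A` is encoded as `[1.0, 0.0, 0.0, 0.0]`; applying `plf_node` bottom-up over the tree yields the conditional likelihood vector at the root.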
This paper describes a reconfigurable hardware implementation of the Phylogenetic Likelihood Function (PLF),
as well as the normalization and log-likelihood steps used
in MrBayes [5]. Our design includes enhancements
designed to leverage the high-bandwidth local memory
on our co-processor card to store the likelihood vectors
for each of the tree nodes.
MrBayes uses the PLF to evaluate the likelihood of trees
[21] (which consumes nearly all of the execution time),
and uses the Metropolis-coupled Markov chain Monte
Carlo (MCMC) search to move through the tree space.
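The way a likelihood score feeds into the search can be illustrated with the standard Metropolis acceptance rule, evaluated in log space. This is a generic sketch rather than MrBayes code: it assumes a symmetric proposal, the function name `accept_move` is hypothetical, and the `heat` parameter stands in for the tempering applied to the heated chains in Metropolis-coupled MCMC.

```python
import math
import random


def accept_move(log_post_current, log_post_proposed, heat=1.0, rng=random):
    """Decide whether to accept a proposed tree.

    Assumes a symmetric proposal. heat = 1.0 corresponds to the cold
    chain; heat < 1.0 flattens the posterior so that a heated chain
    accepts downhill moves more readily.
    """
    log_ratio = heat * (log_post_proposed - log_post_current)
    if log_ratio >= 0.0:
        return True  # uphill moves are always accepted
    # Downhill moves are accepted with probability exp(log_ratio).
    return rng.random() < math.exp(log_ratio)
```

Every call requires the log likelihood of the proposed tree, which is why the PLF dominates the execution time.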
Full list of author information is available at the end of the article
© 2010 Zierke and Bakos; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Related Work
ML and Bayesian phylogeny inference tools include
RAxML [6], GARLI [7], MrBayes [8], and PAML [9]. In
many cases parallelized versions of these tools have been
developed for cluster and shared-memory systems [10-16]. This paper instead focuses specifically on heterogeneous computing methods for likelihood-based phylogenetic inference, which requires finer-grain parallelization
of the kernel computations using special-purpose co-processors.
Mak and Lam are perhaps the first team to implement
likelihood-based phylogeny inference on an FPGA, but
they took an embedded computing approach as opposed
to a high-performance computing approach [17]. Specifically, they used the FPGA's integrated embedded processor to perform a genetic algorithm tree search method
called GAML (Genetic Algorithm for Maximum Likelihood) and used special-purpose logic in the FPGA fabric
to perform the PLF using fixed-point arithmetic on
behalf of the software. They do not report speedups over
software running on a state-of-the-art CPU, as the goal of
this work was apparently to demonstrate phylogenetic
inference using an FPGA-based embedded heterogeneous system-on-chip (called "platform FPGA") and not
to accelerate a high-performance computer.
Alachiotis et al recently published a series of papers
that describe their FPGA-based accelerator for ML-based
methods [18,19]. Similar to the work by Mak and Lam,
they implemented the PLF in special-purpose hardware,
but their co-processor was hosted by a server running
optimized C code and their PLF was double precision
floating-point. In their experiments, they reconstructed
trees with up to 512 taxa and achieved an average
speedup of 8 relative to software on a single processor
core and an average speedup of 4 relative to software on a
sixteen core processor. They store the likelihood vectors,
which serve as both the input and output of the PLF, in
the FPGA card's local memory for high-bandwidth, low-latency access. Their accelerator design also includes
control logic for traversing the entire tree, reporting only
the likelihood score back to the host. However, their
architecture does not compute the more expensive log
likelihood score, nor does it perform scaling or normalization (performed in MrBayes to prevent numerical
underflow of the conditional probability vectors).
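The role of scaling and of the log-likelihood step can be illustrated with a short sketch. It assumes one common scheme, dividing each site's conditional likelihoods by their maximum and accumulating the logarithm of each scaler; the details of the MrBayes implementation may differ, and the names `rescale_site` and `site_log_likelihood` are hypothetical.

```python
import math


def rescale_site(cond_like):
    """Scale one site's conditional likelihoods so the largest is 1.0.

    Returns the scaled vector and the natural log of the scaler,
    which must be added back when the log likelihood is computed.
    """
    scaler = max(cond_like)
    if scaler <= 0.0:
        raise ValueError("site has zero likelihood")
    return [x / scaler for x in cond_like], math.log(scaler)


def site_log_likelihood(root_cond_like, base_freqs, log_scalers):
    """Log likelihood of one site at the root.

    Computes ln(sum_i pi_i * L_i) and adds the log scalers that were
    accumulated for this site on the way up the tree.
    """
    site_like = sum(pi * x for pi, x in zip(base_freqs, root_cond_like))
    return math.log(site_like) + sum(log_scalers)
```

Without rescaling, the repeated products of probabilities in a large tree shrink toward zero and eventually underflow the floating-point format; keeping the scalers in log space preserves the final score.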
There has also been recent work on using Graphics Processor Units (GPUs) as co-processors for likelihood-based phylogenetic inference. Suchard et
al used the NVIDIA CUDA GTX280 many-core architecture to implement single and double precision versions of
the PLF under a Bayesian framework using both the
codon and n (...truncated)