FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods (pdf)

Article PDF cannot be displayed. You can download it here:

http://www.biomedcentral.com/content/pdf/1471-2105-11-184.pdf

FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods

Zierke and Bakos BMC Bioinformatics 2010, 11:184 http://www.biomedcentral.com/1471-2105/11/184 Open Access RESEARCH ARTICLE FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods Research article Stephanie Zierke† and Jason D Bakos*† Abstract Background: Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. Results: We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10× speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Conclusions: Heterogeneous computing, which combines general-purpose processors with special-purpose coprocessors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs) [1]. Background The problem of phylogenetic inference is to construct a phylogeny that most closely resembles the actual relative evolutionary history of a set of species. The species, which consist of a set of nucleotide sequences, amino acid sequences, or gene orderings, are referred to as taxa. One of the challenges in phylogenetic inference is the size of the tree space. The number of possible unrooted phylogenetic trees for n taxa is: ∏i =3 (2n − 5) [2] n In many cases, performing an exhaustive search to find the optimal tree is computationally intractible so heuristics are often used. * Correspondence: 1 Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA † Contributed equally Another challenge in phylogenetic inference is determining the accuracy of a given tree. Maximum likelihood (ML) and Bayesian inference methods typically employ Felsenstein's pruning algorithm to compute the Phylogenetic Likelihood Function (PLF) in order to determine the statistical likelihood score for a tree [3,4]. This paper describes a reconfigurable hardware implementation of the Phylogenetic Likelihood Function (PLF), as well as the normalization and log-likelihood steps used in MrBayes [5]. Our design includes enhancements designed to leverage the high-bandwidth local memory on our co-processor card to store the likelihood vectors for each of the tree nodes. MrBayes uses the PLF to evaluate the likelihood of trees [21] (which consumes nearly all of the execution time), and uses the Metropolis-coupled Markov chain Monte Carlo (MCMC) search to move through the tree space. Full list of author information is available at the end of the article © 2010 Zierke and Bakos; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Com- BioMed Central mons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Zierke and Bakos BMC Bioinformatics 2010, 11:184 http://www.biomedcentral.com/1471-2105/11/184 Related Work ML and Bayesian phylogeny inference tools include RAxML [6], GARLI [7], MrBayes [8], and PAML [9]. In many cases parallelized versions of these tools have been developed for cluster and shared-memory systems [1016]. This paper instead focuses specifically on heterogeneous computing methods for likelihood-based phylogenetic inference, which requires finer-grain parallelization of the kernel computations using special-purpose co-processors. Mak and Lam are perhaps the first team to implement likelihood-based phylogeny inference on an FPGA, but they took an embedded computing approach as opposed to a high-performance computing approach [17]. Specifically, they used the FPGA's integrated embedded processor to perform a genetic algorithm tree search method called GAML (Genetic Algorithm for Maximum Likelihood) and used special-purpose logic in the FPGA fabric to perform the PLF using fixed-point arithmetic on behalf of the software. They do not report speedups over software running on a state-of-the-art CPU, as the goal of this work was apparently to demonstrate phylogenetic inference using an FPGA-based embedded heterogeneous system-on-chip (called "platform FPGA") and not to accelerate a high-performance computer. Alachiotis et al recently published a series of papers that describe their FPGA-based accelerator for ML-based methods [18,19]. Similar to the work by Mak and Lam, they implemented the PLF in special-purpose hardware, but their co-processor was hosted by a server running optimized C code and their PLF was double precision floating-point. In their experiments, they reconstructed trees with up to 512 taxa and achieved an average speedup of 8 relative to software on a single processor core and an average speedup of 4 relative to software on a sixteen core processor. They store the likelihood vectors, which serve as both the input and output of the PLF, in the FPGA card's local memory for high-bandwidth lowlatency access. Their accelerator design also includes control logic for traversing the entire tree, reporting only the likelihood score back to the host. However, their architecture does not compute the more expensive log likelihood score, nor does it perform scaling or normalization (performed in MrBayes to prevent numerical underflow of the conditional probability vectors). There has also been recent work in using Graphics Processor Units (GPUs) as co-processors for likelihoodbased phylogenetic inference. In recent work, Suchard et al used the NVIDIA CUDA GTX280 many-core architecture to implement single and double precision versions of the PLF under a Bayesian framework using both the codon and n (...truncated)