Research And Application Of Parallel Computing Algorithms For Statistical Phylogenetic Inference
Abstract
Estimating the evolutionary history of organisms, known as phylogenetic inference, is a
critical step in many analyses involving biological sequence data such as DNA.
The likelihood calculations at the heart of the most effective methods for
statistical phylogenetic analyses are extremely computationally intensive, and
hence these analyses become a bottleneck in many studies. Recent progress in
computer hardware, specifically the increasing prevalence of highly
parallel, many-core processors, has created opportunities for new approaches to
computationally intensive methods, such as those in phylogenetic inference.
We have developed an open-source library, BEAGLE, which uses parallel
computing methods to greatly accelerate statistical phylogenetic inference,
for both maximum likelihood and Bayesian approaches. BEAGLE defines a uniform
application programming interface and includes a collection of efficient
implementations that use NVIDIA CUDA, OpenCL, and C++ threading frameworks
for evaluating likelihoods under a wide variety of evolutionary models, on
GPUs as well as on multi-core CPUs. BEAGLE employs a number of different
parallelization techniques for phylogenetic inference, at different
granularity levels and for distinct processor architectures. On CUDA and
OpenCL devices, the library enables concurrent computation of site likelihoods,
data subsets, and independent subtrees. The general design features of the
library also provide a model for software development using parallel computing
frameworks that is applicable to other domains.
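The site-level concurrency mentioned above rests on the fact that, under standard models, each alignment column contributes an independent factor to the tree likelihood. The following is a minimal sketch of that idea in Python, not BEAGLE's API or implementation. It assumes the Jukes-Cantor model, a toy nested-list tree format, and made-up taxon names and sequences; all function names are hypothetical. The thread pool only illustrates that sites can be evaluated independently; it is not expected to deliver the speedups of a GPU kernel.

```python
import math
from concurrent.futures import ThreadPoolExecutor

STATES = "ACGT"

def jc69_matrix(t):
    """Jukes-Cantor transition probabilities for a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partials(node, site, alignment):
    """Pruning pass: conditional likelihoods at `node` for a single site.

    A leaf is a taxon name (str); an internal node is a list of
    (child, branch_length) pairs. This tree encoding is an assumption
    made for the sketch.
    """
    if isinstance(node, str):
        observed = alignment[node][site]
        return [1.0 if s == observed else 0.0 for s in STATES]
    result = [1.0] * 4
    for child, branch_length in node:
        p = jc69_matrix(branch_length)
        child_partials = partials(child, site, alignment)
        for i in range(4):
            result[i] *= sum(p[i][j] * child_partials[j] for j in range(4))
    return result

def site_likelihood(tree, site, alignment):
    """Likelihood of one site, using uniform base frequencies at the root."""
    return sum(0.25 * x for x in partials(tree, site, alignment))

def log_likelihood(tree, alignment, workers=1):
    """Sum of per-site log-likelihoods; sites are mapped over a worker pool."""
    n_sites = len(next(iter(alignment.values())))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_site = list(
            pool.map(lambda s: site_likelihood(tree, s, alignment), range(n_sites))
        )
    return sum(math.log(x) for x in per_site)

# Hypothetical toy data: three taxa, six sites.
toy_alignment = {"human": "ACGTAC", "chimp": "ACGTAA", "gorilla": "ACTTAC"}
toy_tree = [([("human", 0.1), ("chimp", 0.1)], 0.05), ("gorilla", 0.2)]

serial = log_likelihood(toy_tree, toy_alignment, workers=1)
parallel = log_likelihood(toy_tree, toy_alignment, workers=4)
```

Because no site depends on another, the serial and pooled evaluations agree; on a many-core device the same independence lets each site (or each site and state pair) be assigned to its own thread.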
BEAGLE has been integrated with some of the leading programs in the field,
such as MrBayes and BEAST, and is used in a diverse range of evolutionary
studies, including those of disease-causing viruses. The library can provide
significant performance gains, with the exact increase in performance
depending on the specific properties of the data set, evolutionary model, and
hardware. In general, nucleotide analyses are accelerated on the order of
10-fold and codon analyses on the order of 100-fold.