On the design of shared memory approaches to parallelize a multiobjective bee-inspired proposal for phylogenetic reconstruction
Introduction
In the last decade, the growing availability of parallel architectures has opened the door to novel programming models in which multiple processing units collaborate to address computationally demanding problems. Accordingly, several research lines in bioinformatics tackle challenging biological problems by taking advantage of shared memory Multiple Instruction Multiple Data (MIMD) multiprocessor systems, commonly represented by symmetric multiprocessors (SMP). A wide variety of biological processes can be modelled as optimization problems which explore a search space Φ with the aim of obtaining an optimal solution x ∈ Φ in accordance with an objective function. As pointed out in [49], the NP-hard complexity of such problems (including multiple sequence alignment, DNA fragment assembly, biological sequence analysis, etc. [56]) means that the current needs of biological processing cannot be satisfied by traditional serial approaches. This issue has led to an increasing interest in developing parallel algorithms for computational biology.
In order to model the behaviour of real natural systems, biological optimization procedures are often conducted by considering a number of different principles, defined as optimality criteria. This multiobjective nature has motivated the formulation of a wide range of biological problems as Multiobjective Optimization Problems (MOPs) [14]. When addressing MOPs, we aim to find a Pareto set containing solutions which optimize two or more objective functions simultaneously. This idea represents an additional challenge, as the process of finding a good approximation to the Pareto-optimal set implies a high computational cost. Recent advances in parallel architectures represent an opportunity to deal with these problems, giving support to proposals which combine bioinspired computing and parallelism.
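To make the notion of a Pareto set concrete, the following minimal sketch filters a list of objective vectors down to its non-dominated members. It assumes that every objective is to be minimized (a maximized criterion can be negated first); the function names `dominates` and `pareto_front` and the sample values are illustrative and do not come from the original work.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b, assuming all objectives are minimized.

    a dominates b when a is no worse than b in every objective
    and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective scores, both minimized.
sample = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = pareto_front(sample)  # (3.0, 4.0) is dominated by (2.0, 3.0)
```

The quadratic scan is adequate for small sets; the point of the sketch is only that the result is a set of trade-off solutions rather than a single optimum.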
The development of parallel metaheuristics follows two main directions [51]. Firstly, the self-contained parallel cooperation approach aims to improve the quality of results by running independent algorithms which cooperate by exchanging promising solutions. Secondly, intra-algorithm parallelization focuses on reducing the execution time of an algorithm by distributing tasks among processes or threads. In bioinformatics, this second line of work is the most popular approach to making the analysis of growing amounts of biological sequences feasible. By developing parallel metaheuristics, major computational challenges in bioinformatics can be addressed. In this research, we address one of the most prominent problems in this field, phylogenetic inference [32].
By reconstructing the evolutionary history of living species, phylogeneticists provide useful knowledge in a number of scientific fields, including biomedicine, chemistry, and paleontology. The need to infer satisfactory phylogenies over exponentially growing search spaces explains the NP-hard complexity of the problem and, therefore, the close relationship between phylogenetics, parallelism, and bioinspired computing. Throughout the years, phylogenetic searches have been carried out following optimality criteria which measure the quality of solutions. However, multiple studies [7], [35] have documented the incongruence introduced by different optimality criteria, which can yield conflicting evolutionary histories. This key issue can be addressed by formulating a multiobjective view of phylogenetics. By applying multiobjective approaches, we can describe evolutionary trees supported by multiple objective functions simultaneously. In this way, biologists will be able to infer a set of alternative trade-off solutions, along with high-quality phylogenetic topologies according to each objective separately.
In this paper, we tackle the phylogeny reconstruction problem under two criteria: parsimony and likelihood. For this purpose, we study shared memory schemes based on OpenMP [10] to parallelize a Multiobjective Artificial Bee Colony (MOABC) algorithm for phylogenetics [45]. In order to exploit pure shared memory environments, two parallel approaches are proposed. The first is a synchronous generational design which parallelizes the main sections that compose each generation of the algorithm, in such a way that the next generation does not begin until all the tasks in the previous one have been completed. The second is an asynchronous nature-inspired scheme designed to model the inherently asynchronous behaviour of honey bees, dispensing with the traditional concept of generation. The main goal is to decide which parallel design leads to better parallel performance while preserving the multiobjective and biological quality of the results. Each version is assessed experimentally on six real nucleotide data sets and compared with other biological and parallel methods for inferring phylogenies.
This paper is organized as follows. The next section highlights the relationship between parallelism and bioinspired computing and reviews their applications to the field of phylogenetics. Section 3 introduces the problem, discussing its computational complexity and the biological relevance of multiobjective optimization. Section 4 details the algorithmic design of MOABC, and Section 5 defines the proposed parallelization schemes. The experimental analysis of parallel, multiobjective, and biological performance is found in Section 6. Finally, Section 7 includes conclusions and future work.
Related work
This section provides insight into the different proposals reported in the literature to combine bioinspired computing and parallelism, focusing on those approaches published to address the phylogenetic inference problem.
One of the most important characteristics shown by bioinspired algorithms is their suitability to be improved by means of parallel computing. These methods often involve a population of solutions whose evolution can be carried out by mapping the computation of evolutionary
Phylogenetic reconstruction problem
This section introduces the basis of the problem tackled in this research, analyses its algorithmic complexity, and discusses the biological implications of applying multiobjective optimization to phylogenetics.
A bee-inspired approach for inferring phylogenies
Honey bees in nature are governed by a number of rules which allow them to share information about food sources to guarantee the survival of the hive [47]. The division of tasks in honey bee swarms is organized by defining three classes of bees [29]. Those bees which exploit the food sources currently known by the hive are called employed bees. When exploiting these sources, employed bees also check the environment for promising food sources in the neighbourhood, retaining a useful information
MOABC parallel implementations
This research focuses on studying parallel implementations of MOABC to address high-complexity optimization problems like phylogenetic inference on shared memory SMP architectures. With the aim of exploiting such architectures, we propose two different parallel designs based on the OpenMP standard: a synchronous generational implementation and a non-generational implementation inspired by the asynchronous behaviour of honey bees.
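The contrast between the two designs can be illustrated with a small sketch of their control flow. This is not the OpenMP implementation studied in the paper: it is a Python analogue built on `concurrent.futures`, with a toy single-objective function standing in for the costly evaluation of a phylogenetic tree. The names `evaluate`, `perturb`, `synchronous_generational`, and `asynchronous` are hypothetical. Note also that threads in standard CPython do not speed up CPU-bound work, so the sketch shows only where the synchronization points fall.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def evaluate(x: float) -> float:
    """Toy objective (assumption): stands in for an expensive tree evaluation."""
    return (x - 3.0) ** 2


def perturb(x: float, rng: random.Random) -> float:
    """Toy neighbourhood move (assumption)."""
    return x + rng.uniform(-1.0, 1.0)


def synchronous_generational(population, generations, workers, seed=0):
    """Each generation waits for all evaluations before the next one starts."""
    rng = random.Random(seed)
    pop = list(population)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(evaluate, pop))
        for _ in range(generations):
            candidates = [perturb(x, rng) for x in pop]
            # list(pool.map(...)) acts as a barrier: it returns only when
            # every candidate of this generation has been evaluated.
            cand_scores = list(pool.map(evaluate, candidates))
            for i, (cand, s) in enumerate(zip(candidates, cand_scores)):
                if s < scores[i]:
                    pop[i], scores[i] = cand, s
    return pop, scores


def asynchronous(population, evaluations, workers, seed=0):
    """No generations: a result is applied as soon as it arrives."""
    rng = random.Random(seed)
    pop = list(population)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(evaluate, pop))
        pending = {}  # future -> (index of parent solution, candidate)
        submitted = 0
        while pending or submitted < evaluations:
            # Keep every worker busy while evaluation budget remains.
            while submitted < evaluations and len(pending) < workers:
                i = rng.randrange(len(pop))
                cand = perturb(pop[i], rng)
                pending[pool.submit(evaluate, cand)] = (i, cand)
                submitted += 1
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for fut in done:
                i, cand = pending.pop(fut)
                s = fut.result()
                if s < scores[i]:
                    pop[i], scores[i] = cand, s
    return pop, scores
```

In the synchronous variant, a slow evaluation holds up the whole generation, whereas the asynchronous variant refills idle workers immediately, at the cost of generating candidates from a population that may change while earlier evaluations are still in flight.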
When developing parallel algorithms, we must undertake firstly the
Experimental results
This section details the experimental methodology we have used to measure the performance of our parallel designs, reporting results from three different perspectives. Firstly, we will conduct the analysis of parallel scalability for SG-MOABC and AN-MOABC, studying their speedup factors and efficiency values for increasing problem and system sizes. Secondly, the evaluation of multiobjective performance will be carried out, checking if search capabilities are affected by the strategy used to
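The speedup and efficiency measures mentioned above are conventionally defined as the ratio of serial to parallel execution time, and that ratio divided by the number of cores. The helper below is a minimal sketch of these standard definitions; the function names and the timings in the usage lines are illustrative assumptions, not results from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup: serial execution time divided by parallel execution time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Efficiency: speedup divided by the number of cores used."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup(t_serial, t_parallel) / cores


# Hypothetical timings in seconds (not measured values).
s = speedup(1200.0, 200.0)        # 6.0
e = efficiency(1200.0, 200.0, 8)  # 0.75
```

An efficiency close to 1 indicates that adding cores translates almost entirely into shorter execution times; lower values point to synchronization or load-balancing overhead.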
Conclusions and future lines
In this research, we have studied synchronous and asynchronous designs to parallelize a multiobjective proposal inspired by honey bees to conduct phylogenetic analyses on shared memory multicore architectures. The synchronous approach implemented in SG-MOABC aims to minimize execution times by parallelizing the most computationally demanding sections included in the main loop of MOABC, maintaining the generational behaviour of the sequential algorithm. On the other hand, the asynchronous
Acknowledgments
This work was partially funded by the Spanish Ministry of Economy and Competitiveness and the ERDF (European Regional Development Fund), under the contract TIN2012-30685 (BIO project). Sergio Santander-Jiménez is supported by the grant FPU12/04101 from the Spanish Government.
References (56)
- et al. Computational grand challenges in assembling the tree of life: problems and solutions. Advances in Computers (2006).
- et al. Hybrid MPI/Pthreads parallelization of the RAxML phylogenetics code. Proceedings of HiCOMB 2010 (2010).
- MrBayes 3.2: Efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. (2012).
- et al. Evaluating the performance of a parallel multiobjective artificial bee colony algorithm for inferring phylogenies on multicore architectures. Proceedings of ISPA 2012 (2012).
- et al. Asynchronous master/slave MOEAs and heterogeneous evaluation costs. Genetic and Evolutionary Computation Conference (2012).
- Beagle: an application programming interface and high-performance computing library for statistical phylogenetics. Syst. Biol. (2012).
- et al. Genetic algorithms and parallel processing in maximum-likelihood phylogeny inference. Mol. Biol. Evol. (2002).
- A classification of consensus methods for phylogenies. BioConsensus (2003).
- et al. Likelihood calculations in molecular phylogenetics. Mathematics of Evolution and Phylogeny (2005).
- Explaining optimization in genetic algorithms with uniform crossover. Proceedings of the Twelfth Workshop on Foundations of Genetic Algorithms, FOGA XII ’13 (2013).
- Assessing systematic error in the inference of seed plant phylogeny. Int. J. Plant Sci.
- A multi-criterion evolutionary approach applied to phylogenetic reconstruction. New Achievements in Evolutionary Computation.
- Parallel multi-objective approaches for inferring phylogenies. Proceedings of EVOBIO’2010, vol. 6023: LNCS.
- Using OpenMP: Portable Shared Memory Parallel Programming.
- Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Ann. Mo. Bot. Gard.
- Phylogenetic evidence for a major reversal of life history evolution in Plethodontid salamanders. Evolution.
- An immune-inspired multi-objective approach to the reconstruction of phylogenetic trees. Neural Comput. Appl.
- Advances in Multi-Objective Nature Inspired Computing.
- The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res.
- Phylogenetic trees using evolutionary search: initial progress in extending gaphyl to work with genetic data. CEC 2003.
- jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods.
- A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput.
- A study of master-slave approaches to parallelize NSGA-II. Proceedings of IPDPS 2008.
- TNT, a free program for phylogenetic analysis. Cladistics.
- Dataflow parallelism in genetic algorithms. Parallel Problem Solving from Nature 2.
- Bio++: efficient extensible libraries and tools for computational molecular evolution. Mol. Biol. Evol.