On the design of shared memory approaches to parallelize a multiobjective bee-inspired proposal for phylogenetic reconstruction

doi:10.1016/j.ins.2015.06.040

Information Sciences

Volume 324, 10 December 2015, Pages 163-185

https://doi.org/10.1016/j.ins.2015.06.040

Abstract

Current efforts in solving computationally demanding optimization problems in bioinformatics rely on the combination of bioinspired computing and parallelism. The multiobjective nature shown by a wide variety of these problems represents an additional challenge, as the optimization of multiple objective functions involves growing computational requirements. In this work, a multiobjective metaheuristic inspired by honey bees is applied to tackle the phylogenetic inference problem. For this purpose, two parallel implementations for shared memory architectures are proposed: a synchronous generational model and a novel non-generational design inspired by the asynchronous behaviour of bees in nature. Experiments on six real biological data sets and comparisons with other parallel biological methods point out the relevance of applying nature-inspired parallelization strategies, addressing the performance pitfalls shown by traditional parallelization schemes.

Introduction

In the last decade, the growing availability of parallel architectures has opened the door to novel programming models where multiple processing units collaborate to address computationally demanding problems. In this sense, several research lines in bioinformatics try to overcome challenging biological problems by taking advantage of shared memory Multiple Instruction Multiple Data (MIMD) multiprocessor systems, commonly represented by symmetric multiprocessors (SMP). A wide variety of biological processes can be modelled as optimization problems which explore a search space $\Phi$ with the aim of obtaining an optimal solution $x \in \Phi$ in accordance with an objective function $f : \Phi \to \mathbb{R}$. As pointed out in [49], the NP-hard complexity of such problems (including multiple sequence alignment, DNA fragment assembly, biological sequence analysis, etc. [56]) means that the current needs of biological processing cannot be satisfied by traditional serial approaches. This issue has led to an increasing interest in developing parallel algorithms for computational biology.

In order to model the behaviour of real natural systems, biological optimization procedures are often conducted by considering a number of different principles, defined as optimality criteria. This multiobjective nature has motivated the formulation of a wide range of biological problems as Multiobjective Optimization Problems (MOPs) [14]. When addressing MOPs, we aim to find a Pareto set containing solutions which optimize two or more objective functions simultaneously. This idea represents an additional challenge, as the process of finding a good approximation to the Pareto-optimal set implies a high computational cost. Recent advances in parallel architectures represent an opportunity to deal with these problems, giving support to proposals which combine bioinspired computing and parallelism.
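The notion of a Pareto set can be made concrete with a minimal Python sketch. It assumes that both objectives are expressed as quantities to minimize (for instance, a parsimony score and a negative log-likelihood); the names `dominates` and `pareto_front` and the score values are hypothetical and are not taken from the original work.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` under minimization:
    no worse in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (parsimony score, negative log-likelihood) pairs,
# both treated as objectives to minimize (an assumption of this sketch).
scores = [
    (100.0, 520.0),
    (98.0, 530.0),
    (105.0, 515.0),
    (101.0, 525.0),  # dominated by (100.0, 520.0)
    (98.0, 535.0),   # dominated by (98.0, 530.0)
]

front = pareto_front(scores)
print(front)  # [(100.0, 520.0), (98.0, 530.0), (105.0, 515.0)]
```

The three surviving points are mutually non-dominated trade-offs: each one is better than the others in one objective and worse in the other.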

The development of parallel metaheuristics follows two main directions [51]. Firstly, the self-contained parallel cooperation approach aims to improve the quality of results by running independent algorithms which cooperate through the interchange of promising solutions. Secondly, intra-algorithm parallelization focuses on reducing the execution time required to run an algorithm by distributing tasks among processes or threads. In bioinformatics, this second line of work is the most popular approach to making the analysis of growing amounts of biological sequence data feasible. By developing parallel metaheuristics, major computational challenges in bioinformatics can be addressed. In this research, we address one of the most prominent problems in this field, phylogenetic inference [32].

By reconstructing the evolutionary history of living species, phylogeneticists provide useful knowledge in a number of scientific fields, including biomedicine, chemistry, and paleontology. The need to infer satisfactory phylogenies over exponentially increasing search spaces explains the NP-hard complexity of the problem and, therefore, the close relationship between phylogenetics, parallelism, and bioinspired computing. Throughout the years, phylogenetic searches have been carried out following optimality criteria which measure the quality of solutions. However, multiple studies [7], [35] have reported major sources of incongruence introduced by different optimality criteria, which can yield conflicting evolutionary histories. This key issue can be addressed by formulating a multiobjective view of phylogenetics. By applying multiobjective approaches, we can describe evolutionary trees supported by multiple objective functions simultaneously. In this way, biologists will be able to infer a set of alternative trade-off solutions, along with high-quality phylogenetic topologies according to each objective separately.
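To give a sense of how quickly the search space grows, the following worked formula counts tree topologies under the assumption of unrooted, strictly bifurcating trees on $n$ labelled species. This restriction is an assumption made here for illustration; the text above only states that the search space increases exponentially.

```latex
\[
T(n) = (2n-5)!! = \prod_{i=3}^{n} (2i-5), \qquad n \ge 3,
\]
\[
T(4) = 3, \qquad T(5) = 15, \qquad T(10) = 2\,027\,025 .
\]
```

Under this assumption, ten species already admit more than two million candidate topologies, which is why exhaustive enumeration becomes impractical for realistic data sets.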

In this paper, we tackle the phylogeny reconstruction problem under two criteria: parsimony and likelihood. For this purpose, we study shared memory schemes based on OpenMP [10] to parallelize a Multiobjective Artificial Bee Colony (MOABC) algorithm for phylogenetics [45]. In order to exploit pure shared memory environments, two parallel approaches are proposed. The first is a synchronous generational design which parallelizes the main sections that compose each generation of the algorithm, in such a way that the next generation does not start until all the tasks in the previous one have been completed. The second is an asynchronous nature-inspired scheme designed to model the inherently asynchronous behaviour of honey bees, dismissing the traditional concept of generation. The main goal is to decide which parallel design leads to improved parallel performance while keeping the multiobjective and biological quality of the results. Each version will be assessed by experimentation on six real nucleotide data sets, making comparisons with other biological and parallel methods for inferring phylogenies.

This paper is organized as follows. The next section highlights the relationship between parallelism and bioinspired computing and reviews their applications to the field of phylogenetics. Section 3 introduces the problem, discussing its computational complexity and the biological relevance of multiobjective optimization. In Section 4, we detail the algorithmic design of MOABC, defining the proposed parallelization schemes in Section 5. The experimental analysis of parallel, multiobjective, and biological performance is found in Section 6. Finally, Section 7 includes conclusions and future work.

Section snippets

Related work

This section provides insight into the different proposals reported in the literature to combine bioinspired computing and parallelism, focusing on those approaches published to address the phylogenetic inference problem.

One of the most important characteristics shown by bioinspired algorithms is their suitability to be improved by means of parallel computing. These methods often involve a population of solutions whose evolution can be carried out by mapping the computation of evolutionary

Phylogenetic reconstruction problem

This section introduces the basis of the problem tackled in this research, analyses its algorithmic complexity, and discusses the biological implications of applying multiobjective optimization to phylogenetics.

A bee-inspired approach for inferring phylogenies

Honey bees in nature are governed by a number of rules which allow them to share information about food sources to guarantee the survival of the hive [47]. The division of tasks in honey bee swarms is organized by defining three classes of bees [29]. Those bees which exploit the food sources currently known by the hive are called employed bees. When exploiting these sources, employed bees also check the environment for promising food sources in the neighbourhood, retaining useful information

MOABC parallel implementations

This research focuses on studying parallel implementations of MOABC to address high-complexity optimization problems like phylogenetic inference on shared memory SMP architectures. With the aim of exploiting such architectures, we propose two different parallel designs based on the OpenMP standard: a synchronous generational implementation and a non-generational implementation inspired by the asynchronous behaviour of honey bees.
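The difference between the two designs can be illustrated with a toy Python sketch. It is not the authors' OpenMP implementation: Python's `concurrent.futures` thread pool is used here as a stand-in for a shared memory runtime, and `evaluate`, `neighbour`, `run_synchronous`, and `run_asynchronous` are hypothetical names operating on integers rather than phylogenetic trees. The synchronous version waits for every evaluation of a generation before moving on, whereas the asynchronous version lets each individual continue as soon as its own evaluation finishes.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def evaluate(x: int) -> int:
    """Hypothetical stand-in for an expensive evaluation (lower is better)."""
    return (x - 7) ** 2


def neighbour(x: int) -> int:
    """Toy local move: stay or step by one, whichever evaluates best."""
    return min((x - 1, x, x + 1), key=evaluate)


def run_synchronous(population, generations, workers=4):
    """Generational scheme: all evaluations of a generation must finish
    before any individual moves on (implicit barrier per generation)."""
    best = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # list() consumes every result, so this line acts as the barrier.
            scores = list(pool.map(evaluate, population))
            for candidate in zip(scores, population):
                if best is None or candidate < best:
                    best = candidate
            population = [neighbour(x) for x in population]
    return best


def run_asynchronous(population, steps_per_bee, workers=4):
    """Non-generational scheme: each individual is resubmitted as soon as
    its own evaluation completes, without waiting for the others."""
    best = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each in-flight future to (individual, step number).
        pending = {pool.submit(evaluate, x): (x, 1) for x in population}
        while pending:
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                x, step = pending.pop(future)
                candidate = (future.result(), x)
                if best is None or candidate < best:
                    best = candidate
                if step < steps_per_bee:
                    nxt = neighbour(x)
                    pending[pool.submit(evaluate, nxt)] = (nxt, step + 1)
    return best


start = [0, 3, 12, 20]
print(run_synchronous(start, generations=5))     # (0, 7)
print(run_asynchronous(start, steps_per_bee=5))  # (0, 7)
```

Both runs perform the same number of evaluations per individual and reach the same result in this toy setting; the designs differ only in when workers may pick up new tasks, which matters when evaluation costs are uneven.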

When developing parallel algorithms, we must undertake firstly the

Experimental results

This section details the experimental methodology we have used to measure the performance of our parallel designs, reporting results from three different perspectives. Firstly, we will conduct the analysis of parallel scalability for SG-MOABC and AN-MOABC, studying their speedup factors and efficiency values for increasing problem and system sizes. Secondly, the evaluation of multiobjective performance will be carried out, checking if search capabilities are affected by the strategy used to

Conclusions and future lines

In this research, we have studied synchronous and asynchronous designs to parallelize a multiobjective proposal inspired by honey bees to conduct phylogenetic analyses on shared memory multicore architectures. The synchronous approach implemented in SG-MOABC aims to minimize execution times by parallelizing the most computationally demanding sections included in the main loop of MOABC, maintaining the generational behaviour of the sequential algorithm. On the other hand, the asynchronous

Acknowledgments

This work was partially funded by the Spanish Ministry of Economy and Competitiveness and the ERDF (European Regional Development Fund), under the contract TIN2012-30685 (BIO project). Sergio Santander-Jiménez is supported by the grant FPU12/04101 from the Spanish Government.

References (56)

D.A. Bader et al., Computational grand challenges in assembling the tree of life: problems and solutions, Advances in Computers (2006)
W. Pfeiffer et al., Hybrid MPI/Pthreads parallelization of the RAxML phylogenetics code, Proceedings of HiCOMB 2010 (2010)
F. Ronquist, MrBayes 3.2: Efficient Bayesian phylogenetic inference and model choice across a large model space, Syst. Biol. (2012)
S. Santander-Jiménez et al., Evaluating the performance of a parallel multiobjective artificial bee colony algorithm for inferring phylogenies on multicore architectures, Proceedings of ISPA 2012 (2012)
M. Yagoubi et al., Asynchronous master/slave MOEAs and heterogeneous evaluation costs, Genetic and Evolutionary Computation Conference (2012)
D.L. Ayres, Beagle: an application programming interface and high-performance computing library for statistical phylogenetics, Syst. Biol. (2012)
M.J. Brauer et al., Genetic algorithms and parallel processing in maximum-likelihood phylogeny inference, Mol. Biol. Evol. (2002)
D. Bryant, A classification of consensus methods for phylogenies, BioConsensus (2003)
D. Bryant et al., Likelihood calculations in molecular phylogenetics, Mathematics of Evolution and Phylogeny (2005)
K.M. Burjorjee, Explaining optimization in genetic algorithms with uniform crossover, Proceedings of the Twelfth Workshop on Foundations of Genetic Algorithms, FOGA XII ’13 (2013)
J.G. Burleigh et al., Assessing systematic error in the inference of seed plant phylogeny, Int. J. Plant Sci. (2007)
W. Cancino et al., A multi-criterion evolutionary approach applied to phylogenetic reconstruction, New Achievements in Evolutionary Computation (2010)
W. Cancino et al., Parallel multi-objective approaches for inferring phylogenies, Proceedings of EVOBIO’2010, vol. 6023: LNCS (2010)
B. Chapman et al., Using OpenMP: Portable Shared Memory Parallel Programming (2007)
M.W. Chase, Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL, Ann. Mo. Bot. Gard. (1993)
P.T. Chippindale et al., Phylogenetic evidence for a major reversal of life history evolution in Plethodontid salamanders, Evolution (2004)
G.P. Coelho et al., An immune-inspired multi-objective approach to the reconstruction of phylogenetic trees, Neural Comput. Appl. (2010)
C. Coello et al., Advances in Multi-Objective Nature Inspired Computing (2010)
J.R. Cole, The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis, Nucleic Acids Res. (2005)
C.B. Congdon et al., Phylogenetic trees using evolutionary search: initial progress in extending gaphyl to work with genetic data, CEC 2003 (2003)
D. Darriba et al., jModelTest 2: more models, new heuristics and parallel computing, Nat. Methods (2012)
K. Deb et al., A fast and elitist multi-objective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. (2002)
J.J. Durillo et al., A study of master-slave approaches to parallelize NSGA-II, Proceedings of IPDPS 2008 (2008)
J. Felsenstein, 2000, PHYLIP (phylogeny inference package),...
P.A. Goloboff et al., TNT, a free program for phylogenetic analysis, Cladistics (2008)
V.S. Gordon et al., Dataflow parallelism in genetic algorithms, Parallel Problem Solving from Nature 2 (1992)
L. Guéguen, Bio++: efficient extensible libraries and tools for computational molecular evolution, Mol. Biol. Evol. (2013)
2005, HIV Sequence Database,...
