Elsevier

Information Sciences

Volume 324, 10 December 2015, Pages 163-185

On the design of shared memory approaches to parallelize a multiobjective bee-inspired proposal for phylogenetic reconstruction

https://doi.org/10.1016/j.ins.2015.06.040

Abstract

Current efforts in solving computationally demanding optimization problems in bioinformatics rely on the combination of bioinspired computing and parallelism. The multiobjective nature shown by a wide variety of these problems represents an additional challenge, as the optimization of multiple objective functions involves growing computational requirements. In this work, a multiobjective metaheuristic inspired by honey bees is applied to tackle the phylogenetic inference problem. For this purpose, two parallel implementations for shared memory architectures are proposed: a synchronous generational model and a novel non-generational design inspired by the asynchronous behaviour of bees in nature. Experiments on six real biological data sets and comparisons with other parallel biological methods point out the relevance of applying nature-inspired parallelization strategies, addressing the performance pitfalls shown by traditional parallelization schemes.

Introduction

In the last decade, the growing availability of parallel architectures has opened the door to novel programming models where multiple processing units collaborate to address computationally demanding problems. In this sense, several research lines in bioinformatics try to overcome challenging biological problems by taking advantage of shared memory Multiple Instruction Multiple Data (MIMD) multiprocessor systems, commonly represented by symmetric multiprocessors (SMP). A wide variety of biological processes can be modelled as optimization problems which seek to explore a search space Φ with the aim of obtaining an optimal solution x ∈ Φ in accordance with an objective function f: Φ → ℝ. As pointed out in [49], the NP-hard complexity of such problems (including multiple sequence alignment, DNA fragment assembly, biological sequence analysis, etc. [56]) means that the current needs of biological processing cannot be satisfied by traditional serial approaches. This issue has led to an increasing interest in developing parallel algorithms for computational biology.

In order to model the behaviour of real natural systems, biological optimization procedures are often conducted by considering a number of different principles, defined as optimality criteria. This multiobjective nature has motivated the formulation of a wide range of biological problems as Multiobjective Optimization Problems (MOPs) [14]. When addressing MOPs, we aim to find a Pareto set containing solutions which optimize two or more objective functions simultaneously. This idea represents an additional challenge, as the process of finding a good approximation to the Pareto-optimal set implies a high computational cost. Recent advances in parallel architectures represent an opportunity to deal with these problems, giving support to proposals which combine bioinspired computing and parallelism.
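The notion of a Pareto set can be illustrated with a minimal sketch. It assumes that every objective is minimized (a likelihood score would be negated first), and the function names and score values are hypothetical, chosen only for illustration rather than taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b, assuming every objective is minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep the points that no other point in the list dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]


# Hypothetical (parsimony, negated log-likelihood) pairs; both are minimized.
scores = [(10.0, 5.0), (8.0, 7.0), (9.0, 6.0), (11.0, 6.0), (8.0, 8.0)]
front = non_dominated(scores)
print(front)
```

In this toy data, (11.0, 6.0) is dominated by (10.0, 5.0) and (8.0, 8.0) by (8.0, 7.0), so three trade-off solutions remain. The quadratic pairwise comparison is used for clarity only; it hints at why approximating the Pareto-optimal set becomes costly as the number of candidate solutions grows.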

The development of parallel metaheuristics follows two main directions [51]. Firstly, the self-contained parallel cooperation approach aims to improve the quality of results by running independent algorithms which cooperate through the interchange of promising solutions. Secondly, intra-algorithm parallelization focuses on reducing the execution time required to run an algorithm by distributing tasks among processes or threads. In bioinformatics, this second line of work is the most popular way to make the analysis of growing amounts of biological sequences feasible. By developing parallel metaheuristics, major computational challenges in bioinformatics can be addressed. In this research, we address one of the most prominent problems in this field, phylogenetic inference [32].

By reconstructing the evolutionary history of living species, phylogeneticists provide useful knowledge in a number of scientific fields, including biomedicine, chemistry, and paleontology. The need to infer satisfactory phylogenies over exponentially increasing search spaces explains the NP-hard complexity of the problem and, therefore, the close relationship between phylogenetics, parallelism, and bioinspired computing. Throughout the years, phylogenetic searches have been carried out following optimality criteria which measure the quality of solutions. However, several studies [7], [35] have reported that different optimality criteria are a major source of incongruence, leading to conflicting evolutionary histories. This key issue can be addressed by formulating a multiobjective view of phylogenetics. By applying multiobjective approaches, we can describe evolutionary trees supported by multiple objective functions simultaneously. In this way, biologists will be able to infer a set of alternative trade-off solutions, along with high-quality phylogenetic topologies according to each objective separately.

In this paper, we tackle the phylogeny reconstruction problem under two criteria: parsimony and likelihood. For this purpose, we study shared memory schemes based on OpenMP [10] to parallelize a Multiobjective Artificial Bee Colony (MOABC) algorithm for phylogenetics [45]. In order to exploit pure shared memory environments, two parallel approaches are proposed. The first is a synchronous generational design which parallelizes the main sections that compose each generation of the algorithm, in such a way that the next generation does not start until all the tasks in the previous one have been completed. The second is an asynchronous nature-inspired scheme designed to model the inherently asynchronous behaviour of honey bees, dismissing the traditional concept of generation. The main goal is to decide which parallel design leads to improved parallel performance while keeping the multiobjective and biological quality of the results. Each version will be assessed by experimentation on six real nucleotide data sets, making comparisons with other biological and parallel methods for inferring phylogenies.

This paper is organized as follows. The next section highlights the relationship between parallelism and bioinspired computing and reviews their applications to the field of phylogenetics. Section 3 introduces the problem, discussing its computational complexity and the biological relevance of multiobjective optimization. In Section 4, we detail the algorithmic design of MOABC, defining the proposed parallelization schemes in Section 5. The experimental analysis of parallel, multiobjective, and biological performance is found in Section 6. Finally, Section 7 includes conclusions and future work.

Section snippets

Related work

This section provides insight into the different proposals reported in the literature to combine bioinspired computing and parallelism, focusing on those approaches published to address the phylogenetic inference problem.

One of the most important characteristics shown by bioinspired algorithms is their suitability to be improved by means of parallel computing. These methods often involve a population of solutions whose evolution can be carried out by mapping the computation of evolutionary

Phylogenetic reconstruction problem

This section introduces the basis of the problem tackled in this research, analyses its algorithmic complexity, and discusses the biological implications of applying multiobjective optimization to phylogenetics.

A bee-inspired approach for inferring phylogenies

Honey bees in nature are governed by a number of rules which allow them to share information about food sources to guarantee the survival of the hive [47]. The division of tasks in honey bee swarms is organized by defining three classes of bees [29]. Those bees which exploit the food sources currently known by the hive are called employed bees. When exploiting these sources, employed bees also check the environment for promising food sources in the neighbourhood, retaining a useful information

MOABC parallel implementations

This research focuses on studying parallel implementations of MOABC to address high-complexity optimization problems like phylogenetic inference on shared memory SMP architectures. With the aim of exploiting such architectures, we propose two different parallel designs based on the OpenMP standard: a synchronous generational implementation and a non-generational implementation inspired by the asynchronous behaviour of honey bees.
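The difference between the two designs can be sketched generically. The following is not the authors' OpenMP code: it is a minimal illustration that assumes Python threads from `concurrent.futures` as a stand-in for OpenMP threads, a toy `evaluate` function in place of an expensive tree evaluation, and a toy population update. All names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def evaluate(x: int) -> int:
    """Toy stand-in for an expensive solution evaluation (hypothetical)."""
    return x * x


def synchronous_generational(population, generations, workers=4):
    """Evaluate each generation in parallel, with a barrier between generations."""
    history = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # list() consumes every result, so it acts as the barrier:
            # the next generation starts only after all tasks have finished.
            scores = list(pool.map(evaluate, population))
            history.append(scores)
            population = [x + 1 for x in population]  # toy update step
    return history


def asynchronous(tasks, workers=4):
    """No generation barrier: a new task is submitted as soon as a slot frees up."""
    pending = list(tasks)
    scores = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        running = set()
        while pending or running:
            # Keep every worker busy while work remains.
            while pending and len(running) < workers:
                running.add(pool.submit(evaluate, pending.pop(0)))
            # Resume as soon as any single task completes.
            finished, running = wait(running, return_when=FIRST_COMPLETED)
            scores.extend(f.result() for f in finished)
    return scores


print(synchronous_generational([1, 2, 3], generations=2))
print(sorted(asynchronous([1, 2, 3, 4, 5])))
```

In the synchronous version, threads that finish early sit idle until the slowest task of the generation completes. The asynchronous version avoids that idle time at the price of results arriving in a nondeterministic order, which is why the usage line sorts them before printing.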

When developing parallel algorithms, we must undertake firstly the

Experimental results

This section details the experimental methodology we have used to measure the performance of our parallel designs, reporting results from three different perspectives. Firstly, we will conduct the analysis of parallel scalability for SG-MOABC and AN-MOABC, studying their speedup factors and efficiency values for increasing problem and system sizes. Secondly, the evaluation of multiobjective performance will be carried out, checking if search capabilities are affected by the strategy used to
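Speedup and efficiency are used here in their standard sense: speedup is the serial execution time divided by the parallel execution time, and efficiency is the speedup divided by the number of cores. A minimal sketch of both follows. The timings are hypothetical values chosen for illustration and are not results from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup: serial execution time divided by parallel execution time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Efficiency: speedup divided by the number of cores used."""
    return speedup(t_serial, t_parallel) / cores


# Hypothetical execution times in seconds, indexed by number of cores.
timings = {1: 1200.0, 2: 640.0, 4: 340.0, 8: 200.0}

for cores, t in timings.items():
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, cores)
    print(f"{cores} cores: speedup {s:.2f}, efficiency {e:.2%}")
```

An efficiency close to 100% means that the added cores are almost fully used, while a falling value as the core count grows points to overheads such as synchronization or idle threads.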

Conclusions and future lines

In this research, we have studied synchronous and asynchronous designs to parallelize a multiobjective proposal inspired by honey bees to conduct phylogenetic analyses on shared memory multicore architectures. The synchronous approach implemented in SG-MOABC aims to minimize execution times by parallelizing the most computationally demanding sections included in the main loop of MOABC, maintaining the generational behaviour of the sequential algorithm. On the other hand, the asynchronous

Acknowledgments

This work was partially funded by the Spanish Ministry of Economy and Competitiveness and the ERDF (European Regional Development Fund), under the contract TIN2012-30685 (BIO project). Sergio Santander-Jiménez is supported by the grant FPU12/04101 from the Spanish Government.

References (56)

  • J.G. Burleigh et al.

    Assessing systematic error in the inference of seed plant phylogeny

    Int. J. Plant Sci.

    (2007)
  • W. Cancino et al.

    A multi-criterion evolutionary approach applied to phylogenetic reconstruction

    New Achievements in Evolutionary Computation

    (2010)
  • W. Cancino et al.

    Parallel multi-objective approaches for inferring phylogenies

    Proceedings of EVOBIO’2010, vol. 6023: LNCS

    (2010)
  • B. Chapman et al.

    Using OpenMP: Portable Shared Memory Parallel Programming

    (2007)
  • M.W. Chase

    Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL

    Ann. Mo. Bot. Gard.

    (1993)
  • P.T. Chippindale et al.

    Phylogenetic evidence for a major reversal of life history evolution in Plethodontid salamanders

    Evolution

    (2004)
  • G.P. Coelho et al.

    An immune-inspired multi-objective approach to the reconstruction of phylogenetic trees

    Neural Comput. Appl.

    (2010)
  • C. Coello et al.

    Advances in Multi-Objective Nature Inspired Computing

    (2010)
  • J.R. Cole

    The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis

    Nucleic Acids Res.

    (2005)
  • C.B. Congdon et al.

    Phylogenetic trees using evolutionary search: initial progress in extending gaphyl to work with genetic data

    CEC 2003

    (2003)
  • D. Darriba et al.

    jModelTest 2: more models, new heuristics and parallel computing

    Nat. Methods

    (2012)
  • K. Deb et al.

    A fast and elitist multi-objective genetic algorithm: NSGA-II

    IEEE Trans. Evol. Comput.

    (2002)
  • J.J. Durillo et al.

    A study of master-slave approaches to parallelize NSGA-II

    Proceedings of IPDPS 2008

    (2008)
  • J. Felsenstein, 2000, PHYLIP (phylogeny inference package),...
  • P.A. Goloboff et al.

    TNT, a free program for phylogenetic analysis

    Cladistics

    (2008)
  • V.S. Gordon et al.

    Dataflow parallelism in genetic algorithms

    Parallel Problem Solving from Nature 2

    (1992)
  • L. Guéguen

    Bio++: efficient extensible libraries and tools for computational molecular evolution

    Mol. Biol. Evol.

    (2013)
  • 2005, HIV Sequence Database,...