Elsevier

Data & Knowledge Engineering

Volumes 96–97, March–May 2015, Pages 19-31

Editorial
Discovery of pathways in protein–protein interaction networks using a genetic algorithm

https://doi.org/10.1016/j.datak.2015.04.002

Abstract

Biological pathways have played an important role in understanding cell activities and evolution. To find these pathways, it is necessary to orient protein–protein interactions, which are usually given in the form of undirected networks or graphs. Previous findings indicate that orienting protein interactions can improve pathway discovery. However, assigning orientations to protein interactions is a combinatorial optimization problem that has been proved to be NP-hard, making it critical to develop efficient algorithms.

This paper proposes a method for orienting protein–protein interaction networks (PPIs) and discovering pathways. We first give a mathematical model of the problem and then design a genetic algorithm that takes the problem's characteristics into account. We conducted multiple runs on yeast PPI network data to identify the best settings for the problem. The results were compared with those of a well-known algorithm (ROLS), previously shown to perform best on this problem, in terms of run time, fitness function values, and especially the ratio of matching gold standard pathways. The results show that our approach performs well on this problem.
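
The matching ratio used in the comparison can be illustrated with a small sketch. The abstract does not define when a gold standard pathway counts as matched, so the strict criterion below (every directed edge of the pathway must appear with the same direction in the computed orientation) and the function name `matching_ratio` are our assumptions, not the paper's definition.

```python
def matching_ratio(gold_pathways, oriented_edges):
    """Fraction of gold standard pathways fully consistent with an orientation.

    gold_pathways: list of pathways, each a list of directed edges (u, v).
    oriented_edges: set of directed edges (u, v) produced by an algorithm.
    Hypothetical strict criterion: a pathway is matched only if all of its
    edges appear in oriented_edges with the same direction.
    """
    if not gold_pathways:
        return 0.0
    matched = sum(all(e in oriented_edges for e in p) for p in gold_pathways)
    return matched / len(gold_pathways)
```

A looser criterion, such as the fraction of correctly oriented edges per pathway, would be an equally plausible reading and would give higher scores.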

Introduction

Recently, there has been great interest in protein–protein interaction (PPI) databases, which aggregate experimental findings over time and serve as the source of interaction information for case studies in bioinformatics. Given the large amount of PPI data collected, a challenging problem is to extract biological insights, in particular to discover biological pathways from the data. Note that the edges representing PPIs have been experimentally defined and tested. The reconstruction of cellular processes (pathways or networks) has attracted considerable attention: the reconstruction of regulatory networks [1], [2], [3], [4], [5], the analysis of metabolic networks [6], [7], [8], [9], and the discovery of signaling networks and pathways [10], [11], [12], [13]. However, the directionality of interactions in these networks has not been thoroughly investigated, although direction is essential for determining how information flows from one protein to another. Orienting signaling networks is more difficult than orienting regulatory or metabolic networks because orientation information is lacking. For example, the orientation of gene regulatory networks is often determined by transcription factors regulating genes, studies of microRNAs often look for targets, and motif studies are performed upstream of genes [14], [15], [16]. Similarly, metabolic networks are modeled using knowledge about the order of genes and enzymes [17]. PPI data, in contrast, are almost always undirected, so orienting interaction edges to model signal transmission in signaling networks is costly. Representative works in this area [12], [18], [19] underline the need for an efficient edge-orientation algorithm for PPI networks, a problem that has been shown to be NP-hard.

In [12], the authors presented a random orientation plus local search algorithm (ROLS) to perform edge orientation and compared the computed results with data from biological experiments to determine whether the paths found were consistent with experimental evidence. The results were also compared with several algorithms proposed in [20], [21], [22]. When evaluating the results, the authors found 37 standard pathways that had been validated through biological experiments. However, some of the paths found did not appear in the standard set, and such interactions had not been observed in biological experiments, even though these pathways had high objective function values.
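
The random orientation plus local search idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [12]: each candidate path is pre-compiled into a weight and a list of `(edge_index, required_direction)` pairs, an orientation is a list of booleans (one per edge), the neighbourhood is a single edge flip, and the names `score`, `local_search`, `rols` and the `restarts` parameter are hypothetical.

```python
import random


def score(orientation, paths):
    """Total weight of paths whose required edge directions all match.

    Hypothetical representation: each path is
    (weight, [(edge_index, required_direction), ...]).
    """
    return sum(w for w, reqs in paths if all(orientation[i] == d for i, d in reqs))


def local_search(orientation, paths):
    """Flip single edges while some flip strictly improves the score."""
    best = score(orientation, paths)
    improved = True
    while improved:
        improved = False
        for i in range(len(orientation)):
            orientation[i] = not orientation[i]
            s = score(orientation, paths)
            if s > best:
                best, improved = s, True
            else:
                orientation[i] = not orientation[i]  # undo a non-improving flip
    return orientation, best


def rols(num_edges, paths, restarts=20, seed=0):
    """Random orientation plus local search, keeping the best restart."""
    rng = random.Random(seed)
    best_o, best_s = None, float("-inf")
    for _ in range(restarts):
        # Start from a uniformly random orientation of every edge.
        o = [rng.random() < 0.5 for _ in range(num_edges)]
        o, s = local_search(o, paths)
        if s > best_s:
            best_o, best_s = o[:], s
    return best_o, best_s
```

Because local search stops at the first orientation that no single flip improves, it is the restarts from several random orientations that give the method a chance to escape poor local optima.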

In this paper, we extend our preliminary results on PPI edge orientation [23]. In particular, we design a genetic algorithm (GA) for the problem. GAs are among the most popular and successful computational models in intelligent computing [24], especially for dealing with NP-hard problems. Along with other intelligent computing techniques such as fuzzy computing, neural networks and multi-agent systems, GAs continue to develop and are widely applied in different fields [25], [26]. Our GA design takes conflicting elements in PPI networks into account in order to exclude unnecessary edges from the search, which greatly improves computing speed. We examined several aspects, including running time and objective function values. The results show that our algorithm finds good solutions for the problem, a finding supported by comparison with the results of other algorithms. In particular, we address the biological meaning of the obtained pathways through extended biological validation.
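
One possible reading of how conflicting elements shrink the search is sketched below; it is our interpretation, not necessarily the paper's exact design. Assuming candidate paths are compiled into `(edge_index, required_direction)` pairs, an edge that every path uses in the same direction can be fixed to that direction without losing objective value, an edge used by no path cannot change the objective, and only edges that different paths require in opposite directions need to be encoded in a GA chromosome. The functions `split_edges` and `expand` are hypothetical.

```python
from collections import defaultdict


def split_edges(paths):
    """Classify edges by the directions candidate paths require of them.

    paths: list of (weight, [(edge_index, required_direction), ...]).
    Returns (fixed, conflicting): 'fixed' maps each edge that all paths use
    in one direction to that direction; 'conflicting' lists edges needed in
    both directions. Edges used by no path are omitted entirely.
    """
    required = defaultdict(set)
    for _, reqs in paths:
        for i, d in reqs:
            required[i].add(d)
    fixed = {i: next(iter(ds)) for i, ds in required.items() if len(ds) == 1}
    conflicting = sorted(i for i, ds in required.items() if len(ds) == 2)
    return fixed, conflicting


def expand(chromosome, conflicting, fixed, num_edges, default=True):
    """Build a full orientation from genes defined on conflicting edges only."""
    orientation = [default] * num_edges  # unused edges: arbitrary direction
    for i, d in fixed.items():
        orientation[i] = d
    for i, gene in zip(conflicting, chromosome):
        orientation[i] = gene
    return orientation
```

Fixing a non-conflicting edge is safe because no candidate path needs the opposite direction, so the choice can only add satisfied paths.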

The paper is organized into five sections. Section 1 introduces the problem, Section 2 gives background on the problem and on GAs, Section 3 describes in detail the GA designed to solve the problem, Section 4 presents experimental data on yeast PPIs and assesses the results achieved by the algorithm, and Section 5 concludes the paper.

Problem of orienting edges in protein interaction networks

Proteins are important components of the cell's structure and are involved in most biological processes. During cell functioning, they interact with each other or with macromolecules such as DNA and RNA, together forming a complex network of interactions that performs biological functions. An example is given in Fig. 1, where the graph shows part of the yeast protein interaction network drawn with the Cytoscape software. From the graph, we can see that the protein interaction
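
To make the orientation problem concrete, the sketch below evaluates one orientation of an undirected network. It assumes a formulation in the spirit of [12]: candidate paths are given as node sequences, a path's weight is the product of its edge confidences, and the objective sums the weights of the paths whose every edge is oriented in the direction of travel. The function names and this exact weighting are assumptions; the paper's mathematical model may differ.

```python
from math import prod


def edge_index(edges):
    """Map each undirected edge {u, v} to its position in the orientation."""
    return {frozenset(e): i for i, e in enumerate(edges)}


def path_consistent(path, edges, idx, orientation):
    """True if every hop of the node path follows the assigned direction."""
    for a, b in zip(path, path[1:]):
        i = idx[frozenset((a, b))]
        u, v = edges[i]
        # orientation[i] True means u -> v; False means v -> u.
        if (a, b) != ((u, v) if orientation[i] else (v, u)):
            return False
    return True


def objective(paths, weights, edges, orientation):
    """Sum of weights of candidate paths satisfied by the orientation.

    A path's weight is assumed to be the product of its edge confidences.
    """
    idx = edge_index(edges)
    total = 0.0
    for path in paths:
        if path_consistent(path, edges, idx, orientation):
            total += prod(weights[idx[frozenset((a, b))]] for a, b in zip(path, path[1:]))
    return total
```

With m undirected edges there are 2^m possible orientations, so even this cheap evaluation cannot be applied exhaustively to realistic networks, which is why heuristics such as local search or GAs are used.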

Methodology

The main idea is to design a GA tailored to the characteristics of the orientation problem, making the search process effective. It starts with a randomly initialized population P whose size is a constant natural number n; each individual is represented by a sequence of chromosomes. The population evolves over many generations. The best individual of each generation is kept for the next population, and we apply the local
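
The evolutionary loop described above can be sketched in Python. Only the random initial population of constant size n, the preservation of each generation's best individual and a local search step come from the text; tournament selection, uniform crossover, bit-flip mutation, the greedy single-flip local search and all parameter values (`n`, `generations`, `p_mut`) are our assumptions for illustration. Each individual is a list of booleans, one per encoded edge, and each candidate path is a hypothetical `(weight, [(edge_index, required_direction), ...])` pair.

```python
import random


def score(orientation, paths):
    """Total weight of paths whose required edge directions all match."""
    return sum(w for w, reqs in paths if all(orientation[i] == d for i, d in reqs))


def local_search(ind, paths):
    """Greedy single-gene flips, kept only while the score strictly improves."""
    best = score(ind, paths)
    improved = True
    while improved:
        improved = False
        for i in range(len(ind)):
            ind[i] = not ind[i]
            s = score(ind, paths)
            if s > best:
                best, improved = s, True
            else:
                ind[i] = not ind[i]  # undo a non-improving flip
    return ind


def genetic_algorithm(num_genes, paths, n=30, generations=50, p_mut=0.05, seed=0):
    """GA sketch: random population of size n, elitism, local search on the elite."""
    rng = random.Random(seed)

    def fit(ind):
        return score(ind, paths)

    # Randomly initialized population P with a constant number n of individuals.
    pop = [[rng.random() < 0.5 for _ in range(num_genes)] for _ in range(n)]
    for _ in range(generations):
        # Keep a copy of the best individual, improved by local search.
        nxt = [local_search(max(pop, key=fit)[:], paths)]
        while len(nxt) < n:
            # Binary tournament selection of two parents (assumed operator).
            p1 = max(rng.sample(pop, 2), key=fit)
            p2 = max(rng.sample(pop, 2), key=fit)
            # Uniform crossover, then bit-flip mutation (assumed operators).
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [not g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fit)
    return best, fit(best)
```

Carrying the locally improved elite into every generation means the best fitness in the population never decreases from one generation to the next.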

Yeast's interaction network

For the experiments, we used yeast PPIs from the BioGRID database (http://thebiogrid.org), a large-scale online database of protein and genetic interactions across organisms. As mentioned above, this database is continually updated with new findings from biological experiments. Therefore, to allow a direct comparison between our results and those of existing algorithms, we use the same BioGRID version (2.0.51) as the authors of [29]. This database is
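
Before orientation, the interaction records have to be turned into an undirected edge list. The sketch below is generic: it assumes tab-separated records with comment lines starting with `#`, and the column positions `col_a` and `col_b` are hypothetical defaults that must be checked against the header of the BioGRID release actually used, since the file layout of version 2.0.51 is not described here.

```python
import csv
from collections.abc import Iterable


def load_undirected_edges(lines: Iterable[str], col_a: int = 0, col_b: int = 1):
    """Read tab-separated interaction records into a deduplicated edge list.

    col_a / col_b are hypothetical positions of the two interactor columns.
    Self-interactions are dropped, and A-B / B-A records are merged into one
    undirected edge (the first occurrence is kept).
    """
    seen, edges = set(), []
    data = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for row in csv.reader(data, delimiter="\t"):
        a, b = row[col_a].strip(), row[col_b].strip()
        key = frozenset((a, b))
        if a == b or key in seen:
            continue
        seen.add(key)
        edges.append((a, b))
    return edges
```

Treating each pair as a `frozenset` is what makes the edge list undirected; the orientation step later assigns a direction to each of these edges.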

Conclusion

In this paper, we proposed a GA design for the problem of orienting protein interaction networks, a challenging problem in computational biology. We presented a way to represent population individuals that fits the problem; in particular, our design takes conflicting elements into account in the solution representation, which greatly improves computing speed. The results show that our algorithm handles this problem well. As evidence of the correctness of our algorithm, we find that our

References (37)

  • D.A. Ravcheev et al., Genomic reconstruction of transcriptional regulatory networks in lactic acid bacteria, BMC Genomics (2013)
  • G.-X. Liu et al., Reconstruction of gene regulatory networks based on two-stage Bayesian network structure learning algorithm, J. Bionic Eng. (2009)
  • J. Kitagawa et al., Identifying metabolic pathways and gene regulation networks with evolutionary algorithms, Evol. Comput. Bioinforma. (2003)
  • E. Fischer et al., Large-scale in vivo flux analysis shows rigidity and suboptimal performance of Bacillus subtilis metabolism, Nat. Genet. (2005)
  • D. McCloskey, B. Palsson, A.M. Feist, Basic and applied uses of genome-scale metabolic network reconstructions of...
  • J. Scott et al., Efficient algorithms for detecting signaling pathways in protein interaction networks, J. Comput. Biol. (2006)
  • G. Bebek et al., Pathfinder: mining signal transduction pathway segments from protein–protein interaction networks, BMC Bioinf. (2007)
  • A. Gitter et al., Discovering pathways by orienting edges in protein interaction networks, Nucleic Acids Res. (2011)