Discovery of pathways in protein–protein interaction networks using a genetic algorithm

doi:10.1016/j.datak.2015.04.002

Data & Knowledge Engineering

Volumes 96–97, March–May 2015, Pages 19-31

https://doi.org/10.1016/j.datak.2015.04.002

Abstract

Biological pathways have played an important role in understanding cell activities and evolution. Finding these pathways requires orienting protein–protein interactions, which are usually given as undirected networks or graphs. Previous findings indicate that orienting protein interactions can improve the process of pathway discovery. However, assigning orientations to protein interactions is a combinatorial optimization problem that has been proved to be NP-hard, so efficient algorithms are needed.

This paper proposes a method for orienting protein–protein interaction networks (PPIs) and discovering pathways. We first give a mathematical model of the problem and then design a genetic algorithm that takes the problem's characteristics into account. We conducted multiple runs on yeast PPI network data to determine the best configuration for the problem. The results were compared with those of a well-known algorithm (ROLS), previously shown to perform best on this problem, in terms of run time, fitness function values and, especially, the ratio of pathways matching the gold standard. The results show that our approach performs well on this problem.
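The abstract does not define how the matching ratio is computed. As an illustrative assumption only, a predicted path could count as matching if all of its directed edges occur, in the same direction, within a single gold-standard pathway; the ratio is then the fraction of predicted paths that match. A minimal sketch, with hypothetical names `directed_edges` and `match_ratio`:

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def directed_edges(path: Sequence[str]) -> Set[Edge]:
    """Consecutive (u, v) pairs of a path read from source to target."""
    return set(zip(path, path[1:]))


def match_ratio(predicted: List[Sequence[str]],
                gold: List[Sequence[str]]) -> float:
    """Fraction of predicted paths whose directed edges all lie in one
    gold-standard pathway (a hypothetical matching criterion)."""
    if not predicted:
        return 0.0
    gold_edge_sets = [directed_edges(g) for g in gold]
    matched = sum(
        1 for p in predicted
        if any(directed_edges(p) <= g for g in gold_edge_sets)
    )
    return matched / len(predicted)
```

For example, with the toy gold pathway A, B, C, D, the predicted path A, B, C matches while A, C does not, giving a ratio of 0.5. A path traversed in the reverse direction does not match under this criterion, which is what makes orientation matter.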

Introduction

Recently, there has been great interest in protein–protein interaction (PPI) databases, which aggregate experimental findings over time and serve as the source of interaction information for case studies in bioinformatics. Given the large amount of PPI data collected, a challenging problem is to extract biological insights from it, in particular to discover biological pathways. Note that the edges representing PPIs have been experimentally defined and tested. The reconstruction of cellular biological processes (pathways or networks) has attracted considerable attention: the reconstruction of regulatory networks [1], [2], [3], [4], [5], the analysis of metabolic networks [6], [7], [8], [9], and the discovery of signaling networks and pathways [10], [11], [12], [13]. However, the directionality of interactions in these networks has not been thoroughly investigated, even though direction is essential for understanding how information flows from one protein to another. Orienting signaling networks is more difficult than orienting regulatory and metabolic networks because orientation information is lacking. For example, the orientation of a gene regulatory network is often determined by transcription factors regulating genes, studies of microRNAs often look for targets, and motif studies are conducted upstream of genes [14], [15], [16]. Similarly, metabolic networks are modeled using knowledge about the order of genes and enzymes [17]. PPI data, in contrast, are almost always undirected, so orienting interaction edges to model signal transmission in signaling networks is costly. Representative works in this area [12], [18], [19] underline the need for an efficient edge-orientation algorithm for PPI networks, a problem that has been shown to be NP-hard.
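A short count illustrates why exhaustive search is impractical. Each of the m undirected edges can be oriented in one of two directions, so the set of candidate orientations O has 2^m elements. The toy value m = 30 below is chosen only for illustration and is not taken from the yeast data. Exponential growth of the search space does not by itself prove NP-hardness, which is a separate result, but it does rule out brute-force enumeration.

```latex
|\mathcal{O}| = 2^{m}, \qquad m = 30 \;\Rightarrow\; |\mathcal{O}| = 2^{30} = 1\,073\,741\,824 \approx 1.07 \times 10^{9}
```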

In [12], the authors presented a random orientation plus local search algorithm (ROLS) for edge orientation and evaluated its results against data from biological experiments to determine whether the paths found were consistent with experimental evidence. The results were also compared with those of several algorithms proposed in [20], [21], [22]. To evaluate the algorithm, the authors identified 37 standard pathways that had been tested through biological experiments. However, some paths did not appear in the standard set, and such interactions could not occur in biological experiments, even though these paths had high objective function values.
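The idea behind ROLS, a random orientation followed by local search, can be sketched in Python as below. This is a simplified reconstruction under stated assumptions, not the implementation from [12]: candidate paths are given as explicit protein sequences with hypothetical weights, the objective is the total weight of paths whose edges all point from source to target, and local search flips one edge at a time while the objective improves. The names `key`, `satisfied`, `score` and `random_orientation_local_search` are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]          # canonical key: (min(u, v), max(u, v))
Orientation = Dict[Edge, bool]  # True means min -> max, False means max -> min


def key(u: str, v: str) -> Edge:
    return (u, v) if u < v else (v, u)


def satisfied(path: Sequence[str], o: Orientation) -> bool:
    """A path is satisfied if every edge points from source toward target."""
    return all(o[key(u, v)] == (u < v) for u, v in zip(path, path[1:]))


def score(paths: List[Tuple[Sequence[str], float]], o: Orientation) -> float:
    """Total weight of satisfied paths (the objective being maximized)."""
    return sum(w for p, w in paths if satisfied(p, o))


def random_orientation_local_search(edges: List[Edge],
                                    paths: List[Tuple[Sequence[str], float]],
                                    restarts: int = 10, seed: int = 0):
    rng = random.Random(seed)
    best_o, best_s = None, float("-inf")
    for _ in range(restarts):
        # Random orientation: each edge gets a direction with probability 1/2.
        o = {e: rng.random() < 0.5 for e in edges}
        s = score(paths, o)
        improved = True
        while improved:  # local search: flip single edges while that helps
            improved = False
            for e in edges:
                o[e] = not o[e]
                s_new = score(paths, o)
                if s_new > s:
                    s, improved = s_new, True
                else:
                    o[e] = not o[e]  # revert a non-improving flip
        if s > best_s:
            best_o, best_s = dict(o), s
    return best_o, best_s
```

Because every accepted flip strictly increases the objective and there are finitely many orientations, the local search always terminates, but it can stop at a local optimum; the random restarts are what give the method a chance to escape such optima.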

In this paper, we further extend our preliminary results on PPI edge orientation [23]. In particular, we design a genetic algorithm (GA) for the problem. The GA is one of the popular and successful computational models in intelligent computing [24], especially for NP-hard problems. Along with other intelligent computing techniques such as fuzzy computing, neural networks and multi-agent systems, GAs have developed rapidly and are widely applied in many fields [25], [26]. Our GA design takes conflicting elements in PPI networks into account in order to eliminate unnecessary edges, thus greatly improving computing speed. We examined several aspects, including running time and objective values. The results showed that our algorithm found good solutions to the problem, a finding supported by comparison with other algorithms' results. In particular, we address the question of what the obtained pathways mean biologically by extending the biological validation.
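One plausible reading of "conflicting elements", assumed here rather than taken from the text, is that an edge is conflicted when different candidate paths need it in opposite directions. Edges that all paths traverse in the same direction can then be fixed up front, so only conflicted edges need to be encoded in a chromosome. A minimal sketch of that split, with the hypothetical helper `split_edges`:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def split_edges(paths: List[Sequence[str]]) -> Tuple[Dict[Edge, bool], List[Edge]]:
    """Split path edges into fixed and conflicted ones.

    Returns (fixed, conflicted): `fixed` maps a canonical edge (min, max) to
    the only direction any path needs (True = min -> max); `conflicted` lists
    edges that different paths traverse in opposite directions.
    """
    wanted: Dict[Edge, Set[bool]] = defaultdict(set)
    for path in paths:
        for u, v in zip(path, path[1:]):
            e = (u, v) if u < v else (v, u)
            wanted[e].add(u < v)  # record the direction this path requires
    fixed = {e: next(iter(d)) for e, d in wanted.items() if len(d) == 1}
    conflicted = sorted(e for e, d in wanted.items() if len(d) == 2)
    return fixed, conflicted
```

With this split, a chromosome needs only one bit per conflicted edge, which is one way a GA could avoid searching over unnecessary edges.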

The paper consists of five sections. Section 1 introduces the problem; Section 2 gives background on the problem and on GAs; Section 3 describes in detail the GA designed to solve the problem; Section 4 presents experimental data on yeast PPIs and assesses the results achieved by the algorithm; and Section 5 concludes the paper.

Section snippets

Problem of orienting edges in protein interaction networks

Proteins are important components of cell structure and are involved in most biological processes. During cell functioning, they interact with each other or with macromolecules such as DNA and RNA, together forming a complex network of interactions that performs biological functions. An example is given in Fig. 1, where the graph shows part of the yeast protein interaction network drawn with the Cytoscape software. From the graph, we can see that the protein interaction
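The snippet is truncated before the mathematical model appears. As an assumption, a formulation in the spirit of the ROLS work [12], as we understand it, can be written as follows: G = (V, E) is the undirected PPI graph, P is a set of candidate source-to-target paths, w(p) is a weight for path p, and an orientation o assigns each edge {u, v} in E one of its two directions. A path contributes its weight only if every edge on it is directed from the source toward the target. The symbols are illustrative and not necessarily the paper's own.

```latex
\max_{o \,:\, E \to \{0,1\}} \; f(o) = \sum_{p \in \mathcal{P}} w(p) \prod_{(u,v) \in p} \mathbf{1}\!\left[\, o \text{ orients } \{u,v\} \text{ as } u \to v \,\right]
```

Here (u, v) ranges over consecutive proteins of p read from source to target, so the product equals 1 exactly when the whole path is consistently oriented, and the search space has 2^{|E|} orientations.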

Methodology

The main idea is to design a GA tailored to the characteristics of the orientation problem so that the search process is effective. The GA starts with a randomly initialized population P of individuals, where the population size is a constant natural number n and each individual is represented by a sequence of chromosomes. The population evolves over many generations. The best individual of each generation is kept for the next population and we apply the local
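The loop described above (a random initial population of size n, elitism, local search) can be sketched as a generic binary GA. The operator choices below (binary tournament selection, one-point crossover, bit-flip mutation with rate `p_mut`), the decision to apply local search to the elite, and all parameter defaults are assumptions for illustration, not the paper's settings. In the orientation setting, `fit` would typically be the path-weight objective evaluated over the encoded edges.

```python
import random
from typing import Callable, List

Chromosome = List[bool]


def local_search(c: Chromosome, fit: Callable[[Chromosome], float]) -> Chromosome:
    """Flip single genes while doing so improves fitness (first improvement)."""
    c, best = list(c), fit(c)
    improved = True
    while improved:
        improved = False
        for i in range(len(c)):
            c[i] = not c[i]
            f = fit(c)
            if f > best:
                best, improved = f, True
            else:
                c[i] = not c[i]  # revert a non-improving flip
    return c


def genetic_algorithm(n_genes: int, fit: Callable[[Chromosome], float],
                      n: int = 20, generations: int = 50,
                      p_mut: float = 0.05, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_genes)] for _ in range(n)]
    for _ in range(generations):
        # Elitism: the best individual survives, refined by local search.
        elite = local_search(max(pop, key=fit), fit)
        nxt = [elite]
        while len(nxt) < n:
            # Binary tournament selection of two parents.
            a, b = (max(rng.sample(pop, 2), key=fit) for _ in range(2))
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [not g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fit)
```

For example, with the toy fitness `sum` (the number of True genes), local search alone drives the elite to the all-True chromosome, and elitism then keeps it in every later generation.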

Yeast's interaction network

For the experiments, we used yeast PPI data from the BioGRID database (http://thebiogrid.org), a large-scale online database of genetic interactions across organisms. As mentioned above, this database is extensively updated over time based on new experimental findings by biologists. Therefore, to ease comparison between our results and those of existing algorithms, we use the same BioGRID version (2.0.51) as the authors of [29]. This database is
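Before orientation, raw interaction records have to be collapsed into an undirected graph. The sketch below assumes a simplified two-column, tab-separated layout (interactor A, interactor B) with `#` comment lines; actual BioGRID release files have more columns and their own header conventions, so the parsing step would need adapting. The helpers `read_pairs` and `undirected_edges` are hypothetical. Duplicate pairs in either order collapse to one edge, and self-interactions are dropped.

```python
import csv
import io
from typing import Iterable, Iterator, Set, Tuple


def read_pairs(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (interactor A, interactor B) from a hypothetical two-column TSV."""
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) >= 2 and not row[0].startswith("#"):
            yield row[0], row[1]


def undirected_edges(rows: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Collapse interaction pairs into canonical (min, max) undirected edges.

    A-B and B-A become one edge; self-interactions are dropped because they
    cannot lie on a simple source-to-target path.
    """
    edges = set()
    for a, b in rows:
        a, b = a.strip(), b.strip()
        if a and b and a != b:
            edges.add((a, b) if a < b else (b, a))
    return edges
```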

Conclusion

In this paper, we proposed a GA design for the problem of orienting protein interaction networks, a challenging problem in computational biology. We presented a way of representing population individuals that fits the problem; in particular, our design takes conflicting elements into account in the solution representation, thus greatly improving computing speed. The results show that our algorithm handles this problem well. As evidence of the correctness of our algorithm, we find that our

References (37)

E. Ruppin et al.
Metabolic reconstruction, constraint-based analysis and game theory to probe genome-scale metabolic networks
Curr. Opin. Biotechnol.
(2010)
B. Lewis et al.
Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets
Cell
(2005)
S. Cox et al.
Genetically constrained metabolic flux analysis
Metab. Eng.
(2005)
L. Araujo et al.
Structure of morphologically expanded queries: a genetic algorithm approach
Data Knowl. Eng.
(2010)
H. Liu et al.
Sentence identification of biological interactions using Patricia tree generated patterns and genetic algorithm optimized parameters
Data Knowl. Eng.
(2010)
R. Bueno et al.
Genetic algorithms for approximate similarity queries
Data Knowl. Eng.
(2007)
L. Bardwell
A walk-through of the yeast mating pheromone response pathway
Peptides
(2004)
E. Segal et al.
Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data
Nat. Genet.
(2003)
A.A. Margolin et al.
ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context
BMC Bioinf.
(2006)
M. Grzegorczyk et al.
Improvements in the reconstruction of time-varying gene regulatory networks: dynamic programming and regularization by information sharing among genes
Bioinformatics
(2011)
D.A. Ravcheev et al.
Genomic reconstruction of transcriptional regulatory networks in lactic acid bacteria
BMC Genomics
(2013)
G. xia Liu et al.
Reconstruction of gene regulatory networks based on two-stage Bayesian network structure learning algorithm
J. Bionic Eng.
(2009)
J. Kitagawa et al.
Identifying metabolic pathways and gene regulation networks with evolutionary algorithms
Evol. Comput. Bioinforma.
(2003)
E. Fischer et al.
Large-scale in vivo flux analysis shows rigidity and suboptimal performance of Bacillus subtilis metabolism
Nat. Genet.
(2005)
D. McCloskey et al.
Basic and applied uses of genome-scale metabolic network reconstructions of...
J. Scott et al.
Efficient algorithms for detecting signaling pathways in protein interaction networks
J. Comput. Biol.
(2006)
G. Bebek et al.
Pathfinder: mining signal transduction pathway segments from protein–protein interaction networks
BMC Bioinf.
(2007)
A. Gitter et al.
Discovering pathways by orienting edges in protein interaction networks
Nucleic Acids Res.
(2011)

Cited by (6)

Network biology and applications
2021, Bioinformatics: Methods and Applications
A biological system is a network of mutually dependent and thus interconnected components comprising a unified whole. Network biology is a study of how molecules interact and come together to give rise to subcellular machinery that forms the functional units capable of operations that are needed for cell and tissue/organ-level physiological functions. It is an interdisciplinary field comprising of genomics, proteomics, metabolomics, etc., that uses a holistic approach. Various types of biological networks are ecological, gene regulatory network, protein–protein interaction network, and metabolic network among others. Biological networks are predicted, after combining data from various experimental methods, literature mining as well as computational methods and can help in identifying the emergent properties of the whole system through statistical, network, or dynamical models. With the advent of high-throughput omics technologies and resulting biological data explosion, network analysis of the complex biological system to generate realistic models is plausible.
An Enhanced Genetic Algorithm for Determining the Pathways in Protein-Protein Interaction Networks
2023, Research Square
Orienting Conflicted Graph Edges Using Genetic Algorithms to Discover Pathways in Protein-Protein Interaction Networks
2021, IEEE/ACM Transactions on Computational Biology and Bioinformatics
MGT-SM: A method for constructing cellular signal transduction networks
2019, IEEE/ACM Transactions on Computational Biology and Bioinformatics
In silico identification of essential proteins in Corynebacterium pseudotuberculosis based on protein-protein interaction networks
2016, BMC Systems Biology
An overview of bioinformatics methods for modeling biological pathways in yeast
2016, Briefings in Functional Genomics
