A hybrid framework for reverse engineering of robust Gene Regulatory Networks

https://doi.org/10.1016/j.artmed.2017.05.004

Highlights

  • A fast and accurate predictor set inference framework that linearly combines several inference methods is proposed.

  • Combining several methods is intended to increase the accuracy of the inferred GRN.

Abstract

Inferring Gene Regulatory Networks (GRNs) from gene expression data in order to uncover basic cellular processes is a key problem in the study of biological systems. Inferring a GRN correctly requires inferring its predictor set accurately. In this paper, a fast and accurate predictor set inference framework that linearly combines several inference methods is proposed. The purpose of combining the methods is to increase the accuracy of the inferred GRN. The proposed framework uses a linear weighted combination of the Pearson Correlation Coefficient (PCC) and two feature selection approaches, namely Information Gain (IG) and ReliefF. A Genetic Algorithm (GA) is used to set the weights, with a similarity measure as the fitness function that guides the GA. Finally, based on the obtained weights, the best predictor set of the GRN is selected using the three aforementioned inference methods and the network topology is formed. Because of the huge volume of gene expression data, GRN inference algorithms should infer a GRN within a reasonable runtime. Hence, a novel criterion is provided to evaluate GRNs based on both runtime and accuracy. Simulation results on biological data indicate that the proposed framework is fast and more reliable than other recent methods [1], [2], [3], [4], [5], [6], [7].

Introduction

Each protein has a unique amino acid sequence that is determined by the nucleotide sequence of the gene encoding it. Furthermore, proteins can act as transcription factors that regulate the expression of other genes; a living organism can therefore be viewed as a complex, interconnected network of molecules linked by biochemical reactions [8]. This regulatory mechanism forms a complex system of sending and receiving signals, which can be studied to identify the cell's control mechanisms and the relationships among various biological entities. Understanding the relationships among genes and gene regulation through signal transmission is a crucial goal in the study of biological systems [2], [9], [10].

The development of technologies for extracting gene expression data, such as DNA microarrays [11], SAGE [12] and RNA sequencing [13], has made it possible to measure the expression of tens of thousands of genes at once. As these data have become more widely available, researchers have focused on the interactions among genes and their functions. Interactions among genes form a complex and interconnected network called a Gene Regulatory Network (GRN). GRNs are essential for uncovering key principles of biological systems and can be used to explain how cells control the expression of genes. In general, a GRN is a useful way to describe cell behavior by modeling the relationships among genes and the effects of one set of genes on another. A correctly constructed GRN has several uses, the most notable of which are examining the behavior of a set of genes, identifying biological processes as well as faults in those processes (disease) and, last but not least, prescribing the most effective drug treatment (removing faults). In a GRN, a ‘predictor’ regulates a target gene; a group of predictors that regulates a target gene is called a ‘predictor subset’, and the whole set of predictor subsets in a GRN is called the ‘predictor set’ [10]. Fig. 1 shows the predictor subset and predictor set in a GRN.
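The terminology above can be made concrete with a small data-structure sketch. The gene names are hypothetical and chosen only for illustration, and the mapping from each target gene to its predictor subset is one plausible representation, not necessarily the one used by the authors.

```python
# Hypothetical toy network: the gene names g1..g3 are made up for illustration.
# Each target gene maps to its predictor subset (the genes that regulate it);
# the whole mapping plays the role of the predictor set of the GRN.
predictor_set = {
    "g1": {"g2", "g3"},  # predictor subset of target g1
    "g2": {"g3"},        # predictor subset of target g2
    "g3": set(),         # no regulators in this toy example
}


def to_edges(predictor_set):
    """Flatten a predictor set into a sorted list of (predictor, target) edges."""
    return sorted(
        (predictor, target)
        for target, subset in predictor_set.items()
        for predictor in subset
    )


edges = to_edges(predictor_set)
```

Each edge points from a predictor to the target gene it regulates, so the edge list is simply the network topology implied by the predictor set.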

The inference of GRNs from gene expression data, also known as reverse engineering, is a crucial and difficult task [14]. Although many methods have been developed to infer GRNs from gene expression data [15], the major challenge in this research area is to infer GRNs from observed gene expression data alone. In general, error-free inference is impossible because sufficient biological information is lacking. Factors that make inference hard include imprecise measurement of gene expression, which produces noisy data, the large number of genes, and the small sample size [16]; thus, efficient methods for inferring reliable GRNs are still needed. Several recent initiatives try to overcome data limitations by incorporating other biological information to discover the dependencies among genes [9], [17].

The purpose of this paper is to provide a fast and efficient framework for inferring the predictor set of a GRN, taking into account the limitations of gene expression data. Predictor set inference consists of identifying the dependence between target genes and their potential predictors. In this work, a novel framework is proposed that infers the predictor set of a GRN by linearly combining several inference methods. Specifically, the proposed framework uses a linear weighted combination of Pearson Correlation Coefficient (PCC), Information Gain (IG) and ReliefF scores. A Genetic Algorithm (GA) is used to fine-tune the weights, with a similarity criterion as the fitness function that guides the GA. The proposed framework is evaluated on biological data using the weights inferred by the GA. In addition, a novel evaluation criterion based on the runtime and accuracy of GRN inference methods is proposed. Experimental results on biological data show that, despite the large number of genes and the small number of samples, the proposed framework can infer the predictor set of the GRN in a robust manner.
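As a rough illustration of the linear weighted combination described above, the following Python sketch combines three gene-by-gene score matrices and picks the top-scoring predictors for each target. It assumes that the IG and ReliefF scores have already been computed elsewhere (random placeholders stand in for them here). The min-max normalization, the function names, the fixed weights and the top-k selection rule are assumptions made for the example, not details taken from the paper.

```python
import numpy as np


def min_max(scores):
    """Rescale a score matrix to [0, 1] so the three methods are comparable."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)


def pcc_scores(expr):
    """Absolute Pearson correlation between genes; expr is samples x genes."""
    scores = np.abs(np.corrcoef(expr, rowvar=False))
    np.fill_diagonal(scores, 0.0)  # a gene is not treated as its own predictor
    return scores


def combine_scores(pcc, ig, relieff, weights):
    """Linear weighted combination of three (genes x genes) score matrices."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # make the weights sum to one
    return w[0] * min_max(pcc) + w[1] * min_max(ig) + w[2] * min_max(relieff)


def top_k_predictors(combined, k):
    """For each target (column), return the k highest-scoring genes (rows)."""
    ranked = np.array(combined, dtype=float)
    np.fill_diagonal(ranked, -np.inf)    # exclude self-regulation from ranking
    order = np.argsort(-ranked, axis=0)  # descending order within each column
    return {j: order[:k, j].tolist() for j in range(ranked.shape[1])}


# Toy data: 20 samples of 5 genes. The IG and ReliefF matrices are random
# placeholders, NOT real scores; a real pipeline would compute them from data.
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 5))
pcc = pcc_scores(expr)
ig_placeholder = rng.random((5, 5))
relieff_placeholder = rng.random((5, 5))

weights = (0.5, 0.3, 0.2)  # hypothetical weights; the paper tunes them with a GA
combined = combine_scores(pcc, ig_placeholder, relieff_placeholder, weights)
predictors = top_k_predictors(combined, k=2)
```

Here `combined[i, j]` is read as the evidence that gene `i` regulates target gene `j`, so the resulting dictionary maps each target index to a candidate predictor subset.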

The main contributions of this paper are as follows:

  • Using an ensemble of filter feature selection methods (IG and ReliefF) combined with PCC, instead of a single feature selection method, yields higher accuracy in inferring the predictor set.

  • To infer the weight of each of the aforementioned methods, the GA is executed on a small subset of the data. This reduces runtime and improves accuracy by avoiding overfitting when analyzing big datasets.

  • A novel measure is proposed that considers both time and accuracy when an algorithm is evaluated, because time is a significant issue, especially with big data.

In short, these points yield better accuracy and lower runtime than other similar methods.
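The GA-based weight tuning named in the contributions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the similarity measure (fraction of matching adjacency entries), the 0.5 threshold, the truncation selection, the arithmetic crossover, the Gaussian mutation and all function names are hypothetical choices, since this excerpt does not specify them. The reference adjacency matrix stands for known interactions on the small data subset used for tuning.

```python
import numpy as np


def similarity(pred_adj, ref_adj):
    """Assumed fitness: fraction of entries where two 0/1 matrices agree."""
    return float(np.mean(pred_adj == ref_adj))


def infer_adjacency(score_mats, weights, threshold=0.5):
    """Threshold the weighted sum of score matrices into a 0/1 adjacency."""
    combined = sum(w * s for w, s in zip(weights, score_mats))
    return (combined >= threshold).astype(int)


def normalize(pop):
    """Keep every weight vector positive and summing to one."""
    pop = np.clip(pop, 1e-9, None)
    return pop / pop.sum(axis=1, keepdims=True)


def evaluate(pop, score_mats, ref_adj):
    """Fitness of every weight vector in the population."""
    return np.array(
        [similarity(infer_adjacency(score_mats, w), ref_adj) for w in pop]
    )


def ga_weights(score_mats, ref_adj, pop_size=30, generations=40,
               mutation_sd=0.1, seed=0):
    """Search for combination weights with a small elitist GA."""
    rng = np.random.default_rng(seed)
    pop = normalize(rng.random((pop_size, len(score_mats))))
    for _ in range(generations):
        fit = evaluate(pop, score_mats, ref_adj)
        elite = pop[np.argsort(-fit)[: pop_size // 2]]  # truncation selection
        n_children = pop_size - len(elite)
        pa = elite[rng.integers(len(elite), size=n_children)]
        pb = elite[rng.integers(len(elite), size=n_children)]
        alpha = rng.random((n_children, 1))
        children = alpha * pa + (1 - alpha) * pb        # arithmetic crossover
        children = children + rng.normal(0.0, mutation_sd, children.shape)
        pop = np.vstack([elite, normalize(children)])   # elites survive as-is
    fit = evaluate(pop, score_mats, ref_adj)
    return pop[np.argmax(fit)], float(fit.max())


# Synthetic tuning subset: one informative score matrix and two noise matrices.
rng = np.random.default_rng(1)
ref = (rng.random((6, 6)) < 0.3).astype(int)
informative = np.clip(ref + rng.normal(0.0, 0.2, ref.shape), 0.0, 1.0)
noise_a, noise_b = rng.random(ref.shape), rng.random(ref.shape)
mats = [informative, noise_a, noise_b]
best_w, best_fit = ga_weights(mats, ref)
```

Because the GA only needs to score candidate weight vectors, running it on a small subset of genes keeps each fitness evaluation cheap; the tuned weights can then be reused when combining scores on the full dataset.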

The rest of this paper is organized as follows. Section 2 reviews related work on GRN inference. The proposed method is introduced in Section 3. Section 4 presents a preliminary comparison between the proposed framework and a recent similar method. Finally, Section 5 draws conclusions and outlines future work.

Section snippets

Gene Regulatory Network inference

Genes interact indirectly with each other and with other substances in the cell. There is a dense set of associations among biological entities that need to be modeled and represented to advance knowledge in molecular biology [18]. As previously mentioned, the advent of high-throughput technologies has made it possible to measure gene expression very quickly [19]. Over the last decade, many researchers have turned to such methodologies with the aim

A hybrid framework to infer predictor set of GRN

In this section, the proposed framework is introduced. PCC and two feature selection methods based on IG and ReliefF are used to infer the predictor set of a GRN. Each of these methods can infer the topology of a GRN with minimal runtime (in fact, these methods are able to identify the GRN structure online); however, a GRN inferred using only one of these methods has low reliability. In order to increase the reliability of the inferred GRNs, the proposed framework combines

Experimental results and discussion

In this section, experimental results of the proposed framework are compared with those of some recent methods [2]. The proposed framework is implemented in MATLAB. The whole analysis was performed on a 2.50 GHz Intel Core i5 processor with 4 GB of RAM.

Conclusion

To reduce the side effects of medications for genetic diseases, treatment needs to be based on the patient's genetic information. The relationships among genes form a network; thus, the Gene Regulatory Network (GRN) needs to be inferred. Inferring a GRN correctly requires accurate inference of the predictor set. In this paper, a fast and accurate predictor set inference framework that linearly combines three methods is proposed. The proposed framework offers a

References (40)

  • M. Jafari et al.

    The inference of predictor set in gene regulatory networks using gravitational search algorithm

    2016 1st conference on swarm intelligence and evolutionary computation (CSIEC)

    (2016)
  • A.A. Margolin et al.

    ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context

    BMC Bioinform

    (2006)
  • J. Yu et al.

    Advances to Bayesian network inference for generating causal networks from observational biological data

    Bioinformatics

    (2004)
  • R.D. Jimenez et al.

    One genetic algorithm per gene to infer gene networks from expression data

    Netw Model Anal Health Inform Bioinform

    (2015)
  • D. Voet et al.

    Fundamentals of biochemistry: life at the molecular level

    (2006)
  • P.-C.K. Lin et al.

    Logic synthesis for genetic diseases

    (2014)
  • J.-F. Schmouth et al.

    Combined serial analysis of gene expression and transcription factor binding site prediction identifies novel-candidate-target genes of Nr2e1 in neocortex development

    BMC Genom

    (2015)
  • F. Ozsolak et al.

    RNA sequencing: advances, challenges and opportunities

    Nat Rev Genet

    (2011)
  • D. Marbach et al.

    Wisdom of crowds for robust gene network inference

    Nat Methods

    (2012)
  • C. Lian En et al.

    A review on the computational approaches for gene regulatory network construction

    Comput Biol Med

    (2014)