Scalable approach for effective control of gene regulatory networks

https://doi.org/10.1016/j.artmed.2009.10.002

Abstract

Objective: Interactions between genes can be modeled as gene regulatory networks (GRNs). Controlling such networks is essential for investigating problems such as various diseases. Control is the process of studying the states and behavior of a given system under different conditions. The system considered in this study is a GRN, and one of the most important aspects in the control of GRNs is scalability. Consequently, the objective of this study is to develop a scalable technique that facilitates the control of GRNs.

Method: Since the approach described in this paper concentrates on the control of GRNs, we argue that scalability can be improved by reducing the number of genes considered by the control policy. We therefore propose a novel method that uses gene relevancy to identify genes that are less important for control. Once the genes that can be ignored in model building have been identified, a reduced model can be obtained. These genes are located based on a threshold value, which is expected to be provided by a domain expert. Some guidelines are listed to help the domain expert set an appropriate threshold value.

Results: We ran experiments using both synthetic and real data, including metastatic melanoma and budding yeast (Saccharomyces cerevisiae) data. The reported test results identified genes that could be eliminated from each of the investigated GRNs. For instance, the tests on budding yeast identified the two genes SWI5 and MCM1 as candidates for elimination. This considerably reduces the computation cost and hence demonstrates the applicability and effectiveness of the proposed approach.

Conclusion: Employing the proposed reduction strategy yields close-to-optimal solutions to GRN control problems that would otherwise be intractable due to the huge state space implied by the large number of genes.

Introduction

Protein synthesis is a key process for living organisms. Proteins are translated from messenger RNA (mRNA), which is transcribed from a gene in DNA; proteins are thus produced in a process that involves two stages, namely transcription and translation. In transcription, the sequence of a gene is used to produce mRNA, which is then used to create a protein during translation. For a gene to be transcribed into mRNA, it is often necessary for a specific protein, called a transcription factor, to bind to the DNA at a specific location. A transcription factor can have a positive (activating) or negative (repressing) regulatory effect on the transcription of the target gene. So, the transcription level (or expression level) of the gene can change depending on the binding of the transcription factor. Since the transcription factor is itself a protein encoded by a gene, it is possible to describe a set of interactions among genes; these interactions constitute a GRN.

As described in the literature, there are various methods to represent and model a GRN [1]. These include (dynamic) Bayesian networks, (probabilistic) Boolean networks (BNs), neural networks, Petri net models and differential equation-based models. Modeling may make it possible to estimate the future state of a cell based on its current state and the conditions affecting it. To justify the need for control, consider a cell that is estimated to reach an undesirable state (e.g., a cancerous state) in the near future; it then becomes necessary to intervene in the current state of the network in order to avoid reaching the undesirable state(s). However, the intervention must be as efficient and effective as possible because of the urgency of the situation and the cost of intervening. This motivates the need to control GRNs. The problem may be stated as follows: find an efficient policy for interacting with the network through interventions so as to change its behavior in a way that satisfies some prespecified objective(s). The size of the state space is the most crucial issue in GRN control, as it is in all control problems. It is also worth mentioning that the term control in the context of GRNs differs slightly from its meaning in control theory, because the possible means of intervening in GRNs are limited.

For a discrete GRN (where the expression levels of genes are discretized), the size of the state space is determined by the number of genes N and the number of discretization levels L per gene: there are L^N possible states, so the state space grows exponentially with the number of genes. Even if the expression levels are discretized to binary values, an N-gene network has 2^N states; this makes the problem hard to cope with even for relatively small values of N. So, to find an efficient policy for the GRN control problem, appropriate methods must be introduced to reduce the state space to a reasonable size, whenever possible and desired.
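To make this growth concrete, the short Python sketch below counts the joint states of a discretized GRN as L^N and the number of entries in a dense state-to-state transition matrix, (L^N)^2. The function and parameter names (state_space_size, dense_transition_entries, num_genes, levels) are our own illustrative choices, not part of the paper.

```python
def state_space_size(num_genes: int, levels: int = 2) -> int:
    """Number of joint states of a GRN with `levels` discrete values per gene (L^N)."""
    if num_genes < 0 or levels < 1:
        raise ValueError("num_genes must be >= 0 and levels must be >= 1")
    return levels ** num_genes


def dense_transition_entries(num_genes: int, levels: int = 2) -> int:
    """Entries in a dense state-to-state transition matrix for the same GRN."""
    return state_space_size(num_genes, levels) ** 2


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(f"N={n}: {state_space_size(n):,} states, "
              f"{dense_transition_entries(n):,} transition entries")
```

Already at N = 20 binary genes, a dense transition matrix has about 10^12 entries, which is why reducing N matters more than micro-optimizing the solver.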

The relevancy of a given gene in terms of control depends on the objective to be satisfied, because genes in a GRN have varying effects on each other; consequently, a gene might have a minimal or negligible effect on the solution of a GRN control problem. In this paper, we use a relevancy measure to propose a feature reduction method capable of identifying genes that are less relevant for control. Such genes are candidates for elimination when building a model, so that an approximate control policy can be reached faster. The feature reduction process may identify more than one gene as a candidate for elimination. But even when only one gene is eliminated, the state space shrinks significantly; with binary discretization it is halved. This directly improves the scalability of the GRN control problem under investigation.
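The following sketch illustrates threshold-based elimination in general terms. The relevance scores and gene names are hypothetical placeholders, not values computed by the method of this paper, and select_genes and reduction_factor are names we introduce for illustration. It assumes binary discretization, so each eliminated gene halves the state space.

```python
from typing import Dict, List, Tuple


def select_genes(scores: Dict[str, float], threshold: float) -> Tuple[List[str], List[str]]:
    """Split genes into (kept, eliminated) by comparing relevance scores to a threshold."""
    kept = [gene for gene, score in scores.items() if score >= threshold]
    eliminated = [gene for gene, score in scores.items() if score < threshold]
    return kept, eliminated


def reduction_factor(num_eliminated: int, levels: int = 2) -> int:
    """How many times smaller the state space becomes after eliminating genes."""
    return levels ** num_eliminated


if __name__ == "__main__":
    # Hypothetical relevance scores, for illustration only.
    scores = {"g1": 0.90, "g2": 0.05, "g3": 0.60, "g4": 0.02, "g5": 0.40}
    kept, eliminated = select_genes(scores, threshold=0.1)
    print("kept:", kept, "eliminated:", eliminated)
    print("states before:", 2 ** len(scores), "after:", 2 ** len(kept))
    print("reduction factor:", reduction_factor(len(eliminated)))
```

In practice the threshold would come from a domain expert, as described in the abstract; the sketch only shows how the choice of threshold translates into a multiplicative reduction of the state space.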

Generally, the control of GRNs has been studied on Markovian models, e.g., [2], [3], [4], [5], [6], [7], [8], [9], [10]. For instance, Shmulevich et al. [10] considered control in a Markovian model by exploiting Markov chain theory [11], showing how to select the gene to intervene on in order to minimize the time required to reach some set of desirable states from the current state. Structural intervention has also been considered for reaching desired states [9]. Datta et al. [5], on the other hand, formulated interventions as altering transition probabilities through external control variables. They used dynamic programming to formulate and solve a finite-horizon controlled Markov chain, where the horizon is the duration over which external actions are applied and the controlled Markov chain is defined similarly to a Markov decision process [12]. An extension of this work to optimal infinite-horizon control is described in [8].
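A minimal sketch of finite-horizon dynamic programming (backward induction) for a controlled Markov chain, in the spirit of the formulation attributed to Datta et al. [5] above. The array layout (P indexed as action, current state, next state), the cost convention, and the two-state toy numbers are our assumptions for illustration; a real GRN model would have 2^N states.

```python
import numpy as np


def finite_horizon_control(P: np.ndarray, cost: np.ndarray,
                           terminal_cost: np.ndarray, horizon: int):
    """Backward induction for a finite-horizon controlled Markov chain.

    P[a, s, s2]       probability of moving from state s to s2 under action a
    cost[s, a]        one-step cost of applying action a in state s
    terminal_cost[s]  cost charged for ending the horizon in state s
    Returns J (cost-to-go, shape (horizon + 1, S)) and policy (shape (horizon, S)).
    """
    _, num_states, _ = P.shape
    J = np.zeros((horizon + 1, num_states))
    policy = np.zeros((horizon, num_states), dtype=int)
    J[horizon] = terminal_cost
    for k in range(horizon - 1, -1, -1):
        # Q[s, a] = cost[s, a] + sum_s2 P[a, s, s2] * J[k + 1, s2]
        Q = cost + np.einsum("asn,n->sa", P, J[k + 1])
        policy[k] = np.argmin(Q, axis=1)
        J[k] = Q.min(axis=1)
    return J, policy


if __name__ == "__main__":
    # Toy chain: state 0 is desirable, state 1 is undesirable.
    # Action 0 = no intervention, action 1 = intervene (extra cost 1).
    P = np.array([
        [[0.8, 0.2], [0.1, 0.9]],   # no intervention
        [[0.9, 0.1], [0.7, 0.3]],   # intervention
    ])
    state_cost = np.array([0.0, 5.0])
    cost = state_cost[:, None] + np.array([0.0, 1.0])[None, :]
    J, policy = finite_horizon_control(P, cost, state_cost, horizon=3)
    print("policy per step:\n", policy)
    print("expected cost-to-go at t=0:", J[0])
```

The policy is time-dependent (one row per step of the horizon), which is the characteristic feature of the finite-horizon setting.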

Almost all the above-mentioned studies use probabilistic Boolean networks (PBNs) [13] as the Markovian model. A slightly different model is investigated in the context of control in [7], where switching between the BNs that form a PBN does not occur at every step but only with some probability. Attractors in a PBN are the states (or cycles of states) that are reached after a finite number of steps; this stable situation does not change in the absence of perturbations. Control without knowledge of the switching probabilities between BNs that share common attractors has been studied by Choudhary et al. [4].
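To show how a PBN yields a Markov chain, the sketch below builds the state transition matrix of a small independent, instantaneously random PBN: at every step each gene independently picks one of its predictor functions according to its selection probability. This is a simplifying assumption (it ignores random gene perturbations and the context-sensitive switching of [7]), and the predictor functions in the example are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

import numpy as np

State = Tuple[int, ...]
Predictor = Tuple[Callable[[State], int], float]  # (Boolean function, selection probability)


def pbn_transition_matrix(predictors: Sequence[List[Predictor]], n: int) -> np.ndarray:
    """Transition matrix of an independent PBN without perturbations.

    predictors[i] lists the candidate Boolean functions for gene i together with
    their selection probabilities (which should sum to 1 for each gene).
    States are ordered as itertools.product((0, 1), repeat=n).
    """
    states = list(product((0, 1), repeat=n))
    P = np.zeros((len(states), len(states)))
    for row, x in enumerate(states):
        for col, y in enumerate(states):
            prob = 1.0
            for i in range(n):
                # Probability that gene i takes value y[i] at the next step.
                prob *= sum(c for f, c in predictors[i] if f(x) == y[i])
            P[row, col] = prob
    return P


if __name__ == "__main__":
    # Hypothetical 2-gene PBN: gene 0 copies gene 1 (prob. 0.7) or negates it (prob. 0.3);
    # gene 1 always copies gene 0.
    predictors = [
        [(lambda x: x[1], 0.7), (lambda x: 1 - x[1], 0.3)],
        [(lambda x: x[0], 1.0)],
    ]
    P = pbn_transition_matrix(predictors, n=2)
    print(P)
    print("row sums:", P.sum(axis=1))
```

Each row of the resulting matrix is a probability distribution over next states, so standard Markov chain tools apply directly; the exhaustive loops also make the 2^N (indeed 4^N) cost of building the matrix explicit.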

A previous study in our group [2], [3] is based on the following argument: if the control process is regarded as a treatment, then observing the patient after the treatment can also be taken into account while solving the control problem. Accordingly, the solution described in [2] considers a monitoring horizon after the control horizon. The solution is given for various settings, depending on whether the control and monitoring horizons are finite or infinite. The problem was also formulated as a multi-objective problem in which the objectives are the state cost and the state-action cost, both defined by domain experts [3].
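As a simplified illustration of the monitoring idea (not the formulation used in [2]), the sketch below computes the expected state cost accumulated over a finite monitoring horizon during which no interventions are applied, starting from the state distribution reached at the end of the control horizon. The function name, parameter names and two-state numbers are assumptions.

```python
import numpy as np


def expected_monitoring_cost(dist, P_free, state_cost, monitor_steps: int) -> float:
    """Expected state cost accumulated over `monitor_steps` uncontrolled transitions.

    dist        state distribution at the end of the control horizon
    P_free      transition matrix of the network with no intervention applied
    state_cost  cost of being in each state
    """
    dist = np.asarray(dist, dtype=float)
    P_free = np.asarray(P_free, dtype=float)
    state_cost = np.asarray(state_cost, dtype=float)
    total = 0.0
    for _ in range(monitor_steps):
        dist = dist @ P_free                 # propagate one step without control
        total += float(dist @ state_cost)    # add the expected cost at this step
    return total


if __name__ == "__main__":
    P_free = [[0.8, 0.2], [0.1, 0.9]]        # state 1 is the undesirable state
    print(expected_monitoring_cost([1.0, 0.0], P_free, [0.0, 5.0], monitor_steps=2))
```

A control policy that ends in a "good" state but leaves the network likely to drift back would be penalized by this term, which is the intuition behind adding a monitoring horizon.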

To sum up, the above-mentioned works focus on solving the control problem in GRNs for different settings using dynamic programming. In these works, a gene is assumed to be relevant if it is chosen for modeling, i.e., if it is included in the same GRN as the others. However, we observed that relevancy also depends on the objective(s); consequently, we argue that the components of the GRN we should focus on may change according to the given set of objectives. Based on this argument, we propose a feature reduction method that successfully maintains scalability in the control of GRNs [14], [15]. Feature reduction makes it possible to reduce the number of genes considered in the control process and hence to maintain scalability. Neglecting scalability turns control into an unmanageable process, although control is essential for studying and understanding the behavior of any given system. To the best of our knowledge, this is the first attempt to apply feature reduction in the context of GRNs, which we regard as a major contribution; our initial results encouraged us to extend the work as described in this paper. The results reported in this paper demonstrate the applicability and effectiveness of the proposed approach. Although GRN control studies are not yet directly applicable to clinical practice, the promising results demonstrate their potential for use in real applications. We report test results using both synthetic and real gene expression data.

The rest of this paper is organized as follows. Section 2 includes the necessary background information. Section 3 covers the details of the proposed reduction-based approach. Section 4 reports experimental results on synthetic and real gene expression data. Section 5 presents conclusions and future research directions.

Section snippets

Background

In this section, we cover the background necessary for the scope of the work described in this paper. In particular, we present an overview of Markov decision processes (MDPs) and discuss the control problem in the context of GRNs.
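As background for the MDP formulation, here is a minimal sketch of value iteration for a discounted, infinite-horizon, cost-minimizing MDP. The discount factor, tolerance, array layout (P indexed as action, current state, next state) and the two-state example are our assumptions and are not taken from the paper.

```python
import numpy as np


def value_iteration(P: np.ndarray, cost: np.ndarray, gamma: float = 0.9,
                    tol: float = 1e-10, max_iter: int = 10_000):
    """Minimize expected discounted cost by value iteration.

    P[a, s, s2]  transition probabilities under action a
    cost[s, a]   one-step cost of applying action a in state s
    Returns (V, policy): V[s] approximates the optimal expected discounted cost,
    policy[s] is a greedy (stationary) action with respect to V.
    """
    num_states = P.shape[1]
    V = np.zeros(num_states)
    for _ in range(max_iter):
        # Bellman backup: Q[s, a] = cost[s, a] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = cost + gamma * np.einsum("asn,n->sa", P, V)
        V_new = Q.min(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = cost + gamma * np.einsum("asn,n->sa", P, V)
    return V, Q.argmin(axis=1)


if __name__ == "__main__":
    # State 0 desirable, state 1 undesirable; action 1 = intervene (extra cost 1).
    P = np.array([
        [[0.8, 0.2], [0.1, 0.9]],   # no intervention
        [[0.9, 0.1], [0.7, 0.3]],   # intervention
    ])
    cost = np.array([[0.0, 1.0], [5.0, 6.0]])
    V, policy = value_iteration(P, cost)
    print("optimal values:", V, "policy:", policy)
```

Unlike the finite-horizon case, the resulting policy is stationary: one action per state, applied at every step. Each backup still touches every state, so the cost scales with the 2^N state space.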

Scalable control by feature reduction

Feature reduction is the process of identifying, and excluding from further consideration, features that are expected to have a negligible or minimal effect on the output quality. In general, feature reduction or feature selection is performed to improve the performance of predictors [17]. In gene expression data, the features are the genes, the samples, or both. In this paper, we consider feature reduction as decreasing the number of genes. What we consider as output is the

Experimental results

We have conducted experiments to demonstrate the applicability and effectiveness of the proposed reduction approach. We used PBNs [13], [10] as the modeling technique. The basic idea of PBNs, as distinct from BNs, is to use more than one Boolean function for each target gene. Thus, a PBN is a more general, probabilistic extension of a Boolean network. We used the PBN Toolbox software [13] to derive a PBN from given data. The algorithm for deriving a PBN from the data depends on a concept

Conclusions and future research directions

In this paper, we proposed a feature reduction based method to handle the problem of finding an approximate solution to the control of GRNs. For each gene, a score is computed to estimate whether the gene is relevant in solving the resulting MDP. The score is based on MDP minimization theory and on estimating the degree to which the genes determine each other's next state. The results are promising in the sense that given a threshold value, the score can be used to remove some genes with

Acknowledgments

The research of Mehmet Tan is partially supported by The Scientific and Technological Research Council of Turkey. The research of Reda Alhajj is partially supported by NSERC, Canada.

References (27)

  • R. Givan et al.

    Equivalence notions and model minimization in Markov decision processes

    Artificial Intelligence

    (2003)
  • H. de Jong

    Modeling and simulation of genetic regulatory systems: A literature review

    Journal of Computational Biology

    (2002)
  • Abul O, Alhajj R, Polat F. Markov decision processes based optimal control policies for probabilistic Boolean networks....
  • Abul O, Alhajj R, Polat F. An optimal multi-objective control method for discrete genetic regulatory networks. In:...
  • A. Choudhary et al.

    Intervention in a family of Boolean networks

    Bioinformatics

    (2006)
  • A. Datta et al.

    External control in Markovian genetic regulatory networks

    Machine Learning

    (2003)
  • A. Datta et al.

    External control in Markovian genetic regulatory networks: The imperfect information case

    Bioinformatics

    (2004)
  • R. Pal et al.

    Intervention in context-sensitive probabilistic Boolean networks

    Bioinformatics

    (2005)
  • R. Pal et al.

    Optimal infinite-horizon control for probabilistic Boolean networks

    IEEE Transactions on Signal Processing

    (2006)
  • I. Shmulevich et al.

    Control of stationary behaviour in probabilistic Boolean networks by means of structural intervention

    Journal of Biological Systems

    (2002)
  • I. Shmulevich et al.

    Gene perturbation and intervention in probabilistic Boolean networks

    Bioinformatics

    (2002)
  • V.G. Kulkarni

    Modeling and analysis of stochastic systems

    (1996)
  • R.S. Sutton et al.

    Reinforcement learning

    (1998)