Scalable approach for effective control of gene regulatory networks

doi:10.1016/j.artmed.2009.10.002

Artificial Intelligence in Medicine

Volume 48, Issue 1, January 2010, Pages 51-59

https://doi.org/10.1016/j.artmed.2009.10.002

Abstract

Objective: Interactions between genes are realized as gene regulatory networks (GRNs). Controlling such networks is essential for investigating problems such as disease. Control is the process of studying the states and behavior of a given system under different conditions. The system considered in this study is a GRN, and one of the most important aspects of GRN control is scalability. Consequently, the objective of this study is to develop a scalable technique that facilitates the control of GRNs.

Method: Because the approach described in this paper concentrates on the control of GRNs, we argue that scalability can be improved by reducing the number of genes that the control policy must consider. We therefore propose a novel method that uses gene relevancy to estimate which genes are less important for control. Once the genes that can be ignored in model building are identified, a reduced model can be obtained. These genes are identified using a threshold value that is expected to be provided by a domain expert. Some guidelines are listed to help the domain expert set an appropriate threshold value.

Results: We ran experiments using both synthetic and real data, including metastatic melanoma and budding yeast (Saccharomyces cerevisiae). The reported test results identified genes that could be eliminated from each of the investigated GRNs. For instance, test results on budding yeast identified the two genes SWI5 and MCM1 as candidates for elimination. This considerably reduces the computational cost and hence demonstrates the applicability and effectiveness of the proposed approach.

Conclusion: Employing the proposed reduction strategy results in close to optimal solutions to the control of GRNs, which are otherwise intractable due to the huge state space implied by the large number of genes.

Introduction

Protein synthesis is a key process for living organisms. All proteins are encoded by messenger RNA (mRNA), which is transcribed from a gene in DNA; proteins are produced in a process that involves two stages, namely transcription and translation. In transcription, the sequence of the gene is used to produce mRNA, which is then used to create a protein during translation. For a gene to be transcribed into mRNA, it is often necessary for a specific protein called a transcription factor to bind to the DNA at a specific location. A transcription factor can have a positive or negative regulatory effect at the binding site, so the transcription level (or expression level) of the gene can change based on the binding of the transcription factor. Since the transcription factor is also a protein, which is itself encoded by a gene, it is possible to describe and discuss a set of interactions among genes; these interactions constitute a GRN.

As described in the literature, there are various methods to represent and model a GRN [1]. These include (dynamic) Bayesian networks, (probabilistic) Boolean networks (BNs), neural networks, Petri net models and differential equation-based models. Modeling may provide an opportunity to estimate the future state of a cell based on its current state and the conditions affecting it. To justify the need for control, consider a cell that is estimated to reach an undesirable state (e.g., a cancerous state) in the near future; this makes it necessary to intervene in the current state of the network in order to avoid reaching the undesirable state(s). It is important, however, to intervene as efficiently and effectively as possible because of the urgency of the situation and the cost of the intervention. This motivates the need to control GRNs. The problem may be stated as follows: find an efficient policy for interacting with the network (through interventions) in order to change its behavior in a way that satisfies some prespecified objective(s). The size of the state space is the most crucial issue in GRN control; this consideration is common to all control problems. It is also worth mentioning that the term control in the context of GRNs differs slightly from its use in control theory because the possible means of intervention in GRNs are limited.

For a discrete GRN (where the expression levels of genes are discretized), the size of the state space is determined by the number of genes and the number of discretization levels for each gene, and it grows exponentially with the number of genes. Even if the expression levels of the genes are discretized to binary levels, the size of the state space is $2^{N}$ for an N-gene network; this makes the problem hard to cope with even for small values of N. So, to find an efficient policy for the GRN control problem, appropriate methods must be introduced to reduce the state space to a reasonable size, whenever possible and desired.
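To make this growth concrete, the following minimal sketch counts and, for small networks, enumerates the joint states of a discretized GRN. The function names `state_space_size` and `enumerate_states` are illustrative and do not come from the paper; the only assumption is that every gene uses the same number of discretization levels.

```python
from itertools import product
from typing import List, Tuple


def state_space_size(num_genes: int, levels: int = 2) -> int:
    """Number of joint states when each gene takes one of `levels` values."""
    if num_genes < 0 or levels < 1:
        raise ValueError("num_genes must be >= 0 and levels must be >= 1")
    return levels ** num_genes


def enumerate_states(num_genes: int, levels: int = 2) -> List[Tuple[int, ...]]:
    """Explicitly list every joint state; only feasible for small networks."""
    return list(product(range(levels), repeat=num_genes))


# The explicit enumeration agrees with the closed-form count.
print(len(enumerate_states(3)), state_space_size(3))

# Binary networks of increasing size: each added gene doubles the count.
for n in (5, 10, 20, 30):
    print(n, state_space_size(n))
```

Because each additional binary gene doubles the count, removing even one gene from the model halves the number of states a control algorithm has to consider.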

The relevancy of a given gene in terms of control depends on the objective to be satisfied, because genes in a GRN have varying effects on each other; this means that a gene might have a minimal or negligible effect on the solution of a GRN control problem. In this paper, we use a relevancy measure to propose a feature reduction method capable of identifying genes that are less relevant for control. Such genes are candidates for elimination when building a model, so that an approximate control policy can be reached faster. The feature reduction process may identify more than one gene as a candidate for elimination. However, even when only one gene is eliminated, the state space shrinks significantly; in the binary case it is halved. This improves the scalability of the GRN control problem under investigation.
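The selection step described above can be pictured with a short sketch. It assumes that a relevancy score has already been computed for every gene and that genes scoring below an expert-supplied threshold become elimination candidates. The scores, the gene names `g1` to `g5`, the threshold and the direction of the comparison are all hypothetical placeholders, since this excerpt does not define the actual score.

```python
from typing import Dict, List, Tuple


def split_by_relevancy(
    scores: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split genes into (kept, candidates_for_elimination).

    Assumption: a lower score means the gene is less relevant for control.
    """
    kept = sorted(g for g, s in scores.items() if s >= threshold)
    removed = sorted(g for g, s in scores.items() if s < threshold)
    return kept, removed


# Hypothetical relevancy scores, invented purely for illustration.
scores = {"g1": 0.82, "g2": 0.05, "g3": 0.47, "g4": 0.02, "g5": 0.66}

kept, removed = split_by_relevancy(scores, threshold=0.10)
print("kept:", kept)
print("candidates for elimination:", removed)

# With binary expression levels, the reduced model has 2**len(kept) states.
print("states before:", 2 ** len(scores), "after:", 2 ** len(kept))
```

The threshold is the only tunable input here, which mirrors the role the abstract gives to the domain expert: a higher threshold removes more genes and yields a smaller but more approximate model.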

Generally, the control of GRNs has been studied on Markovian models, e.g., [2], [3], [4], [5], [6], [7], [8], [9], [10]. For instance, Shmulevich et al. [10] considered control in a Markovian model by exploiting Markov chain theory [11]. It has been shown how to select the gene on which to intervene in order to minimize the time required to reach some set of desirable states, given the current state. Structural intervention is also considered for reaching desired states [9]. On the other hand, Datta et al. [5] formulated the interventions in terms of altering transition probabilities by using some external control variables. They used dynamic programming to formulate and solve a finite-horizon controlled Markov chain, where a horizon is the duration of applying external actions and the Markov chain is defined similarly to a Markov decision process [12]. An optimal infinite-horizon control extension of this work is described in [8].
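As a rough illustration of finite-horizon control on a controlled Markov chain, the sketch below runs generic backward induction on a toy two-state, two-action model. It is not the formulation of Datta et al.; the transition probabilities, costs and the function name `finite_horizon_dp` are assumptions made only for this example.

```python
from typing import List, Tuple


def finite_horizon_dp(
    P: List[List[List[float]]],
    cost: List[List[float]],
    terminal_cost: List[float],
    horizon: int,
) -> Tuple[List[float], List[List[int]]]:
    """Backward induction for a finite-horizon controlled Markov chain.

    P[a][s][t]  : probability of moving from state s to t under action a.
    cost[a][s]  : immediate cost of taking action a in state s.
    Returns (J, policy): J[s] is the minimal expected total cost from s at
    step 0, and policy[k][s] is the minimizing action at step k in state s.
    """
    n_actions = len(P)
    n_states = len(terminal_cost)
    J = list(terminal_cost)
    policy: List[List[int]] = []
    for _ in range(horizon):
        new_J, step_policy = [], []
        for s in range(n_states):
            # Expected cost-to-go of each action from state s.
            q = [
                cost[a][s] + sum(P[a][s][t] * J[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            best = min(range(n_actions), key=q.__getitem__)
            step_policy.append(best)
            new_J.append(q[best])
        J = new_J
        policy.append(step_policy)
    policy.reverse()  # computed from the last step backwards
    return J, policy


# Toy model (invented numbers): state 0 is desirable, state 1 undesirable.
# Action 0 = no intervention, action 1 = intervene (costs 1 unit).
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # transitions without intervention
    [[0.9, 0.1], [0.8, 0.2]],  # transitions with intervention
]
cost = [[0.0, 0.0], [1.0, 1.0]]
terminal_cost = [0.0, 5.0]

J, policy = finite_horizon_dp(P, cost, terminal_cost, horizon=2)
print("expected cost per start state:", J)
print("policy per step:", policy)
```

The nested loops touch every state at every step, so the running time scales with the size of the state space; this is the cost that grows as $2^{N}$ for an N-gene binary network.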

Almost all the above-mentioned studies use probabilistic Boolean networks (PBNs) [13] as the Markovian model. A slightly different model in the context of control is investigated in [7], where the switching between the BNs forming a PBN is not performed in every step, but probabilistically. Attractors in a PBN are the states that are reached after a finite number of steps; this stable situation will not change in the absence of perturbations. Control without knowledge of the switching probabilities between BNs that have common attractors has been studied by Choudhary et al. [4].
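The idea of a PBN as a collection of BNs with probabilistic switching can be sketched as follows. This is a simplified toy model under stated assumptions, not the model of [7] or [13]: the two constituent networks, the selection probabilities, the switching probability and the names `bn_step` and `simulate_pbn` are all invented for illustration, and random gene perturbations are left out.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
# A Boolean network is one update function per gene.
BooleanNetwork = Sequence[Callable[[State], int]]


def bn_step(network: BooleanNetwork, state: State) -> State:
    """Synchronously update every gene using the given Boolean network."""
    return tuple(f(state) for f in network)


def simulate_pbn(
    networks: Sequence[BooleanNetwork],
    selection_probs: Sequence[float],
    switch_prob: float,
    state: State,
    steps: int,
    rng: random.Random,
) -> List[State]:
    """Simulate a PBN whose active BN is re-drawn with probability switch_prob."""
    indices = range(len(networks))
    current = rng.choices(indices, weights=selection_probs)[0]
    trajectory = [state]
    for _ in range(steps):
        if rng.random() < switch_prob:
            # Re-select the constituent BN; otherwise keep the current one.
            current = rng.choices(indices, weights=selection_probs)[0]
        state = bn_step(networks[current], state)
        trajectory.append(state)
    return trajectory


# Two toy BNs over two genes (hypothetical update rules).
bn_a: BooleanNetwork = [lambda s: s[1], lambda s: s[0]]                # swap
bn_b: BooleanNetwork = [lambda s: s[0] & s[1], lambda s: s[0] | s[1]]  # AND / OR

trajectory = simulate_pbn(
    [bn_a, bn_b], [0.5, 0.5], switch_prob=0.1,
    state=(1, 0), steps=10, rng=random.Random(0),
)
print(trajectory)
```

In this toy model the state (0, 0) is a fixed point of both constituent networks, so once reached it persists regardless of switching; this gives a small picture of a common attractor in the absence of perturbations.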

Previous studies by our group [2], [3] are based on the following argument: if the control process is considered as a treatment, then observing the patient after the treatment can also be taken into account while solving the control problem. As a result, the solution described in [2] was developed; it considers a monitoring horizon after the control horizon. The solution is given for various settings depending on whether the control and monitoring horizons are finite or infinite. The problem was also formulated as a multi-objective problem where the objectives are the state cost and the state-action cost defined by domain experts [3].

To sum up, the above-mentioned works focus on solving the control problem in GRNs for different settings using dynamic programming. A gene is assumed to be relevant if it is chosen for modeling, i.e., it exists in the same GRN with others. However, we observed that relevancy also depends on the objective(s); consequently, we argue that the component of the GRN we should focus on may change according to the given set of objectives. Based on this argument, we propose a feature reduction method that successfully maintains scalability in the control of GRNs [14], [15]. Feature reduction provides the option of reducing the number of genes to be considered in the control process. Neglecting scalability turns control into an unmanageable process, even though control is essential for studying and understanding the behavior of any given system. To the best of our knowledge, this is the first attempt to apply feature reduction in the context of GRNs, which we consider a major contribution; our initial results have encouraged us to expand the work as described in this paper. The results reported in this paper demonstrate the applicability and effectiveness of the proposed approach. Although GRN control studies are not yet directly applicable to clinical practice, the promising results demonstrate their potential for use in real applications. We report test results using both synthetic and real gene expression data.

The rest of this paper is organized as follows. Section 2 includes the necessary background information. Section 3 covers the details of the proposed reduction-based approach. Section 4 reports experimental results on synthetic and real gene expression data. Section 5 presents conclusions and future research directions.

Section snippets

Background

In this section, we cover the background necessary for the scope of the work described in this paper. In particular, we present an overview of the Markov decision problems (MDPs) and discuss the control problem in the context of GRNs.

Scalable control by feature reduction

Feature reduction is the process of finding features that are expected to have a negligible or minimal effect on the output quality and excluding them from further consideration. In general, feature reduction or feature selection is performed to improve the performance of some predictors [17]. The features in the case of gene expression data are the genes, the samples, or both. In this paper, we consider feature reduction as decreasing the number of genes. What we consider as output is the

Experimental results

We have conducted experiments to demonstrate the applicability and effectiveness of the proposed reduction approach. We used PBNs [13], [10] as the modeling technique. The basic idea that distinguishes PBNs from BNs is to use more than one Boolean function for each target gene. Thus, a PBN is a more general, probabilistic extension of a Boolean network. We used the PBN Toolbox software [13] to derive a PBN from given data. The algorithm of deriving a PBN from the data depends on a concept

Conclusions and future research directions

In this paper, we proposed a feature reduction based method to handle the problem of finding an approximate solution to the control of GRNs. For each gene, a score is computed to estimate whether the gene is relevant in solving the resulting MDP. The score is based on MDP minimization theory and an estimation of the degree to which genes determine each other's next state. The results are promising in the sense that given a threshold value, the score can be used to remove some genes with

Acknowledgments

The research of Mehmet Tan is partially supported by The Scientific and Technological Research Council of Turkey. The research of Reda Alhajj is partially supported by NSERC, Canada.

References (27)

R. Givan et al. Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence (2003)
H. de Jong. Modeling and simulation of genetic regulatory systems: A literature review. Journal of Computational Biology (2002)
Abul O, Alhajj R, Polat F. Markov decision processes based optimal control policies for probabilistic Boolean networks....
Abul O, Alhajj R, Polat F. An optimal multi-objective control method for discrete genetic regulatory networks. In:...
A. Choudhary et al. Intervention in a family of Boolean networks. Bioinformatics (2006)
A. Datta et al. External control in Markovian genetic regulatory networks. Machine Learning (2003)
A. Datta et al. External control in Markovian genetic regulatory networks: The imperfect information case. Bioinformatics (2004)
R. Pal et al. Intervention in context-sensitive probabilistic Boolean networks. Bioinformatics (2005)
R. Pal et al. Optimal infinite-horizon control for probabilistic Boolean networks. IEEE Transactions on Signal Processing (2006)
I. Shmulevich et al. Control of stationary behaviour in probabilistic Boolean networks by means of structural intervention. Biological Systems (2002)
I. Shmulevich et al. Gene perturbation and intervention in probabilistic Boolean networks. Bioinformatics (2002)
V.G. Kulkarni. Modeling and analysis of stochastic systems (1996)
R.S. Sutton et al. Reinforcement learning (1998)

