Regularized maximum likelihood estimation of sparse stochastic monomolecular biochemical reaction networks

doi:10.1016/j.compchemeng.2016.03.018

Computers & Chemical Engineering

Volume 90, 12 July 2016, Pages 111-120

https://doi.org/10.1016/j.compchemeng.2016.03.018 Get rights and content

Highlights

•
A sparse parameter matrix estimation method is proposed for a stochastic monomolecular biochemical reaction network system.
•
We considers stochastic monomolecular reaction systems for which an exact analytical solution of the CME can be derived.
•
The estimation method incorporates the closed-form solution into a regularized MLE for which model complexity is penalized.
•
Simulation results are provided to verify performance improvement of regularized MLE over LSE.

Abstract

A sparse parameter matrix estimation method is proposed for identifying a stochastic monomolecular biochemical reaction network system. Identification of a reaction network can be achieved by estimating a sparse parameter matrix containing the reaction network structure and kinetics information. Stochastic dynamics of a biochemical reaction network system is usually modeled by a chemical master equation (CME) describing the time evolution of probability distributions for all possible states. This paper considers closed monomolecular reaction systems for which an exact analytical solution of the corresponding chemical master equation can be derived. The estimation method presented in this paper incorporates the closed-form solution into a regularized maximum likelihood estimation (MLE) for which model complexity is penalized. A simulation result is provided to verify performance improvement of regularized MLE over least-square estimation (LSE), which is based on a deterministic mass-average model, in the case of a small population size.

Introduction

Stochastic dynamics of biological systems have lately received increased attention from researchers in the field of biological engineering (Raj and van Oudenaarden, 2009). In the past, such studies were greatly hampered by lack of qualitative measurement data. Nowadays, quantitative but noisy data can be obtained using the microarray technology. Recent developments in sensing techniques that can provide real-time observations of intrinsic stochastic dynamics at small length scales have motivated many scientific investigations (Raj et al., 2010, Xie et al., 2008). One such sensing technique for biological systems is bio-imaging using fluorescent proteins (García-Parajó et al., 2001). By grafting a fluorescent protein into gene expression, proteins and mRNA expressions originating from targeted DNA can be detected quantitatively in real time. Specifically, yellow fluorescent protein (yfp) (Elowitz et al., 2002, Li and Xie, 2011, Yu et al., 2006) has been widely used for detecting changes with single-macromolecule sensitivity in individual live cells.

In the real-time data reported in the aforementioned papers, strongly stochastic behavior has been observed. For example, bursts of transcribed protein molecules from the cell, controlled by an identical messenger RNA molecule, have different copy numbers (Elowitz et al., 2002). The total population number of species in the system within the detectable range is small, ranging from tens to thousands of copies. Stochastic dynamics of systems with discrete states can be modeled by the chemical master equation (CME) (Feinberg, 1979, Fichthorn and Weinberg, 1991), $\frac{\partial P (σ, t)}{\partial t} = \sum_{σ^{'}} W (σ^{'}, σ) P (σ^{'}, t) - \sum_{σ^{'}} W (σ, σ^{'}) P (σ, t)$ where $P (σ, t)$ is the probability of the system being in discrete state $σ$ at time t, and $W (σ^{'}, σ)$ is the transition rate from state $σ^{'}$ to state $σ$ . The CME describes the time evolution of the probability distribution among all possible configurations, and can be written as (MacNamara et al., 2008, Munsky and Khammash, 2006) $\frac{d P (t)}{d t} = A (t; θ) P (t)$ where $P (t)$ is the state vector containing all the state probability variables and $A (t; θ)$ is a matrix containing all the transition rate constants with dependence on the model parameter vector $θ$ related with physicochemical phenomena, e.g., biochemical reactions. Many numerical algorithms have been developed for solving the matrix ordinary differential Eq. (2), which can be divided into direct methods (MacNamara et al., 2008, Munsky and Khammash, 2006) and indirect methods (Gibson and Bruck, 2000, Gillespie, 1977). Direct methods, such as the finite state projection (FSP) algorithm, attempt to evaluate the matrix exponential directly. In practice, given the large size of the state space, indirect methods that use stochastic simulation algorithms (SSAs) to generate approximate probability distributions have been more popular.

A reaction network is one of fundamental models for describing biological mechanisms. Different types of biological reaction networks exist and interact to accomplish needed functions including gene regulatory networks, metabolic networks, and signaling networks. Network identification is an important aspect of studying biological network systems. Given time series experimental data from sensors, a parameter estimation method can be used to identify the parameters of a given reaction network model. Typically, the estimation is formulated to find parameter values minimizing the difference between the experimental data and their model predictions. Most of the literature has employed least-squares estimation (LSE) approaches, which fit stochastic data to a deterministic continuum model (Gennemark and Wedelin, 2009).

The LSE method based on deterministic models, however, can provide poor parameter estimates for highly stochastic systems (Tian et al., 2007). Several stochastic parameter estimation methods based on the stochastic differential equation of type (1) have shown improved estimation performance (Munsky et al., 2012). These methods attempt to solve for the probability density functions (PDFs) of the CME and use them for estimation. Previous studies employed the moment-based method (Munsky et al., 2009, Zechner et al., 2012), the Bayesian method (Boys et al., 2008, Golightly and Wilkinson, 2011, Lillacci and Khammash, 2010), the maximum likelihood estimation (MLE) method (Daigle et al., 2012, Tian et al., 2007), and the density function distance (DFD) method (Lillacci and Khammash, 2013, Poovathingal and Gunawan, 2010). However, these published methods are based on approximated PDFs of the CME. In many cases, PDFs are approximated using the SSA approach (Gillespie, 2007), which typically demands a very large number of simulations to be performed for an accurate estimation.

In many cases, the exact network structure of a biological system is not known. An important problem in biological systems is to use measurement data to identify the topology of reaction networks and estimate associated parameters at the same time. Craciun and Pantea, 2008 discussed the issue of identifiability of reaction networks with the fact that there might exist more than one parameter set that fits the measurement data with the same level of accuracy. To cope with such challenges, model discrimination/invalidation of reaction networks has been studied (Conradi et al., 2005, Kremling et al., 2004). Non-uniqueness of an estimation algorithm can be overcome by using the principle of parsimony and choosing simple models over complex models (Jefferys and Berger, 1991). An approach to reduce model complexity is to use penalization term for the number of parameters such as ridge and $ℓ$ ₁ regularization (Hesterberg et al., 2008).

Taking the above issues into consideration altogether, this paper considers stochastic monomolecular reaction systems in a sparse parameter matrix estimation problem. Stochastic monomolecular reaction system, described by an exact probability distribution solution of the CME (Jahnke and Huisinga, 2007), can represent realistic gene regulatory networks or metabolic networks. The exact solution enables formulation of a regularized MLE method rather than LSE. Improved performance over the LSE method is verified with a simulation study involving a small scale reaction network system. Applicability of the proposed MLE method to a more stochastic and larger scale reaction network system is also tested.

Section snippets

Biochemical reaction network system

In a living organism, most biological functions arise from complex interactions between numerous components such as genes, metabolites, and proteins. The interactions form a large biochemical reaction network system involving thousands of components. Gene expression is the main mechanism by which cells regulate the interaction webs to perform functions. The gene expression process occurs in two steps: (1) transcription of genes into mRNAs initiated by specialized proteins and (2) translation of

Exact Solution of the CME

In Jahnke and Huisinga (2007), analytical solutions for the CME (10) for both open and closed systems are derived. In this section, results in Jahnke and Huisinga (2007) for closed systems are highlighted. The multinomial distribution, $M (x, N, λ (t))$ , is defined by $M (x, N, λ (t)) = N! Π_{i = 1}^{N} \frac{λ_{i}^{x_{i}} (t)}{x_{i}!}$ where $λ (t) \in ℝ^{n}$ is the probability parameter (also called the hyper-parameter).

Proposition 1 in Jahnke and Huisinga (2007): Suppose that $P (x, 0) = M (x, N, λ_{0})$ . Then $P (x, t) = M (x, N, λ (t))$ solves the CME (10), where N is

3 Species reaction system

The proposed parameter estimation formulation is applied to a simple reaction system containing a population size of 100 for all 3 species. Potential trajectories are simulated by SSA and used in the proposed parameter estimation method. The sampling interval is 1 s and final simulation time is 50 s so that the population of species A–C at 51 time points are calculated in each SSA run. Species A and B have a reversible reaction connectivity and species A and C have an irreversible reaction

Conclusions

A regularized maximum likelihood estimation (MLE) method is presented for determining the interaction topology of a biochemical reaction network system. The regularized MLE method is formulated by combining a closed-form solution of the chemical master equations that describe stochastic monomolecular biochemical reaction systems with an $ℓ_{1}$ -penalty term for the parameter matrix. Improved performance of the MLE method and recursive algorithms is demonstrated by using stochastic simulation data

Acknowledgement

This work was supported by the Advanced Biomass R&D Center (ABC) of Global Frontier Project funded by the Ministry of Science, ICT and Future Planning (ABC-2011-0031354).

References (44)

C. Gadgil et al.
A stochastic analysis of first-order reaction networks
Bull. Math. Biol.
(2005)
M.M. Zavlanos et al.
Inferring stable genetic networks from steady-state data
Automatica
(2011)
E. August et al.
Efficient, sparse biological network determination
BMC Syst. Biol.
(2009)
S.P. Boyd et al.
Convex Optimization
(2004)
R.J. Boys et al.
Bayesian inference for a discretely observed stochastic kinetic model
Stat. Comput.
(2008)
C. Conradi et al.
Using chemical reaction network theory to discard a kinetic mechanism hypothesis
IEE Proc. Syst. Biol.
(2005)
G. Craciun et al.
Identifiability of chemical reaction networks
J. Math. Chem.
(2008)
B.J. Daigle et al.
Accelerated maximum likelihood parameter estimation for stochastic biochemical systems
BMC Bioinform.
(2012)
M.B. Elowitz et al.
Stochastic gene expression in a single cell
Science
(2002)
Feinberg, M. (1979). Lectures on chemical reaction networks. Notes of lectures given at the Mathematics Research Center...

A.M. Feist et al.

Reconstruction of biochemical networks in microorganisms

Nat. Rev. Microbiol.

(2008)

K.A. Fichthorn et al.

Theoretical foundations of dynamical Monte Carlo simulations

J. Chem. Phys.

(1991)

M.F. García-Parajó et al.

Optical probing of single fluorescent molecules and proteins

ChemPhysChem

(2001)

T.S. Gardner et al.

Inferring genetic networks and identifying compound mode of action via expression profiling

Science

(2003)

P. Gennemark et al.

Benchmarks for identification of ordinary differential equations from time series data

Bioinformatics

(2009)

M.A. Gibson et al.

Efficient exact stochastic simulation of chemical systems with many species and many channels

J. Phys. Chem. A

(2000)

D.T. Gillespie

Exact stochastic simulation of coupled chemical reactions

J. Phys. Chem.

(1977)

D.T. Gillespie

Stochastic simulation of chemical kinetics

Annu. Rev. Phys. Chem.

(2007)

A. Golightly et al.

Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo

Interface Focus

(2011)

A. Hayter

Probability and Statistics for Engineers and Scientists

(2012)

T. Hesterberg et al.

Least angle and ℓ1 penalized regression: a review

Stat. Surv.

(2008)

D.R. Hyduke et al.

Towards genome-scale signalling-network reconstructions

Nat. Rev. Genet.

(2010)

Cited by (4)

Parameter-dependent linear matrix inequality approach to robust state estimation of noisy genetic networks
2020, Computers and Chemical Engineering
Citation Excerpt :
Such fluctuations are of stochastic nature described by Wiener process. Besides, the genetic networks may suffer from deterministic or stochastic extrinsic disturbances from the environment (Paulsson, 2004; Chen et al., 2005, 2008; Wang et al., 2015; Jang et al., 2016). As a result, it is generally impossible to model a genetic network with a purely deterministic model, and a stochastic model is more suitable to characterize the genetic network with intrinsic and extrinsic noise.
Genetic networks play an important role in systems biology as they explain the interactions between genes and proteins. However, the genetic networks are described by dynamical systems with nonlinear uncertainties and usually affected by stochastic internal fluctuations, and stochastic external disturbances. The design addressed in this paper is concerned with robust state estimator of stochastic genetic networks in the presence of nonlinear uncertainties. The objective is to estimate the true concentrations of mRNAs and proteins of the noisy nonlinear genetic networks. Based on the notion of Lyapunov functions, we improve the design condition for the robust estimator to ensure that the estimation error satisfies H_∞ performance criterion. The sufficient condition is derived in terms of parameter-dependent linear matrix inequalities, which are convex constraints, and can be efficiently solved via the sum-of-squares technique. We provide two numerical examples of real genetic networks to illustrate the effectiveness of the proposed method.
Navigating the Statistical Minefield of Model Selection and Clustering in Neuroscience
2022, eNeuro
Comprehensive review of models and methods for inferences in bio-chemical reaction networks
2019, arXiv
Comprehensive review of models and methods for inferences in bio-chemical reaction networks
2019, Frontiers in Genetics

^☆: Preliminary results in this manuscript were published in conference papers “H. Jang, K. K. K. Kim, J. H. Lee, and R. D. Braatz, Regularized Maximum Likelihood Estimation of Sparse Stochastic Monomolecular Biochemical Reaction Networks, Proceedings of the IFAC World Congress, Cape Town, South Africa, 2014” and “K. K. K. Kim, H. Jang, R. B. Gopaluni, J. H. Lee and R. D. Braatz, Sparse Identification in Chemical Master Equations for Monomolecular Reaction Networks, Proceedings of the American Control Conference, Portland, Oregon, 2014.”

View full text

Regularized maximum likelihood estimation of sparse stochastic monomolecular biochemical reaction networks☆

Highlights

Abstract

Introduction

Section snippets

Biochemical reaction network system

Exact Solution of the CME

3 Species reaction system

Conclusions

Acknowledgement

Bull. Math. Biol.

Automatica

Efficient, sparse biological network determination

BMC Syst. Biol.

Convex Optimization

Bayesian inference for a discretely observed stochastic kinetic model

Stat. Comput.

Using chemical reaction network theory to discard a kinetic mechanism hypothesis

IEE Proc. Syst. Biol.

Identifiability of chemical reaction networks

J. Math. Chem.

Accelerated maximum likelihood parameter estimation for stochastic biochemical systems

BMC Bioinform.

Stochastic gene expression in a single cell

Science

Reconstruction of biochemical networks in microorganisms

Nat. Rev. Microbiol.

Theoretical foundations of dynamical Monte Carlo simulations

J. Chem. Phys.

Optical probing of single fluorescent molecules and proteins

ChemPhysChem

Inferring genetic networks and identifying compound mode of action via expression profiling

Science

Benchmarks for identification of ordinary differential equations from time series data

Bioinformatics

Efficient exact stochastic simulation of chemical systems with many species and many channels

J. Phys. Chem. A

Exact stochastic simulation of coupled chemical reactions

J. Phys. Chem.

Stochastic simulation of chemical kinetics

Annu. Rev. Phys. Chem.

Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo

Interface Focus

Probability and Statistics for Engineers and Scientists

Least angle and ℓ1 penalized regression: a review

Stat. Surv.

Towards genome-scale signalling-network reconstructions

Nat. Rev. Genet.