Abstract
Automating the building of models from experimental data is a highly desirable goal, since many applications lack modellers. However, despite the spectacular progress of machine learning techniques in data analytics, classification, clustering and prediction, learning dynamical models from time-series data remains challenging. In this paper we investigate the use of Leslie Valiant's Probably Approximately Correct (PAC) learning framework as a method for the automated discovery of influence models of biochemical processes from Boolean and stochastic traces. We show that Thomas’ Boolean influence systems can be naturally represented by k-CNF formulae and learned from time-series data with a number of Boolean activation samples per species that is quasi-linear in the precision of the learned model, and that positive Boolean influence systems can be represented by monotone DNF formulae and learned actively with both activation samples and oracle calls. We consider Boolean traces and Boolean abstractions of stochastic simulation traces, and study the space-time tradeoff between the diversity of initial states and the length of the time horizon, together with its impact on the error bounds provided by the PAC learning algorithms. We evaluate the performance of this approach on a model of T-lymphocyte differentiation, with and without prior knowledge, and discuss its merits as well as its limitations with respect to realistic experiments.
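To make the k-CNF learning step concrete, the following is a minimal sketch of the classical elimination algorithm for k-CNF formulae from positive samples, in the spirit of Valiant's framework. It is not the authors' implementation: the function names (`all_clauses`, `learn_kcnf`, `evaluate`), the encoding of states as tuples of Booleans, and the toy activation rule are all assumptions made for illustration.

```python
from itertools import combinations, product


def all_clauses(n, k):
    """Enumerate every disjunction of 1..k literals over n Boolean variables.

    A literal is a pair (variable index, polarity); a clause is a frozenset
    of literals on distinct variables.
    """
    clauses = []
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for pols in product((True, False), repeat=size):
                clauses.append(frozenset(zip(idxs, pols)))
    return clauses


def satisfies(clause, v):
    """A clause holds in state v if at least one of its literals holds."""
    return any(v[i] == pol for i, pol in clause)


def learn_kcnf(positive_samples, n, k):
    """Start from the conjunction of all clauses and delete every clause
    falsified by some positive sample (a state where activation was seen)."""
    hypothesis = all_clauses(n, k)
    for v in positive_samples:
        hypothesis = [c for c in hypothesis if satisfies(c, v)]
    return hypothesis


def evaluate(hypothesis, v):
    """The learned k-CNF holds in v if all remaining clauses hold."""
    return all(satisfies(c, v) for c in hypothesis)


# Hypothetical toy example: a species is activated when x0 is present
# and x1 is absent; x2 is irrelevant.
samples = [(True, False, False), (True, False, True)]
h = learn_kcnf(samples, n=3, k=2)
```

Each positive sample can only remove clauses, so the hypothesis never accepts a state that the target formula rejects; with too few or too similar samples it may instead remain overly restrictive, which is where the diversity of the observed states matters.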
Notes
- 1.
For the sake of reproducibility, the code used in this article is available at http://lifeware.inria.fr/wiki/software/#CMSB17.
- 2.
More generally, the PAC learning protocol can discover partial vectors, but for the applications discussed in the current article it is enough to only consider total vectors.
- 3.
- 4.
More precisely, in a well-formed influence system, f is assumed to be partially differentiable; \(x_i\in P\) if and only if \(\sigma = +\) (resp. −) and \({\partial {f}}/ {\partial x_i}(\varvec{x})>0\) (resp. \(<0\)) for some value \(\varvec{x}\in \mathbb {R}_+^n\); and \(x_i\in I\) if and only if \(\sigma = +\) (resp. −) and \({\partial {f}}/ {\partial x_i}(\varvec{x})<0\) (resp. \(>0\)) for some value \(\varvec{x}\in \mathbb {R}_+^n\).
- 5.
Note that this function ignores the cases where \(v_i = 0\) and \({x_i}^-(v) =0\), or \(v_i=1\) and \({x_i}^+(v)=1\) which may create loops in non-terminal states in general influence systems.
References
Angelopoulos, N., Muggleton, S.H.: Machine learning metabolic pathway descriptions using a probabilistic relational representation. Electron. Trans. Artif. Intell. 7(9), 1–11 (2002). Also in Proceedings of Machine Intelligence
Angelopoulos, N., Muggleton, S.H.: SLPs for probabilistic pathways: Modeling and parameter estimation. Technical Report TR 2002/12. Department of Computing, Imperial College, London, UK (2002)
Bernot, G., Comet, J.P., Richard, A., Guespin, J.: A fruitful application of formal methods to biological regulatory networks: Extending Thomas’ asynchronous logical approach with temporal logic. J. Theor. Biol. 229(3), 339–347 (2004)
Bryant, C.H., Muggleton, S.H., Oliver, S.G., Kell, D.B., Reiser, P.G.K., King, R.D.: Combining inductive logic programming, active learning and robotics to discover the function of genes. Electron. Trans. Artif. Intell. 6(12), 1–36 (2001)
Calzone, L., Chabrier-Rivier, N., Fages, F., Soliman, S.: Machine learning biochemical networks from temporal logic properties. In: Priami, C., Plotkin, G. (eds.) Transactions on Computational Systems Biology VI. LNCS, vol. 4220, pp. 68–94. Springer, Heidelberg (2006). doi:10.1007/11880646_4
Chen, K.C., Calzone, L., Csikász-Nagy, A., Cross, F.R., Györffy, B., Val, J., Novák, B., Tyson, J.J.: Integrative analysis of cell cycle control in budding yeast. Mol. Biol. Cell 15(8), 3841–3862 (2004)
Deng, K., Bourke, C., Scott, S.D., Sunderman, J., Zheng, Y.: Bandit-based algorithms for budgeted learning. In: ICDM (2007)
Deng, K., Zheng, Y., Bourke, C., Scott, S., Masciale, J.: New algorithms for budgeted learning. Mach. Learn. 90, 59–90 (2013)
Fages, F., Martinez, T., Rosenblueth, D.A., Soliman, S.: Influence systems vs Reaction systems. In: Bartocci, E., Lio, P., Paoletti, N. (eds.) CMSB 2016. LNCS, vol. 9859, pp. 98–115. Springer, Cham (2016). doi:10.1007/978-3-319-45177-0_7
Fages, F., Soliman, S.: Abstract interpretation and types for systems biology. Theor. Comput. Sci. 403(1), 52–70 (2008)
Gebser, M., Kaufmann, B., Neumann, A., Schaub, T.: clasp: A conflict-driven answer set solver. In: Baral, C., Brewka, G., Schlipf, J. (eds.) LPNMR 2007. LNCS (LNAI), vol. 4483, pp. 260–265. Springer, Heidelberg (2007). doi:10.1007/978-3-540-72200-7_23
Gebser, M., Schaub, T., Thiele, S., Usadel, B., Veber, P.: Detecting inconsistencies in large biological networks with answer set programming. In: Garcia de la Banda, M., Pontelli, E. (eds.) ICLP 2008. LNCS, vol. 5366, pp. 130–144. Springer, Heidelberg (2008). doi:10.1007/978-3-540-89982-2_19
Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81(25), 2340–2361 (1977)
Gordon, A.D., Henzinger, T.A., Nori, A.V., Rajamani, S.K.: Probabilistic programming. In: Proceedings of the Future of Software Engineering, FOSE 2014, pp. 167–181. ACM, New York (2014)
Hill, S.M., et al.: Inferring causal molecular networks: empirical assessment through a community-based effort. Nat. Methods 13(4), 310–318 (2016)
Llamosi, A., Mezine, A., d'Alché-Buc, F., Letort, V., Sebag, M.: Experimental design in dynamical system identification: a bandit-based active learning approach. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS, vol. 8725, pp. 306–321. Springer, Heidelberg (2014). doi:10.1007/978-3-662-44851-9_20
Mendoza, L.: A network model for the control of the differentiation process in Th cells. Biosystems 84(2), 101–114 (2006)
Meyer, P., Cokelaer, T., Chandran, D., Kim, K.H., Loh, P.R., Tucker, G., Lipson, M., Berger, B., Kreutz, C., Raue, A., Steiert, B., Timmer, J., Bilal, E., Sauro, H.M., Stolovitzky, G., Saez-Rodriguez, J.: Network topology and parameter estimation: from experimental design methods to gene regulatory network kinetics using a community based approach. BMC Syst. Biol. 8(1), 1–18 (2014)
Muggleton, S.H.: Inverse entailment and Progol. New Gener. Comput. 13, 245–286 (1995)
Ostrowski, M., Paulevé, L., Schaub, T., Siegel, A., Guziolowski, C.: Boolean network identification from perturbation time series data combining dynamics abstraction and logic programming. Biosystems 149, 139–153 (2016)
Remy, E., Ruet, P., Mendoza, L., Thieffry, D., Chaouiya, C.: From logical regulatory graphs to standard Petri nets: dynamical roles and functionality of feedback circuits. In: Priami, C., Ingólfsdóttir, A., Mishra, B., Riis Nielson, H. (eds.) Transactions on Computational Systems Biology VII. LNCS, vol. 4230, pp. 56–72. Springer, Heidelberg (2006). doi:10.1007/11905455_3
Thomas, R.: Boolean formalisation of genetic control circuits. J. Theor. Biol. 42, 565–583 (1973)
Thomas, R.: Regulatory networks seen as asynchronous automata: a logical description. J. Theor. Biol. 153, 1–23 (1991)
Valiant, L.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)
Valiant, L.: Probably Approximately Correct. Basic Books (2013)
Videla, S., Konokotina, I., Alexopoulos, L.G., Saez-Rodriguez, J., Schaub, T., Siegel, A., Guziolowski, C.: Designing experiments to discriminate families of logic models. Front. Bioeng. Biotechnol. 3, 131 (2015)
Acknowledgements
This work is partly supported by the ANR project Hyclock.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Carcano, A., Fages, F., Soliman, S. (2017). Probably Approximately Correct Learning of Regulatory Networks from Time-Series Data. In: Feret, J., Koeppl, H. (eds) Computational Methods in Systems Biology. CMSB 2017. Lecture Notes in Computer Science, vol 10545. Springer, Cham. https://doi.org/10.1007/978-3-319-67471-1_5
DOI: https://doi.org/10.1007/978-3-319-67471-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67470-4
Online ISBN: 978-3-319-67471-1
eBook Packages: Computer Science (R0)