Abstract
Reconstructing directed acyclic graphs (DAGs) from observed data is an important machine learning task, with applications in systems biology and functional genomics. However, it is a challenging learning problem, because one must consider all possible orderings of the network nodes and score the resulting network structures against the available data. The computational cost of enumerating all possible orderings is exponential in the size of the network. A large number of methods based on various modeling formalisms have been developed to address this problem, primarily focusing on fast algorithms that reduce computational time. In many instances, partial topological information is available for subsets of the nodes; for example, in biology one has information about transcription factors that regulate (precede) other genes, or such information can be obtained from perturbation/silencing experiments on subsets of DAG nodes (genes).
We develop a framework for estimating DAGs from observational data under the assumption that the nodes are partitioned into sets and a complete topological ordering exists amongst them. The proposed approach combines (i) (penalized) regression to estimate edges between nodes across different sets, with (ii) the popular PC-algorithm that identifies the skeleton of the graph within each set. In the final step, we combine the results from the previous two steps to screen out redundant edges. We illustrate the performance of the proposed approach on topologies extracted from the DREAM3 competition. The numerical results showcase the usefulness of the additional partial topological ordering information for this challenging learning problem.
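To make the role of the layer ordering concrete, the following Python sketch (an illustration only, not code from the chapter) shows how a partition of the nodes into ordered sets shrinks the search. The node names and the helpers `count_orderings` and `candidate_edges` are hypothetical. The sketch assumes that edges across sets can only point from an earlier set to a later one, so their direction is fixed in advance, while pairs within a set remain undirected candidates that a skeleton-finding step such as the PC-algorithm would have to resolve.

```python
from itertools import combinations
from math import factorial, prod


def count_orderings(layers):
    """Count node orderings that respect the given layer ordering.

    Nodes may be permuted freely inside a layer, but layers keep their order.
    """
    return prod(factorial(len(layer)) for layer in layers)


def candidate_edges(layers):
    """Split node pairs into across-layer and within-layer candidates.

    across: directed pairs (u, v) with u in an earlier layer than v.
    within: unordered pairs inside one layer, direction still unknown.
    """
    across = [
        (u, v)
        for i, earlier in enumerate(layers)
        for later in layers[i + 1:]
        for u in earlier
        for v in later
    ]
    within = [pair for layer in layers for pair in combinations(layer, 2)]
    return across, within


# Hypothetical example: two transcription factors precede four genes.
layers = [["TF1", "TF2"], ["g1", "g2", "g3"], ["g4"]]
p = sum(len(layer) for layer in layers)
across, within = candidate_edges(layers)

print("orderings without prior information:", factorial(p))
print("orderings consistent with the layers:", count_orderings(layers))
print("across-layer candidates (direction known):", len(across))
print("within-layer candidates (direction unknown):", len(within))
```

In this toy case the six nodes admit 720 orderings without prior information but only 12 that respect the layers, and 11 of the 15 node pairs have their direction fixed by the partition.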
Supported by grants NIH 1U01CA23548701 and 5R01GM11402904 to G. Michailidis.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, PL., Michailidis, G. (2019). Directed Acyclic Graph Reconstruction Leveraging Prior Partial Ordering Information. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science, vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_38
DOI: https://doi.org/10.1007/978-3-030-37599-7_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37598-0
Online ISBN: 978-3-030-37599-7
eBook Packages: Computer Science (R0)