Abstract
We propose a differentiable framework for logic program inference as a step toward flexible and scalable logical inference. The basic idea is to replace the symbolic search used in logical inference with the minimization of a cost function \( {\mathbf{J}} \) in a continuous space. \( {\mathbf{J}} \) is built from matrix (tensor) operations, Frobenius norms and non-linear functions, much like a neural network, and is designed specifically for each task (relation abduction, answer set computation, etc.) so that \( {\mathbf{J}}( {\mathbf{X}} ) \ge 0\) always holds and \( {\mathbf{J}}( {\mathbf{X}} ) = 0\) holds if and only if \( {\mathbf{X}} \) is a 0–1 tensor representing a solution to the task. We compute a minimizer \( {\mathbf{X}} \) of \( {\mathbf{J}} \) attaining \( {\mathbf{J}}( {\mathbf{X}} ) = 0\) by gradient descent or Newton's method. Using artificial and real data, we empirically demonstrate the potential of our approach on a variety of tasks including abduction, random SAT, rule refinement and probabilistic modeling based on answer set (supported model) sampling.
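To make this concrete, here is a minimal sketch, ours rather than the authors' code, of the gradient-descent variant, assuming NumPy. It uses the abduction cost \( {\mathbf{J}}^{\mathrm{abd}}\) and its Jacobian as reconstructed in the Appendix below; the penalty weight `ell`, step size `lr` and iteration budget are illustrative choices, not values from the paper.

```python
import numpy as np

def min1(A):
    # Element-wise min_1(x) = min(x, 1) (note 3).
    return np.minimum(A, 1.0)

def J_abd(X, R1, R3, ell):
    # Cost (1): fit min_1(R1 X) to R3, plus a penalty pushing X toward {0,1}.
    B = min1(R1 @ X) - R3
    P = X * (1.0 - X)
    return 0.5 * np.sum(B * B) + 0.5 * ell * np.sum(P * P)

def grad_J_abd(X, R1, R3, ell):
    # Closed-form Jacobian (5), valid almost everywhere (note 7).
    C = R1 @ X
    mask = (C <= 1.0).astype(float)  # C_{<=1}: a.e. derivative of min_1
    return R1.T @ (mask * (C - R3)) + ell * X * (1.0 - X) * (1.0 - 2.0 * X)

def abduce(R1, R3, ell=0.1, lr=0.2, iters=10000, seed=0):
    # Gradient descent from a random start; J(X) = 0 certifies a solution.
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(R1.shape[1], R3.shape[1]))
    for _ in range(iters):
        X -= lr * grad_J_abd(X, R1, R3, ell)
        if J_abd(X, R1, R3, ell) < 1e-10:
            break
    return np.round(X)  # threshold to a 0-1 matrix
```

Since \( {\mathbf{J}}( {\mathbf{X}} ) = 0\) exactly characterizes solutions, success is checkable, so restarting from a fresh random \( {\mathbf{X}} \) is a natural remedy when the descent stalls in a local minimum.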
Notes
1. We follow the Prolog convention that logical variables begin with upper-case letters.
2. Stated another way, what we are doing here is "predicate invention" in inductive logic programming (ILP), in which \(r_2(Y,Z)\) is invented.
3. For a matrix \( {\mathbf{A}} \), \(\text{min}_1( {\mathbf{A}} )\) denotes the element-wise application of \(\text{min}_1(x)\) to \( {\mathbf{A}} \) (notes 3, 5 and 9 are illustrated in the NumPy sketch after these notes).
4. Throughout this paper, we implicitly assume that all vector and matrix dimensions are compatible.
5. \(\Vert {\mathbf{X}} \odot (\mathbbm{1}- {\mathbf{X}} )\Vert_F^2 = \sum_{ij} {\mathbf{X}}_{ij}^2(1- {\mathbf{X}}_{ij})^2 = 0\) implies \( {\mathbf{X}}_{ij}(1- {\mathbf{X}}_{ij})=0\) for all \(i, j\), where \( {\mathbf{X}}_{ij}\) denotes the \((i, j)\) element of \( {\mathbf{X}} \); hence every element of \( {\mathbf{X}} \) is either 0 or 1.
6. Be warned that this task can be extremely difficult because, as mentioned above, it includes solving a SAT problem.
7. \(\text{min}_1(x)\) is differentiable except at the single point \(x=1\), and hence \(\partial {\mathbf{J}}^{\mathrm{abd}}/\partial {\mathbf{X}}\) exists almost everywhere.
8. The derivation of \( {\mathbf{J}}_\mathbf{a}^{abd}\) is described in the Appendix.
9. For matrices \( {\mathbf{X}}, {\mathbf{Y}} \), \(( {\mathbf{X}} \bullet {\mathbf{Y}} ) = \sum_{ij} {\mathbf{X}}_{ij} {\mathbf{Y}}_{ij}\).
10. All experiments in this paper were carried out using GNU Octave 4.2.2 and Python 3.6.3 on a PC with an Intel(R) Core(TM) i7-3770@3.40 GHz CPU and 28 GB of memory.
11. \( {\mathbf{J}}_\mathbf{a}^{sat}\) is derived similarly to \( {\mathbf{J}}_\mathbf{a}^{abd}\).
12. Here we regard \( {\mathbf{A}} \) as the set \(\{(i,j) \mid {\mathbf{A}}_{ij}=1 \}\) and write \(| {\mathbf{A}} |\) for its cardinality.
13. This is a variant of the 'Friends & Smokers' program from ProbLog's tutorial (https://dtai.cs.kuleuven.be/problog/tutorial/basic/05_smokers.html).
14. We assume \(\text{DB}^g\) has supported models.
15. ASP is logic programming based on the stable model semantics of logic programs, applied primarily to solving combinatorial problems.
16. Another difference is that Nickles [20] deals with stable models, while we use supported models, which are easier to compute than stable ones.
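The following small NumPy check, ours and purely illustrative, spells out the element-wise \(\text{min}_1\) of note 3, the vanishing penalty of note 5, and the dot-product identities of note 9 that the Appendix relies on:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(0.0, 2.0, size=(3, 4))

# Note 3: min_1 applied element-wise.
assert np.allclose(np.minimum(A, 1.0), np.where(A < 1.0, A, 1.0))

# Note 5: the penalty vanishes exactly on 0-1 matrices.
X01 = (A > 1.0).astype(float)
assert np.sum((X01 * (1.0 - X01)) ** 2) == 0.0

# Note 9: (X . Y) = sum_ij X_ij Y_ij, so ||X||_F^2 = (X . X) ...
X = rng.standard_normal((3, 3))
Y = rng.standard_normal((3, 3))
Z = rng.standard_normal((3, 3))
assert np.isclose(np.linalg.norm(X, 'fro') ** 2, np.sum(X * X))
# ... and the identities used in the Appendix:
assert np.isclose(np.sum((X @ Z) * Y), np.sum(Z * (X.T @ Y)))  # ((XZ) . Y) = (Z . (X^T Y))
assert np.isclose(np.sum((X * Z) * Y), np.sum(Z * (X * Y)))    # ((X o Z) . Y) = (Z . (X o Y))
```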
References
Baral, C., Gelfond, M., Rushton, N.: Probabilistic reasoning with answer sets. Theory Pract. Logic Program. (TPLP) 9(1), 57–144 (2009)
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Burges, C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 26, pp. 2787–2795 (2013)
Chakraborty, S., Fremont, D.J., Meel, K.S., Seshia, S.A., Vardi, M.Y.: Distribution-aware sampling and weighted model counting for SAT. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2014), pp. 1722–1730. AAAI Press (2014). http://dl.acm.org/citation.cfm?id=2892753.2892792
Cohen, W.W., Yang, F., Mazaitis, K.: TensorLog: deep learning meets probabilistic DBs. CoRR abs/1707.05390 (2017). http://arxiv.org/abs/1707.05390
Denecker, M., Kakas, A.: Abduction in logic programming. In: Kakas, A.C., Sadri, F. (eds.) Computational Logic: Logic Programming and Beyond. LNCS (LNAI), vol. 2407, pp. 402–436. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45628-7_16
Eiter, T., Gottlob, G., Leone, N.: Abduction from logic programs: semantics and complexity. Theoret. Comput. Sci. 189(1–2), 129–177 (1997)
Eiter, T., Ianni, G., Krennwallner, T.: Answer set programming: a primer. In: Tessaris, S., Franconi, E., Eiter, T., Gutierrez, C., Handschuh, S., Rousset, M.-C., Schmidt, R.A. (eds.) Reasoning Web 2009. LNCS, vol. 5689, pp. 40–110. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03754-2_2
Flach, P., Kakas, A. (eds.): Abduction and Induction - Essays on Their Relation and Integration. Springer, Dordrecht (2000). https://doi.org/10.1007/978-94-017-0606-3
Gelfond, M., Lifschitz, V.: The stable model semantics for logic programming. In: Proceedings of the Fifth International Conference and Symposium on Logic Programming (ICLP/SLP 1988), pp. 1070–1080 (1988)
Gottlob, G., Pichler, R., Wei, F.: Tractable database design and datalog abduction through bounded treewidth. Inf. Syst. 35(3), 278–298 (2010)
Grefenstette, E.: Towards a formal distributional semantics: simulating logical calculi with tensors. In: Proceedings of the Second Joint Conference on Lexical and Computational Semantics, pp. 1–10 (2013). http://www.aclweb.org/anthology/S13-1001
Heule, M., Järvisalo, M., Suda, M. (eds.): Proceedings of SAT Competition 2018: Solver and Benchmark Descriptions, Department of Computer Science Series of Publications B, vol. B-2018-1. Department of Computer Science, University of Helsinki (2018)
Hobbs, J.R., Stickel, M.E., Appelt, D.E., Martin, P.: Interpretation as abduction. Artif. Intell. 63(1–2), 69–142 (1993)
Inoue, K., Sato, T., Ishihata, M., Kameya, Y., Nabeshima, H.: Evaluating abductive hypotheses using an EM algorithm on BDDs. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), pp. 810–815 (2009)
Kakas, A.C., Kowalski, R., Toni, F.: Abductive logic programming. J. Logic Comput. 2(6), 719–770 (1992)
Kate, R., Mooney, R.: Probabilistic abduction using Markov logic networks. In: The IJCAI-09 Workshop on Plan, Activity, and Intent Recognition (PAIR 2009), pp. 22–28 (2009)
Kazemi, S.M., Poole, D.: RelNN: a deep neural model for relational learning. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), pp. 6367–6375 (2018)
Manhaeve, R., Dumancic, S., Kimmig, A., Demeester, T., De Raedt, L.: DeepProbLog: neural probabilistic logic programming. CoRR abs/1805.10872 (2018). http://arxiv.org/abs/1805.10872
Marek, W., Subrahmanian, V.S.: The relationship between stable, supported, default and autoepistemic semantics for general logic programs. Theoret. Comput. Sci. 103(2), 365–386 (1992)
Nickles, M.: Differentiable SAT/ASP. In: Proceedings of the 5th International Workshop on Probabilistic Logic Programming, PLP 2018, pp. 62–74 (2018)
Poole, D.: Probabilistic Horn abduction and Bayesian networks. Artif. Intell. 64(1), 81–129 (1993)
De Raedt, L., Kimmig, A.: Probabilistic (logic) programming concepts. Mach. Learn. 100(1), 5–47 (2015). https://doi.org/10.1007/s10994-015-5494-z
Rocktäschel, T., Riedel, S.: End-to-end differentiable proving. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 3788–3800. Curran Associates, Inc., Long Beach (2017)
Roth, D.: Integer linear programming inference for conditional random fields. In: Proceedings of the International Conference on Machine Learning (ICML 2005), pp. 737–744 (2005)
Roth, D., Yih, W.T.: Global inference for entity and relation identification via a linear programming formulation. In: Getoor, L., Taskar, B. (eds.) Introduction to Statistical Relational Learning. MIT Press (2007)
Sakama, C., Nguyen, H., Sato, T., Inoue, K.: Partial evaluation of logic programs in vector spaces. In: Proceedings of the 11th Workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2018) (2018). https://doi.org/10.29007/9d61
Sato, T.: A statistical learning method for logic programs with distribution semantics. In: Proceedings of the 12th International Conference on Logic Programming (ICLP 1995), pp. 715–729 (1995)
Sato, T., Kameya, Y.: Statistical abduction with tabulation. In: Kakas, A.C., Sadri, F. (eds.) Computational Logic: Logic Programming and Beyond. LNCS (LNAI), vol. 2408, pp. 567–587. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45632-5_22
Sato, T.: Embedding Tarskian semantics in vector spaces. In: AAAI-17 Workshop on Symbolic Inference and Optimization (SymInfOpt 2017) (2017)
Sato, T., Inoue, K., Sakama, C.: Abducing relations in continuous spaces. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2018), pp. 1956–1962 (2018)
Selsam, D., Lamm, M., Bünz, B., Liang, P., de Moura, L., Dill, D.L.: Learning a SAT solver from single-bit supervision. In: International Conference on Learning Representations (ICLR 2019) (2019). https://openreview.net/forum?id=HJMC_iA5tm
Tamaddoni-Nezhad, A., Chaleil, R., Kakas, A., Muggleton, S.: Application of abductive ILP to learning metabolic network inhibition from temporal data. Mach. Learn. 64, 209–230 (2006)
Widdows, D., Cohen, T.: Reasoning with vectors: a continuous model for fast robust inference. Logic J. IGPL/Interest Group Pure Appl. Log. 23(2), 141–173 (2015)
Acknowledgments
This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
Appendix
Here we describe how the Jacobian \( {\mathbf{J}}_\mathbf{a}^{abd}\) in Sect. 1 is derived. Recall that our cost function (1) is

\[ {\mathbf{J}}^{\mathrm{abd}} \;=\; \frac{1}{2}\,\Vert \text{min}_1( {\mathbf{R}}_1 {\mathbf{X}} ) - {\mathbf{R}}_3 \Vert_F^2 \;+\; \frac{\ell}{2}\,\Vert {\mathbf{X}} \odot (\mathbbm{1} - {\mathbf{X}} )\Vert_F^2, \qquad (1) \]

where \(\ell \ge 0\) weights the penalty term that pushes \( {\mathbf{X}} \) toward a 0–1 matrix (note 5).

First we introduce a dot product for two matrices \( {\mathbf{X}} \) and \( {\mathbf{Y}} \) by \(( {\mathbf{X}} \bullet {\mathbf{Y}} ) = \sum_{ij} {\mathbf{X}}_{ij} {\mathbf{Y}}_{ij}\). Then \(\Vert {\mathbf{X}} \Vert_F^2 = ( {\mathbf{X}} \bullet {\mathbf{X}} )\) holds. Also \((( {\mathbf{X}} {\mathbf{Z}} ) \bullet {\mathbf{Y}} ) = ( {\mathbf{Z}} \bullet ( {\mathbf{X}}^T {\mathbf{Y}} ))\) and \((( {\mathbf{X}} \odot {\mathbf{Z}} ) \bullet {\mathbf{Y}} ) = ( {\mathbf{Z}} \bullet ( {\mathbf{X}} \odot {\mathbf{Y}} ))\) hold. Let \( {\mathbf{X}}_{pq}\) be the \((p, q)\) element of a matrix \( {\mathbf{X}} \) and \( {\mathbf{I}}_{pq}\) the matrix that is zero everywhere except for a one in the \((p, q)\) element. We also put \( {\mathbf{C}} = {\mathbf{R}}_1 {\mathbf{X}} \) and \( {\mathbf{B}} = \text{min}_1( {\mathbf{C}} ) - {\mathbf{R}}_3\), and use the fact that \( {\mathbf{C}}_{\le 1} \odot {\mathbf{B}} = {\mathbf{C}}_{\le 1} \odot ( {\mathbf{C}} - {\mathbf{R}}_3)\) for simplification, where \( {\mathbf{C}}_{\le 1}\) is the 0–1 matrix indicating which elements of \( {\mathbf{C}} \) are at most 1 (the almost-everywhere derivative of \(\text{min}_1\)). Now we have

\[ \begin{aligned} \frac{\partial {\mathbf{J}}^{\mathrm{abd}}}{\partial {\mathbf{X}}_{pq}} &= \Bigl( {\mathbf{B}} \bullet \bigl( {\mathbf{C}}_{\le 1} \odot ( {\mathbf{R}}_1 {\mathbf{I}}_{pq}) \bigr) \Bigr) + \ell \Bigl( \bigl( {\mathbf{X}} \odot (\mathbbm{1}- {\mathbf{X}} ) \odot (\mathbbm{1}-2 {\mathbf{X}} ) \bigr) \bullet {\mathbf{I}}_{pq} \Bigr) \\ &= \Bigl( ( {\mathbf{C}}_{\le 1} \odot {\mathbf{B}} ) \bullet ( {\mathbf{R}}_1 {\mathbf{I}}_{pq}) \Bigr) + \ell \Bigl( \bigl( {\mathbf{X}} \odot (\mathbbm{1}- {\mathbf{X}} ) \odot (\mathbbm{1}-2 {\mathbf{X}} ) \bigr) \bullet {\mathbf{I}}_{pq} \Bigr) \\ &= \Bigl( \bigl( {\mathbf{R}}_1^T ( {\mathbf{C}}_{\le 1} \odot ( {\mathbf{C}} - {\mathbf{R}}_3)) + \ell\, {\mathbf{X}} \odot (\mathbbm{1}- {\mathbf{X}} ) \odot (\mathbbm{1}-2 {\mathbf{X}} ) \bigr) \bullet {\mathbf{I}}_{pq} \Bigr). \end{aligned} \]

Since this holds for any \((p, q)\), we reach the Jacobian \( {\mathbf{J}}_\mathbf{a}^{abd}\) (5):

\[ {\mathbf{J}}_\mathbf{a}^{abd} \;=\; {\mathbf{R}}_1^T \bigl( {\mathbf{C}}_{\le 1} \odot ( {\mathbf{C}} - {\mathbf{R}}_3) \bigr) \;+\; \ell\, {\mathbf{X}} \odot (\mathbbm{1}- {\mathbf{X}} ) \odot (\mathbbm{1}-2 {\mathbf{X}} ). \qquad (5) \]
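As a sanity check on this reconstruction, ours rather than part of the paper, the closed-form Jacobian can be compared against a centred finite-difference approximation of \(\partial {\mathbf{J}}^{\mathrm{abd}} / \partial {\mathbf{X}}_{pq}\); all data below are random stand-ins.

```python
import numpy as np

def J(X, R1, R3, ell):
    B = np.minimum(R1 @ X, 1.0) - R3
    P = X * (1.0 - X)
    return 0.5 * np.sum(B * B) + 0.5 * ell * np.sum(P * P)

def Jac(X, R1, R3, ell):
    C = R1 @ X
    return R1.T @ ((C <= 1.0) * (C - R3)) + ell * X * (1.0 - X) * (1.0 - 2.0 * X)

rng = np.random.default_rng(0)
R1 = (rng.random((4, 5)) < 0.5).astype(float)
R3 = (rng.random((4, 6)) < 0.5).astype(float)
X = rng.uniform(size=(5, 6))
ell = 0.1

num = np.zeros_like(X)
eps = 1e-6
for p in range(X.shape[0]):
    for q in range(X.shape[1]):
        E = np.zeros_like(X)
        E[p, q] = eps  # a small step in the I_pq direction
        num[p, q] = (J(X + E, R1, R3, ell) - J(X - E, R1, R3, ell)) / (2 * eps)

# Agreement up to floating-point error, provided no entry of R1 @ X lies near
# the non-differentiable point 1 of min_1 (note 7).
print(np.max(np.abs(num - Jac(X, R1, R3, ell))))
```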
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Sato, T., Kojima, R. (2020). Logical Inference as Cost Minimization in Vector Spaces. In: El Fallah Seghrouchni, A., Sarne, D. (eds.) Artificial Intelligence. IJCAI 2019 International Workshops. Lecture Notes in Computer Science, vol. 12158. Springer, Cham. https://doi.org/10.1007/978-3-030-56150-5_12
DOI: https://doi.org/10.1007/978-3-030-56150-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-56149-9
Online ISBN: 978-3-030-56150-5