Abstract
We propose a differentiable framework for logic program inference as a step toward flexible and scalable logical inference. The basic idea is to replace the symbolic search used in logical inference with the minimization of a cost function \( {\mathbf{J}} \) in a continuous space. \( {\mathbf{J}} \) is built from matrix (tensor) operations, Frobenius norms and non-linear functions, much like a neural network, and is designed specifically for each task (relation abduction, answer set computation, etc.) so that \( {\mathbf{J}}( {\mathbf{X}} ) \ge 0\) always holds and \( {\mathbf{J}}( {\mathbf{X}} ) = 0\) holds if and only if \( {\mathbf{X}} \) is a 0–1 tensor representing a solution to the task. We compute a minimizer \( {\mathbf{X}} \) of \( {\mathbf{J}} \) attaining \( {\mathbf{J}}( {\mathbf{X}} ) = 0\) by gradient descent or Newton's method. Using artificial and real data, we empirically demonstrate the potential of our approach on a variety of tasks including abduction, random SAT, rule refinement and probabilistic modeling based on answer set (supported model) sampling.
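To make this concrete, here is a minimal sketch, ours rather than the authors' code, of the gradient-descent variant, assuming NumPy. It uses the abduction cost \( {\mathbf{J}}^{\mathrm{abd}}\) and its Jacobian as reconstructed in the Appendix below; the penalty weight `ell`, step size `lr` and iteration budget are illustrative choices, not values from the paper.

```python
import numpy as np

def min1(A):
    # Element-wise min_1(x) = min(x, 1) (note 3).
    return np.minimum(A, 1.0)

def J_abd(X, R1, R3, ell):
    # Cost (1): fit min_1(R1 X) to R3, plus a penalty pushing X toward {0,1}.
    B = min1(R1 @ X) - R3
    P = X * (1.0 - X)
    return 0.5 * np.sum(B * B) + 0.5 * ell * np.sum(P * P)

def grad_J_abd(X, R1, R3, ell):
    # Closed-form Jacobian (5), valid almost everywhere (note 7).
    C = R1 @ X
    mask = (C <= 1.0).astype(float)  # C_{<=1}: a.e. derivative of min_1
    return R1.T @ (mask * (C - R3)) + ell * X * (1.0 - X) * (1.0 - 2.0 * X)

def abduce(R1, R3, ell=0.1, lr=0.2, iters=10000, seed=0):
    # Gradient descent from a random start; J(X) = 0 certifies a solution.
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(R1.shape[1], R3.shape[1]))
    for _ in range(iters):
        X -= lr * grad_J_abd(X, R1, R3, ell)
        if J_abd(X, R1, R3, ell) < 1e-10:
            break
    return np.round(X)  # threshold to a 0-1 matrix
```

Since \( {\mathbf{J}}( {\mathbf{X}} ) = 0\) exactly characterizes solutions, success is checkable, so restarting from a fresh random \( {\mathbf{X}} \) is a natural remedy when the descent stalls in a local minimum.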
Notes
1. We follow the Prolog convention that logical variables begin with upper-case letters.
2. Stated another way, what we are doing here is "predicate invention" in inductive logic programming (ILP), in which \(r_2(Y,Z)\) is invented.
3. For a matrix \( {\mathbf{A}} \), \(\text{min}_1( {\mathbf{A}} )\) denotes the element-wise application of \(\text{min}_1(x)\) to \( {\mathbf{A}} \) (notes 3, 5 and 9 are illustrated in the NumPy sketch after these notes).
4. Throughout this paper, we implicitly assume that all vector and matrix dimensions are compatible.
5. \(\Vert {\mathbf{X}} \odot (\mathbbm{1}- {\mathbf{X}} )\Vert_F^2 = \sum_{ij} {\mathbf{X}}_{ij}^2(1- {\mathbf{X}}_{ij})^2 = 0\) implies \( {\mathbf{X}}_{ij}(1- {\mathbf{X}}_{ij})=0\) for all \(i, j\), where \( {\mathbf{X}}_{ij}\) denotes the \((i, j)\) element of \( {\mathbf{X}} \); hence every element of \( {\mathbf{X}} \) is either 0 or 1.
6. Be warned that this task can be extremely difficult because, as mentioned above, it includes solving a SAT problem.
7. \(\text{min}_1(x)\) is differentiable except at the single point \(x=1\), and hence \(\partial {\mathbf{J}}^{\mathrm{abd}}/\partial {\mathbf{X}}\) exists almost everywhere.
8. The derivation of \( {\mathbf{J}}_\mathbf{a}^{abd}\) is described in the Appendix.
9. For matrices \( {\mathbf{X}}, {\mathbf{Y}} \), \(( {\mathbf{X}} \bullet {\mathbf{Y}} ) = \sum_{ij} {\mathbf{X}}_{ij} {\mathbf{Y}}_{ij}\).
10. All experiments in this paper were carried out using GNU Octave 4.2.2 and Python 3.6.3 on a PC with an Intel(R) Core(TM) i7-3770@3.40 GHz CPU and 28 GB of memory.
11. \( {\mathbf{J}}_\mathbf{a}^{sat}\) is derived similarly to \( {\mathbf{J}}_\mathbf{a}^{abd}\).
12. Here we regard \( {\mathbf{A}} \) as the set \(\{(i,j) \mid {\mathbf{A}}_{ij}=1 \}\) and write \(| {\mathbf{A}} |\) for its cardinality.
13. This is a variant of the 'Friends & Smokers' program from ProbLog's tutorial (https://dtai.cs.kuleuven.be/problog/tutorial/basic/05_smokers.html).
14. We assume \(\text{DB}^g\) has supported models.
15. ASP is logic programming based on the stable model semantics of logic programs, applied primarily to solving combinatorial problems.
16. Another difference is that Nickles [20] deals with stable models, while we use supported models, which are easier to compute than stable ones.
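The following small NumPy check, ours and purely illustrative, spells out the element-wise \(\text{min}_1\) of note 3, the vanishing penalty of note 5, and the dot-product identities of note 9 that the Appendix relies on:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(0.0, 2.0, size=(3, 4))

# Note 3: min_1 applied element-wise.
assert np.allclose(np.minimum(A, 1.0), np.where(A < 1.0, A, 1.0))

# Note 5: the penalty vanishes exactly on 0-1 matrices.
X01 = (A > 1.0).astype(float)
assert np.sum((X01 * (1.0 - X01)) ** 2) == 0.0

# Note 9: (X . Y) = sum_ij X_ij Y_ij, so ||X||_F^2 = (X . X) ...
X = rng.standard_normal((3, 3))
Y = rng.standard_normal((3, 3))
Z = rng.standard_normal((3, 3))
assert np.isclose(np.linalg.norm(X, 'fro') ** 2, np.sum(X * X))
# ... and the identities used in the Appendix:
assert np.isclose(np.sum((X @ Z) * Y), np.sum(Z * (X.T @ Y)))  # ((XZ) . Y) = (Z . (X^T Y))
assert np.isclose(np.sum((X * Z) * Y), np.sum(Z * (X * Y)))    # ((X o Z) . Y) = (Z . (X o Y))
```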
References
Baral, C., Gelfond, M., Rushton, N.: Probabilistic reasoning with answer sets. Theory Pract. Logic Program. (TPLP) 9(1), 57–144 (2009)
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Burges, C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 26, pp. 2787–2795 (2013)
Chakraborty, S., Fremont, D.J., Meel, K.S., Seshia, S.A., Vardi, M.Y.: Distribution-aware sampling and weighted model counting for SAT. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2014), pp. 1722–1730. AAAI Press (2014). http://dl.acm.org/citation.cfm?id=2892753.2892792
Cohen, W.W., Yang, F., Mazaitis, K.: TensorLog: deep learning meets probabilistic DBs. CoRR abs/1707.05390 (2017). http://arxiv.org/abs/1707.05390
Denecker, M., Kakas, A.: Abduction in logic programming. In: Kakas, A.C., Sadri, F. (eds.) Computational Logic: Logic Programming and Beyond. LNCS (LNAI), vol. 2407, pp. 402–436. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45628-7_16
Eiter, T., Gottlob, G., Leone, N.: Abduction from logic programs: semantics and complexity. Theoret. Comput. Sci. 189(1–2), 129–177 (1997)
Eiter, T., Ianni, G., Krennwallner, T.: Answer set programming: a primer. In: Tessaris, S., Franconi, E., Eiter, T., Gutierrez, C., Handschuh, S., Rousset, M.-C., Schmidt, R.A. (eds.) Reasoning Web 2009. LNCS, vol. 5689, pp. 40–110. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03754-2_2
Flach, P., Kakas, A. (eds.): Abduction and Induction - Essays on Their Relation and Integration. Springer, Dordrecht (2000). https://doi.org/10.1007/978-94-017-0606-3
Gelfond, M., Lifschitz, V.: The stable model semantics for logic programming. In: Proceedings of the Fifth International Conference and Symposium on Logic Programming (ICLP/SLP 1988), pp. 1070–1080 (1988)
Gottlob, G., Pichler, R., Wei, F.: Tractable database design and datalog abduction through bounded treewidth. Inf. Syst. 35(3), 278–298 (2010)
Grefenstette, E.: Towards a formal distributional semantics: simulating logical calculi with tensors. In: Proceedings of the Second Joint Conference on Lexical and Computational Semantics, pp. 1–10 (2013). http://www.aclweb.org/anthology/S13-1001
Heule, M., Järvisalo, M., Suda, M. (eds.): Proceedings of SAT Competition 2018: Solver and Benchmark Descriptions, Department of Computer Science Series of Publications B, vol. B-2018-1. Department of Computer Science, University of Helsinki (2018)
Hobbs, J.R., Stickel, M.E., Appelt, D.E., Martin, P.: Interpretation as abduction. Artif. Intell. 63(1–2), 69–142 (1993)
Inoue, K., Sato, T., Ishihata, M., Kameya, Y., Nabeshima, H.: Evaluating abductive hypotheses using an EM algorithm on BDDs. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), pp. 810–815 (2009)
Kakas, A.C., Kowalski, R., Toni, F.: Abductive logic programming. J. Logic Comput. 2(6), 719–770 (1992)
Kate, R., Mooney, R.: Probabilistic abduction using Markov logic networks. In: The IJCAI-09 Workshop on Plan, Activity, and Intent Recognition (PAIR 2009), pp. 22–28 (2009)
Kazemi, S.M., Poole, D.: RelNN: a deep neural model for relational learning. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), pp. 6367–6375 (2018)
Manhaeve, R., Dumancic, S., Kimmig, A., Demeester, T., De Raedt, L.: DeepProbLog: neural probabilistic logic programming. CoRR abs/1805.10872 (2018). http://arxiv.org/abs/1805.10872
Marek, W., Subrahmanian, V.S.: The relationship between stable, supported, default and autoepistemic semantics for general logic programs. Theoret. Comput. Sci. 103(2), 365–386 (1992)
Nickles, M.: Differentiable SAT/ASP. In: Proceedings of the 5th International Workshop on Probabilistic Logic Programming, PLP 2018, pp. 62–74 (2018)
Poole, D.: Probabilistic Horn abduction and Bayesian networks. Artif. Intell. 64(1), 81–129 (1993)
De Raedt, L., Kimmig, A.: Probabilistic (logic) programming concepts. Mach. Learn. 100(1), 5–47 (2015). https://doi.org/10.1007/s10994-015-5494-z
Rocktäschel, T., Riedel, S.: End-to-end differentiable proving. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 3788–3800. Curran Associates, Inc., Long Beach (2017)
Roth, D.: Integer linear programming inference for conditional random fields. In: Proceedings of the International Conference on Machine Learning (ICML 2005), pp. 737–744 (2005)
Roth, D., Yih, W.T.: Global inference for entity and relation identification via a linear programming formulation. In: Getoor, L., Taskar, B. (eds.) Introduction to Statistical Relational Learning. MIT Press (2007)
Sakama, C., Nguyen, H., Sato, T., Inoue, K.: Partial evaluation of logic programs in vector spaces. In: Proceedings of the 11th Workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2018) (2018). https://doi.org/10.29007/9d61
Sato, T.: A statistical learning method for logic programs with distribution semantics. In: Proceedings of the 12th International Conference on Logic Programming (ICLP 1995), pp. 715–729 (1995)
Sato, T., Kameya, Y.: Statistical abduction with tabulation. In: Kakas, A.C., Sadri, F. (eds.) Computational Logic: Logic Programming and Beyond. LNCS (LNAI), vol. 2408, pp. 567–587. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45632-5_22
Sato, T.: Embedding Tarskian semantics in vector spaces. In: AAAI-17 Workshop on Symbolic Inference and Optimization (SymInfOpt 2017) (2017)
Sato, T., Inoue, K., Sakama, C.: Abducing relations in continuous spaces. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2018), pp. 1956–1962 (2018)
Selsam, D., Lamm, M., Bünz, B., Liang, P., de Moura, L., Dill, D.L.: Learning a SAT solver from single-bit supervision. In: International Conference on Learning Representations (ICLR 2019) (2019). https://openreview.net/forum?id=HJMC_iA5tm
Tamaddoni-Nezhad, A., Chaleil, R., Kakas, A., Muggleton, S.: Application of abductive ILP to learning metabolic network inhibition from temporal data. Mach. Learn. 64, 209–230 (2006)
Widdows, D., Cohen, T.: Reasoning with vectors: a continuous model for fast robust inference. Logic J. IGPL/Interest Group Pure Appl. Log. 23(2), 141–173 (2015)
Acknowledgments
This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
Appendix
Here we describe how the Jacobian \( {\mathbf{J}}_\mathbf{a}^{abd}\) in Sect. 1 is derived. Recall that our cost function (1) is

\[ {\mathbf{J}}^{\mathrm{abd}} \;=\; \frac{1}{2}\,\Vert \text{min}_1( {\mathbf{R}}_1 {\mathbf{X}} ) - {\mathbf{R}}_3 \Vert_F^2 \;+\; \frac{\ell}{2}\,\Vert {\mathbf{X}} \odot (\mathbbm{1} - {\mathbf{X}} )\Vert_F^2, \qquad (1) \]

where \(\ell \ge 0\) weights the penalty term that pushes \( {\mathbf{X}} \) toward a 0–1 matrix (note 5).

First we introduce a dot product for two matrices \( {\mathbf{X}} \) and \( {\mathbf{Y}} \) by \(( {\mathbf{X}} \bullet {\mathbf{Y}} ) = \sum_{ij} {\mathbf{X}}_{ij} {\mathbf{Y}}_{ij}\). Then \(\Vert {\mathbf{X}} \Vert_F^2 = ( {\mathbf{X}} \bullet {\mathbf{X}} )\) holds. Also \((( {\mathbf{X}} {\mathbf{Z}} ) \bullet {\mathbf{Y}} ) = ( {\mathbf{Z}} \bullet ( {\mathbf{X}}^T {\mathbf{Y}} ))\) and \((( {\mathbf{X}} \odot {\mathbf{Z}} ) \bullet {\mathbf{Y}} ) = ( {\mathbf{Z}} \bullet ( {\mathbf{X}} \odot {\mathbf{Y}} ))\) hold. Let \( {\mathbf{X}}_{pq}\) be the \((p, q)\) element of a matrix \( {\mathbf{X}} \) and \( {\mathbf{I}}_{pq}\) the matrix that is zero everywhere except for a one in the \((p, q)\) element. We also put \( {\mathbf{C}} = {\mathbf{R}}_1 {\mathbf{X}} \) and \( {\mathbf{B}} = \text{min}_1( {\mathbf{C}} ) - {\mathbf{R}}_3\), and use the fact that \( {\mathbf{C}}_{\le 1} \odot {\mathbf{B}} = {\mathbf{C}}_{\le 1} \odot ( {\mathbf{C}} - {\mathbf{R}}_3)\) for simplification, where \( {\mathbf{C}}_{\le 1}\) is the 0–1 matrix indicating which elements of \( {\mathbf{C}} \) are at most 1 (the almost-everywhere derivative of \(\text{min}_1\)). Now we have

\[ \begin{aligned} \frac{\partial {\mathbf{J}}^{\mathrm{abd}}}{\partial {\mathbf{X}}_{pq}} &= \Bigl( {\mathbf{B}} \bullet \bigl( {\mathbf{C}}_{\le 1} \odot ( {\mathbf{R}}_1 {\mathbf{I}}_{pq}) \bigr) \Bigr) + \ell \Bigl( \bigl( {\mathbf{X}} \odot (\mathbbm{1}- {\mathbf{X}} ) \odot (\mathbbm{1}-2 {\mathbf{X}} ) \bigr) \bullet {\mathbf{I}}_{pq} \Bigr) \\ &= \Bigl( ( {\mathbf{C}}_{\le 1} \odot {\mathbf{B}} ) \bullet ( {\mathbf{R}}_1 {\mathbf{I}}_{pq}) \Bigr) + \ell \Bigl( \bigl( {\mathbf{X}} \odot (\mathbbm{1}- {\mathbf{X}} ) \odot (\mathbbm{1}-2 {\mathbf{X}} ) \bigr) \bullet {\mathbf{I}}_{pq} \Bigr) \\ &= \Bigl( \bigl( {\mathbf{R}}_1^T ( {\mathbf{C}}_{\le 1} \odot ( {\mathbf{C}} - {\mathbf{R}}_3)) + \ell\, {\mathbf{X}} \odot (\mathbbm{1}- {\mathbf{X}} ) \odot (\mathbbm{1}-2 {\mathbf{X}} ) \bigr) \bullet {\mathbf{I}}_{pq} \Bigr). \end{aligned} \]

Since this holds for any \((p, q)\), we reach the Jacobian \( {\mathbf{J}}_\mathbf{a}^{abd}\) (5):

\[ {\mathbf{J}}_\mathbf{a}^{abd} \;=\; {\mathbf{R}}_1^T \bigl( {\mathbf{C}}_{\le 1} \odot ( {\mathbf{C}} - {\mathbf{R}}_3) \bigr) \;+\; \ell\, {\mathbf{X}} \odot (\mathbbm{1}- {\mathbf{X}} ) \odot (\mathbbm{1}-2 {\mathbf{X}} ). \qquad (5) \]
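As a sanity check on this reconstruction, ours rather than part of the paper, the closed-form Jacobian can be compared against a centred finite-difference approximation of \(\partial {\mathbf{J}}^{\mathrm{abd}} / \partial {\mathbf{X}}_{pq}\); all data below are random stand-ins.

```python
import numpy as np

def J(X, R1, R3, ell):
    B = np.minimum(R1 @ X, 1.0) - R3
    P = X * (1.0 - X)
    return 0.5 * np.sum(B * B) + 0.5 * ell * np.sum(P * P)

def Jac(X, R1, R3, ell):
    C = R1 @ X
    return R1.T @ ((C <= 1.0) * (C - R3)) + ell * X * (1.0 - X) * (1.0 - 2.0 * X)

rng = np.random.default_rng(0)
R1 = (rng.random((4, 5)) < 0.5).astype(float)
R3 = (rng.random((4, 6)) < 0.5).astype(float)
X = rng.uniform(size=(5, 6))
ell = 0.1

num = np.zeros_like(X)
eps = 1e-6
for p in range(X.shape[0]):
    for q in range(X.shape[1]):
        E = np.zeros_like(X)
        E[p, q] = eps  # a small step in the I_pq direction
        num[p, q] = (J(X + E, R1, R3, ell) - J(X - E, R1, R3, ell)) / (2 * eps)

# Agreement up to floating-point error, provided no entry of R1 @ X lies near
# the non-differentiable point 1 of min_1 (note 7).
print(np.max(np.abs(num - Jac(X, R1, R3, ell))))
```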
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Sato, T., Kojima, R. (2020). Logical Inference as Cost Minimization in Vector Spaces. In: El Fallah Seghrouchni, A., Sarne, D. (eds.) Artificial Intelligence. IJCAI 2019 International Workshops. Lecture Notes in Computer Science, vol. 12158. Springer, Cham. https://doi.org/10.1007/978-3-030-56150-5_12
DOI: https://doi.org/10.1007/978-3-030-56150-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-56149-9
Online ISBN: 978-3-030-56150-5