
Embed2Rule: Scalable Neuro-Symbolic Learning via Latent Space Weak-Labelling

  • Conference paper
  • In: Neural-Symbolic Learning and Reasoning (NeSy 2024)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14979)


Abstract

Neuro-symbolic approaches have recently garnered much interest as a path toward endowing neural systems with robust reasoning capabilities. Most proposed end-to-end methods assume the required knowledge is given in advance and do not scale to tasks with many latent concepts. The recently proposed Embed2Sym tackles the scalability limitation by training a visual perception component end-to-end from downstream labels so that it generates clusters in the latent space corresponding to symbolic concepts. These clusters are later used to perform downstream symbolic reasoning, but the symbolic knowledge is still hand-engineered. Taking inspiration from Embed2Sym, this paper introduces a novel method for scalable neuro-symbolic learning of first-order logic programs from raw data. The learned clusters are optimally labelled using sampled predictions of a pre-trained vision-language model. A state-of-the-art symbolic learner, robust to noise, uses these labels to learn an answer set program that solves the reasoning task. Our approach, called Embed2Rule, achieves better accuracy than state-of-the-art neuro-symbolic systems on existing benchmark tasks in most cases, while scaling up to tasks that require far more complex reasoning and a large number of latent concepts.


Notes

  1. Examples with infinite penalty must be covered by the induced hypothesis.

  2. Implementation/data can be found at https://github.com/YanivAspis/Embed2Rule.

References

  1. Aspis, Y., Broda, K., Lobo, J., Russo, A.: Embed2Sym - scalable neuro-symbolic reasoning via clustered embeddings. In: Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning, pp. 421–431, August 2022. https://doi.org/10.24963/kr.2022/44

  2. Augustine, E., Pryor, C., Dickens, C., Pujara, J., Wang, W.Y., Getoor, L.: Visual sudoku puzzle classification: a suite of collective neuro-symbolic tasks. In: d’Avila Garcez, A.S., Jiménez-Ruiz, E. (eds.) Proceedings of the 16th International Workshop on Neural-Symbolic Learning and Reasoning as part of the 2nd International Joint Conference on Learning & Reasoning (IJCLR 2022), Cumberland Lodge, Windsor Great Park, UK, September 28-30, 2022. CEUR Workshop Proceedings, vol. 3212, pp. 15–29. CEUR-WS.org (2022), https://ceur-ws.org/Vol-3212/paper2.pdf

  3. Badreddine, S., d’Avila Garcez, A., Serafini, L., Spranger, M.: Logic tensor networks. Artificial Intelligence 303, 103649 (2022). https://doi.org/10.1016/j.artint.2021.103649, https://www.sciencedirect.com/science/article/pii/S0004370221002009

  4. Charalambous, T., Aspis, Y., Russo, A.: NeuralFastLAS: fast logic-based learning from raw data (2023)

  5. Cunnington, D., Law, M., Lobo, J., Russo, A.: FFNSL: feed-forward neural-symbolic learner. Mach. Learn. 112(2), 515–569 (2023)

  6. Cunnington, D., Law, M., Lobo, J., Russo, A.: Neuro-symbolic learning of answer set programs from raw data. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 3586–3596. International Joint Conferences on Artificial Intelligence Organization, August 2023

  7. Cunnington, D., Law, M., Lobo, J., Russo, A.: The role of foundation models in neuro-symbolic learning and reasoning (2024). https://arxiv.org/abs/2402.01889

  8. Dai, W.Z., Muggleton, S.: Abductive knowledge induction from raw data. In: Zhou, Z.H. (ed.) Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pp. 1845–1851. International Joint Conferences on Artificial Intelligence Organization, August 2021. https://doi.org/10.24963/ijcai.2021/254, main track

  9. Daniele, A., Campari, T., Malhotra, S., Serafini, L.: Deep symbolic learning: discovering symbols and rules from perceptions. In: Elkind, E. (ed.) Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pp. 3597–3605. International Joint Conferences on Artificial Intelligence Organization, August 2023. https://doi.org/10.24963/ijcai.2023/400, main track

  10. Dasaratha, S., Puranam, S.A., Phogat, K.S., Tiyyagura, S.R., Duffy, N.P.: DeepPSL: end-to-end perception and reasoning. In: Elkind, E. (ed.) Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pp. 3606–3614. International Joint Conferences on Artificial Intelligence Organization, August 2023. https://doi.org/10.24963/ijcai.2023/401, main track

  11. Defresne, M., Barbe, S., Schiex, T.: Scalable coupling of deep learning with logical reasoning. In: Elkind, E. (ed.) Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pp. 3615–3623. International Joint Conferences on Artificial Intelligence Organization, August 2023. https://doi.org/10.24963/ijcai.2023/402, main track

  12. Evans, R., Grefenstette, E.: Learning explanatory rules from noisy data. J. Artif. Intell. Res. 61, 1–64 (2018)

  13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  14. Huang, Y.X., Dai, W.Z., Cai, L.W., Muggleton, S.H., Jiang, Y.: Fast abductive learning by similarity-based consistency optimization. Adv. Neural. Inf. Process. Syst. 34, 26574–26584 (2021)

  15. Huang, Y.X., Dai, W.Z., Jiang, Y., Zhou, Z.: Enabling knowledge refinement upon new concepts in abductive learning. In: AAAI Conference on Artificial Intelligence (2023). https://api.semanticscholar.org/CorpusID:259731271

  16. Huang, Y.X., Sun, Z., Li, G., Tian, X., Dai, W.Z., Hu, W., Jiang, Y., Zhou, Z.H.: Enabling abductive learning to exploit knowledge graph. In: Elkind, E. (ed.) Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pp. 3839–3847. International Joint Conferences on Artificial Intelligence Organization, August 2023. https://doi.org/10.24963/ijcai.2023/427, main track

  17. Karp, R.M.: Reducibility among Combinatorial Problems, pp. 85–103. Springer US, Boston, MA (1972). https://doi.org/10.1007/978-1-4684-2001-2_9

  18. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization (2017)

  19. van Krieken, E., Thanapalasingam, T., Tomczak, J.M., van Harmelen, F., ten Teije, A.: A-NeSI: a scalable approximate method for probabilistic neurosymbolic inference. arXiv preprint arXiv:2212.12393 (2022)

  20. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logistics Quarterly 2(1–2), 83–97 (1955)

  21. Law, M.: Conflict-driven inductive logic programming. Theory Pract. Logic Program. 23(2), 387–414 (2023)

  22. Law, M., Russo, A., Broda, K.: Inductive learning of answer set programs from noisy examples. arXiv preprint arXiv:1808.08441 (2018)

  23. Law, M., Russo, A., Broda, K.: The ILASP system for inductive learning of answer set programs. https://arxiv.org/abs/2005.00904 (2020)

  24. Law, M., Russo, A., Broda, K., Bertino, E.: Scalable non-observational predicate learning in ASP. In: Zhou, Z.H. (ed.) Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pp. 1936–1943. International Joint Conferences on Artificial Intelligence Organization, August 2021. https://doi.org/10.24963/ijcai.2021/267, main track

  25. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  26. LeCun, Y., Cortes, C., Burges, C.: MNIST handwritten digit database. AT&T Labs. http://yann.lecun.com/exdb/mnist (2010)

  27. Li, D., Li, J., Le, H., Wang, G., Savarese, S., Hoi, S.C.: LAVIS: a one-stop library for language-vision intelligence. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pp. 31–41. Association for Computational Linguistics, Toronto, Canada, July 2023. https://aclanthology.org/2023.acl-demo.3

  28. Li, J., Li, D., Savarese, S., Hoi, S.: BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models. In: ICML (2023)

  29. Li, Z., et al: Neuro-symbolic learning yielding logical constraints. In: Thirty-seventh Conference on Neural Information Processing Systems (2023)

  30. Manhaeve, R., Dumančić, S., Kimmig, A., Demeester, T., De Raedt, L.: Neural probabilistic logic programming in DeepProbLog. Artificial Intelligence 298, 103504 (2021). https://doi.org/10.1016/j.artint.2021.103504, https://www.sciencedirect.com/science/article/pii/S0004370221000552

  31. Manhaeve, R., Marra, G., De Raedt, L.: Approximate inference for neural probabilistic logic programming. In: Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning, pp. 475–486, November 2021. https://doi.org/10.24963/kr.2021/45

  32. Muggleton, S.: Inductive logic programming. New Generation Comput. 8, 295–318 (1991)

  33. Muggleton, S.H., Lin, D., Tamaddoni-Nezhad, A.: Meta-interpretive learning of higher-order dyadic datalog: predicate invention revisited. Mach. Learn. 100(1), 49–73 (2015)

  34. Pryor, C., Dickens, C., Augustine, E., Albalak, A., Wang, W.Y., Getoor, L.: NeuPSL: neural probabilistic soft logic. In: Elkind, E. (ed.) Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pp. 4145–4153. International Joint Conferences on Artificial Intelligence Organization, August 2023. https://doi.org/10.24963/ijcai.2023/461, main track

  35. Riegel, R., et al.: Logical neural networks. arXiv preprint arXiv:2006.13155 (2020)

  36. Sen, P., de Carvalho, B.W., Riegel, R., Gray, A.: Neuro-symbolic inductive logic programming with logical neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 8212–8219 (2022)

  37. Shindo, H., Pfanschilling, V., Dhami, D.S., Kersting, K.: \(\alpha \)ILP: thinking visual scenes as differentiable logic programs. Mach. Learn. 112(5), 1465–1497 (2023)

  38. Skryagin, A., Ochs, D., Dhami, D.S., Kersting, K.: Scalable neural-probabilistic answer set programming. J. Artif. Intell. Res. 78, December 2023. https://doi.org/10.1613/jair.1.15027

  39. Tsamoura, E., Hospedales, T., Michael, L.: Neural-symbolic integration: a compositional perspective. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 5051–5060 (2021)

  40. Vaswani, A., et al.: Attention is all you need. Advances in neural information processing systems 30 (2017)

  41. Wang, P.W., Donti, P.L., Wilder, B., Kolter, Z.: SATNet: bridging deep learning and logical reasoning using a differentiable satisfiability solver. In: International Conference on Machine Learning (2019). https://api.semanticscholar.org/CorpusID:168170169

  42. Winters, T., Marra, G., Manhaeve, R., De Raedt, L.: DeepStochLog: neural stochastic logic programming. In: Proceedings of the AAAI Conference on Artificial Intelligence 36(9), 10090–10100, June 2022. https://doi.org/10.1609/aaai.v36i9.21248, https://ojs.aaai.org/index.php/AAAI/article/view/21248

  43. Xu, J., Zhang, Z., Friedman, T., Liang, Y., Van den Broeck, G.: A semantic loss function for deep learning with symbolic knowledge. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 5502–5511. PMLR (10–15 Jul 2018). https://proceedings.mlr.press/v80/xu18h.html

  44. Yang, Y., Song, L.: Learn to explain efficiently via neural logic inductive learning. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=SJlh8CEYDB

  45. Yang, Z., Ishay, A., Lee, J.: NeurASP: embracing neural networks into answer set programming. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pp. 1755–1762. International Joint Conferences on Artificial Intelligence Organization, July 2020. https://doi.org/10.24963/ijcai.2020/243, main track

  46. Yin, S., et al.: A survey on multimodal large language models (2023)


Acknowledgments

This work was partly supported by UKRI grant EP/X040518/1.

Author information

Corresponding authors

Correspondence to Yaniv Aspis, Mohammad Albinhassan, Jorge Lobo or Alessandra Russo.


Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Appendices

A Datasets

In this appendix, we describe the datasets used throughout the experiments described in the main paper.

1.1 A.1 Raw Datasets

For raw data (images), we use the standard MNIST [26] dataset as provided by the PyTorch module, and the Cards dataset by Cunnington et al. [5]. The Cards dataset is publicly available at: https://github.com/DanCunnington/FFNSL. For both datasets, we split the training images into a training and validation set, while using the test images as the test set.

Hitting Sets. For this task, we generated a dataset of 200K training samples, 1K validation samples, and 1K test samples for each case. The generation process was as follows: we randomly generated collections of sets of a given size (4-6) with a given number of elements (5 or 10) and checked whether a hitting set of size at most 2 exists. We split the generated collections into training, validation and test sets with an equal number of positive and negative samples, ensuring there is no overlap between the sets. Then, we assigned each element in a sample to a random MNIST image whose label corresponds to that element, drawn from the appropriate split (train/validation/test). Note that MNIST images may repeat across different samples from the same split.
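
To make the procedure concrete, the following is a minimal Python sketch of one way to implement it. The interpretation of the (4-6) and (5 or 10) parameters, the maximum set size, and the brute-force hitting-set check are assumptions based on the description above, not the authors' code.

    import random
    from itertools import combinations

    def has_small_hitting_set(collection, max_size=2):
        # Brute-force check: does any set of at most max_size elements
        # intersect every set in the collection?
        universe = set().union(*collection)
        for size in range(1, max_size + 1):
            for candidate in combinations(universe, size):
                if all(set(candidate) & s for s in collection):
                    return True
        return False

    def generate_sample(num_sets=5, num_elements=10, max_set_size=4):
        # One collection of randomly drawn sets plus its positive/negative label.
        collection = [
            set(random.sample(range(1, num_elements + 1),
                              random.randint(1, max_set_size)))
            for _ in range(num_sets)
        ]
        return collection, has_small_hitting_set(collection)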

Visual Sudoku Classification. For the 4 \(\times \) 4 cases (Fixed and Random), we generated 50K training samples, 5K validation samples and 5K test samples. For the 9 \(\times \) 9 case, we generated 200K training samples, 20K validation and 20K test samples. When generating samples, we followed a process similar to Augustine et al. [2]. We began by generating valid boards and then corrupted half of them to produce negative samples. There are two types of corruption: 1. Replacement - a cell is randomly chosen and its content is replaced with a different digit. 2. Substitution - two cells are randomly chosen and their contents are swapped. We apply at least one corruption (chosen at random) to each board and then flip a (biased) coin to decide whether further corruptions are applied: on heads, we apply another random corruption and flip again, stopping on tails. We then check that the corruptions have not accidentally produced a valid board. In the 4 \(\times \) 4 cases, the coin has a bias of 0 (always landing on tails), so each board contains a single corruption. For the 9 \(\times \) 9 Fixed case, we used a bias of 0.75, so a board contains, on average, 4 corruptions. The generated (valid and invalid) puzzles are then split into training, validation and test sets, with an equal number of positive and negative samples and no overlap between the splits. Assigning MNIST images to cells followed the same procedure as Hitting Sets.
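
A hedged sketch of the corruption step is given below. The board representation (a dict mapping (row, column) to a digit), the helper names, and the re-sampling loop used when a corruption accidentally yields a valid board are assumptions made for illustration.

    import random

    def corrupt_board(board, grid_size, repeat_bias, is_valid_board):
        # Apply at least one random corruption; keep corrupting while a biased
        # coin lands on heads; re-sample if the result is accidentally valid.
        while True:
            corrupted = dict(board)
            while True:
                cells = list(corrupted)
                if random.random() < 0.5:
                    # Replacement: overwrite one randomly chosen cell.
                    cell = random.choice(cells)
                    options = [d for d in range(1, grid_size + 1)
                               if d != corrupted[cell]]
                    corrupted[cell] = random.choice(options)
                else:
                    # Substitution: swap the contents of two randomly chosen cells.
                    a, b = random.sample(cells, 2)
                    corrupted[a], corrupted[b] = corrupted[b], corrupted[a]
                if random.random() >= repeat_bias:  # tails: stop corrupting
                    break
            if not is_valid_board(corrupted):
                return corrupted

With repeat_bias set to 0 this reduces to a single corruption (the 4 \(\times \) 4 cases); with repeat_bias set to 0.75 the expected number of corruptions is 1 + 0.75/0.25 = 4 (the 9 \(\times \) 9 Fixed case).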

Follow Suit Winner. We generated 200K training samples, 1K validation and 1K test samples. Samples are generated by repeatedly simulating games and recording the winner. Images are assigned to cards in the same way as in the other tasks.
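
As an illustration, one simulated trick might look like the sketch below, which assumes the usual Follow Suit Winner rules from the FFNSL benchmark [5]: four players each play one card, and the winner is the player holding the highest-ranked card of the suit led by player 1. The card representation and rank ordering are assumptions.

    import random

    SUITS = ["hearts", "diamonds", "clubs", "spades"]
    RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10",
             "jack", "queen", "king", "ace"]  # assumed low-to-high ordering

    def simulate_trick(num_players=4):
        deck = [(rank, suit) for suit in SUITS for rank in RANKS]
        played = random.sample(deck, num_players)   # one distinct card per player
        led_suit = played[0][1]                     # player 1 leads
        winner = max(
            (p for p, (_, suit) in enumerate(played, start=1) if suit == led_suit),
            key=lambda p: RANKS.index(played[p - 1][0]),
        )
        return played, winner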

B Training Details

In this section, we provide further details on the training of the Embed2Rule model and baselines in the various experiments described in the evaluation section of the paper.

1.1 B.1 Training Hardware

We trained on a system with an Intel i7-12700K @ 3600 MHz, 32 GB of RAM and a Nvidia GeForce RTX 3090.

1.2 B.2 Embed2Rule Hyperparameters

In all tasks, K-Means is trained with 1000 random samples from the training set, and another 1024 random samples are given to BLIP-2 for weak-labelling.
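
The optimal labelling step can be read as an assignment problem between K-Means clusters and symbolic concepts, solved with the Hungarian method [20]. The sketch below is one plausible implementation using SciPy; the agreement-counting scheme and variable names are assumptions rather than the authors' exact procedure.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def label_clusters(cluster_ids, weak_labels, num_symbols):
        # cluster_ids[i]: K-Means cluster of sample i (0 .. num_symbols-1)
        # weak_labels[i]: symbol index predicted for sample i by BLIP-2
        agreement = np.zeros((num_symbols, num_symbols), dtype=int)
        for c, s in zip(cluster_ids, weak_labels):
            agreement[c, s] += 1
        # Maximising total agreement = minimising its negation (Hungarian method).
        clusters, symbols = linear_sum_assignment(-agreement)
        return dict(zip(clusters, symbols))  # cluster index -> symbol index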

Hitting Sets. We employ LeNet-5 [25] as the perception network, with the output layer changed to size 32 and a ReLU activation. The output embeddings are concatenated and fed into the reasoning network, an MLP with residual connections. This MLP consists of a stack of residual blocks; each block is a fully connected layer with an output size of 128, followed by a GELU activation and batch normalisation, whose output is added to the block input. We use 4 such residual blocks. The network is trained for 100 epochs (50 in the 5-4 case) with a learning rate of 0.0009 and a batch size of 256 using the Adam optimiser [18]. The prompt for BLIP-2 was simply the digit in numerical form. We assign a uniform penalty of 100 to the ILASP examples and set a maximum rule length of 10 (to allow learning long choice rules). We provide ILASP with 50 examples from the training set.
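
A hedged PyTorch sketch of such a residual block and the surrounding reasoning MLP is shown below; the input projection and output head are assumptions needed to make the module self-contained, and the exact wiring may differ from the released implementation.

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # Fully connected layer -> GELU -> batch norm, added to the block input.
        def __init__(self, dim=128):
            super().__init__()
            self.fc = nn.Linear(dim, dim)
            self.act = nn.GELU()
            self.norm = nn.BatchNorm1d(dim)

        def forward(self, x):
            return x + self.norm(self.act(self.fc(x)))

    class ReasoningMLP(nn.Module):
        def __init__(self, in_dim, num_outputs, dim=128, num_blocks=4):
            super().__init__()
            self.proj = nn.Linear(in_dim, dim)        # assumed input projection
            self.blocks = nn.Sequential(
                *[ResidualBlock(dim) for _ in range(num_blocks)])
            self.head = nn.Linear(dim, num_outputs)   # assumed output head

        def forward(self, x):
            return self.head(self.blocks(self.proj(x)))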

Visual Sudoku Classification. We utilise a CNN similar to that used in SATNet [41] for the perception network, containing 2 convolutional and max-pooling layers followed by 2 feed-forward layers with ReLU activation functions. We use a transformer encoder [40] for the reasoning network, employing multi-headed self-attention with learnable positional embeddings. We set the number of attention heads to 4, the number of encoder blocks to 1, the feed-forward layer embedding dimension to 512, the activation functions to GELU, and the dropout value to 0.1. The inputs and outputs of the transformer are embeddings of dimensions \(batch \times seq \times emb\), where batch is the batch size fixed to 512, seq is the sequence length set to 16 and 81 for the 4 \(\times \) 4 and 9 \(\times \) 9 Sudoku boards, respectively, and emb is the dimension of the individual latent concept embeddings, set to 128. We apply a single-layer classification head with a Sigmoid activation function to the layer-normalised outputs of the transformer to obtain the neural predictions. We train the network for 50 epochs using an AdamW optimiser with a linear warm-up phase from 0.0 to a base learning rate of 0.003 over 30% and 20% of the total number of iterations for the 4 \(\times \) 4 and 9 \(\times \) 9 boards, respectively, followed by a cosine annealing learning rate schedule between the base learning rate and 0.0001. We clip the gradient norm of each layer at 1.0. The prompt for BLIP-2 is again the digit in numerical form. We give ILASP 100 training examples with a uniform penalty of 1.
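
The learning-rate schedule can be reproduced with standard PyTorch schedulers, as in the sketch below. The scheduler composition and the near-zero warm-up start factor are assumptions; the authors' implementation may differ, for example in how gradients are clipped per layer.

    import torch
    from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

    def build_schedule(optimiser, total_iters, warmup_frac=0.3, min_lr=0.0001):
        warmup_iters = int(warmup_frac * total_iters)
        # Linear warm-up from (almost) 0 to the base learning rate ...
        warmup = LinearLR(optimiser, start_factor=1e-8, end_factor=1.0,
                          total_iters=warmup_iters)
        # ... followed by cosine annealing down to min_lr.
        cosine = CosineAnnealingLR(optimiser, T_max=total_iters - warmup_iters,
                                   eta_min=min_lr)
        return SequentialLR(optimiser, [warmup, cosine], milestones=[warmup_iters])

    # Illustrative usage:
    # optimiser = torch.optim.AdamW(model.parameters(), lr=0.003)
    # scheduler = build_schedule(optimiser, num_epochs * len(loader),
    #                            warmup_frac=0.3)   # 0.2 for the 9x9 boards
    # per training step: loss.backward(); clip gradients, e.g.
    # torch.nn.utils.clip_grad_norm_(layer.parameters(), 1.0) for each layer;
    # then optimiser.step(); scheduler.step()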

Follow Suit Winner. We use a ResNet-18 [13] as the perception network, with an output layer of size 128. As a reasoning network, we use a similar MLP architecture to the Hitting Sets task, with 4 residual blocks and a hidden size of 256. We train for 50 epochs with a learning rate of 0.0001 and a batch size of 128. For BLIP-2 we use a prompt of the format “The playing card rank of suit” where rank and suit are replaced with their appropriate values. We provide ILASP with 100 examples with a uniform penalty of 1.
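
The candidate prompts can be enumerated over all rank/suit combinations and scored against each card image, as in the sketch below. Here score_image_text stands in for a BLIP-2 image-text scoring call (e.g. via the LAVIS library [27]); this wrapper and the way the best prompt is chosen are assumptions, as the exact querying mechanism is not detailed above.

    RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10",
             "jack", "queen", "king", "ace"]
    SUITS = ["hearts", "diamonds", "clubs", "spades"]

    def weak_label_card(image, score_image_text):
        # score_image_text(image, text) -> float is a hypothetical wrapper
        # around a BLIP-2 image-text matching call.
        prompts = {(rank, suit): f"The playing card {rank} of {suit}"
                   for rank in RANKS for suit in SUITS}
        return max(prompts, key=lambda card: score_image_text(image, prompts[card]))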

1.3 B.3 Baseline Hyperparameters

We use the official implementations of NSIL and SLASH in our experiments, using the same perception networks as Embed2Rule. As both NSIL and SLASH are more data efficient than Embed2Rule, we used smaller training sets of 10K samples to avoid artificially increasing training time. We found no gain from using additional data.

NSIL. For Hitting Sets, we follow the same hyperparameters used in the official implementation. NSIL is trained for 20 iterations using the SGD optimiser with a learning rate of 0.0008, momentum of 0.7643 and a batch size of 64. The \(\lambda \) value (that affects the weight of neural network confidence in symbolic learning) is set to 1. For Visual Sudoku Classification and Follow Suit Winner, no set of hyperparameters affects the result, as NSIL could not solve either. For Sudoku, as the number of iterations did not change the performance of NSIL, we reported training time results when using 5 iterations.

SLASH. For Visual Sudoku Classification, we trained for 20 epochs using the Adam optimiser with a learning rate of 0.001 and a batch size of 100. (For the 4 \(\times \) 4 Random and 9 \(\times \) 9 cases we attempted batch sizes as small as 8 to overcome the memory issues, but found this did not help; due to a bug in the implementation, batch sizes smaller than 8 are not possible.) For the Hitting Sets task, we found no set of hyperparameters that would allow SLASH to learn from positive samples alone, and so we report results for 5 epochs of training.

C Symbolic Learning Tasks

We include here the background knowledge and mode declarations given to ILASP in each task.

[Figures b–g: background knowledge and mode declarations given to ILASP for each task.]

D Induced Programs

We include here some of the answer set programs induced by Embed2Rule (using ILASP) to solve the reasoning tasks in the paper.

[Figures h–k: answer set programs induced by ILASP for the reasoning tasks.]

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Aspis, Y., Albinhassan, M., Lobo, J., Russo, A. (2024). Embed2Rule: Scalable Neuro-Symbolic Learning via Latent Space Weak-Labelling. In: Besold, T.R., d'Avila Garcez, A., Jiménez-Ruiz, E., Confalonieri, R., Madhyastha, P., Wagner, B. (eds) Neural-Symbolic Learning and Reasoning. NeSy 2024. Lecture Notes in Computer Science (LNAI), vol 14979. Springer, Cham. https://doi.org/10.1007/978-3-031-71167-1_11

  • DOI: https://doi.org/10.1007/978-3-031-71167-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-71166-4

  • Online ISBN: 978-3-031-71167-1
