Abstract
Neuro-symbolic approaches have recently garnered much interest as a path toward endowing neural systems with robust reasoning capabilities. However, most proposed end-to-end methods assume that symbolic knowledge is given in advance and do not scale to tasks with many latent concepts. The recently proposed Embed2Sym tackles the scalability limitation by training a visual perception component end-to-end from downstream labels, producing clusters in a latent space of symbolic concepts. These clusters are later used to perform downstream symbolic reasoning, but the symbolic knowledge itself must still be engineered by hand. Taking inspiration from Embed2Sym, this paper introduces a novel method for scalable neuro-symbolic learning of first-order logic programs from raw data. The learned clusters are optimally labelled using sampled predictions of a pre-trained vision-language model, and a state-of-the-art symbolic learner that is robust to noise uses these labels to learn an answer set program that solves the reasoning task. Our approach, called Embed2Rule, achieves better accuracy than state-of-the-art neuro-symbolic systems on existing benchmark tasks in most cases, while scaling up to tasks that require far more complex reasoning and a large number of latent concepts.
Notes
- 1.
Examples with infinite penalty must be covered by the induced hypothesis.
- 2.
Implementation/data can be found at https://github.com/YanivAspis/Embed2Rule.
References
Aspis, Y., Broda, K., Lobo, J., Russo, A.: Embed2Sym - scalable neuro-symbolic reasoning via clustered embeddings. In: Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning, pp. 421–431, August 2022. https://doi.org/10.24963/kr.2022/44
Augustine, E., Pryor, C., Dickens, C., Pujara, J., Wang, W.Y., Getoor, L.: Visual sudoku puzzle classification: a suite of collective neuro-symbolic tasks. In: d’Avila Garcez, A.S., Jiménez-Ruiz, E. (eds.) Proceedings of the 16th International Workshop on Neural-Symbolic Learning and Reasoning as part of the 2nd International Joint Conference on Learning & Reasoning (IJCLR 2022), Cumberland Lodge, Windsor Great Park, UK, September 28-30, 2022. CEUR Workshop Proceedings, vol. 3212, pp. 15–29. CEUR-WS.org (2022), https://ceur-ws.org/Vol-3212/paper2.pdf
Badreddine, S., d’Avila Garcez, A., Serafini, L., Spranger, M.: Logic tensor networks. Artificial Intelligence 303, 103649 (2022). https://doi.org/10.1016/j.artint.2021.103649, https://www.sciencedirect.com/science/article/pii/S0004370221002009
Charalambous, T., Aspis, Y., Russo, A.: NeuralFastLAS: fast logic-based learning from raw data (2023)
Cunnington, D., Law, M., Lobo, J., Russo, A.: FFNSL: feed-forward neural-symbolic learner. Mach. Learn. 112(2), 515–569 (2023)
Cunnington, D., Law, M., Lobo, J., Russo, A.: Neuro-symbolic learning of answer set programs from raw data. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 3586–3596. International Joint Conferences on Artificial Intelligence Organization (8 2023)
Cunnington, D., Law, M., Lobo, J., Russo, A.: The role of foundation models in neuro-symbolic learning and reasoning (2024). https://arxiv.org/abs/2402.01889
Dai, W.Z., Muggleton, S.: Abductive knowledge induction from raw data. In: Zhou, Z.H. (ed.) Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pp. 1845–1851. International Joint Conferences on Artificial Intelligence Organization, August 2021. https://doi.org/10.24963/ijcai.2021/254, main track
Daniele, A., Campari, T., Malhotra, S., Serafini, L.: Deep symbolic learning: discovering symbols and rules from perceptions. In: Elkind, E. (ed.) Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pp. 3597–3605. International Joint Conferences on Artificial Intelligence Organization, August 2023. https://doi.org/10.24963/ijcai.2023/400, main track
Dasaratha, S., Puranam, S.A., Phogat, K.S., Tiyyagura, S.R., Duffy, N.P.: DeepPSL: end-to-end perception and reasoning. In: Elkind, E. (ed.) Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pp. 3606–3614. International Joint Conferences on Artificial Intelligence Organization, August 2023. https://doi.org/10.24963/ijcai.2023/401, main track
Defresne, M., Barbe, S., Schiex, T.: Scalable coupling of deep learning with logical reasoning. In: Elkind, E. (ed.) Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pp. 3615–3623. International Joint Conferences on Artificial Intelligence Organization, August 2023. https://doi.org/10.24963/ijcai.2023/402, main track
Evans, R., Grefenstette, E.: Learning explanatory rules from noisy data. J. Artif. Intell. Res. 61, 1–64 (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Huang, Y.X., Dai, W.Z., Cai, L.W., Muggleton, S.H., Jiang, Y.: Fast abductive learning by similarity-based consistency optimization. Adv. Neural. Inf. Process. Syst. 34, 26574–26584 (2021)
Huang, Y.X., Dai, W.Z., Jiang, Y., Zhou, Z.: Enabling knowledge refinement upon new concepts in abductive learning. In: AAAI Conference on Artificial Intelligence (2023). https://api.semanticscholar.org/CorpusID:259731271
Huang, Y.X., Sun, Z., Li, G., Tian, X., Dai, W.Z., Hu, W., Jiang, Y., Zhou, Z.H.: Enabling abductive learning to exploit knowledge graph. In: Elkind, E. (ed.) Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pp. 3839–3847. International Joint Conferences on Artificial Intelligence Organization, August 2023. https://doi.org/10.24963/ijcai.2023/427, main track
Karp, R.M.: Reducibility among Combinatorial Problems, pp. 85–103. Springer US, Boston, MA (1972). https://doi.org/10.1007/978-1-4684-2001-2_9
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization (2017)
van Krieken, E., Thanapalasingam, T., Tomczak, J.M., van Harmelen, F., ten Teije, A.: A-NeSI: a scalable approximate method for probabilistic neurosymbolic inference. arXiv preprint arXiv:2212.12393 (2022)
Kuhn, H.W.: The hungarian method for the assignment problem. Naval Res. Logistics Quarterly 2(1–2), 83–97 (1955)
Law, M.: Conflict-driven inductive logic programming. Theory Pract. Logic Program. 23(2), 387–414 (2023)
Law, M., Russo, A., Broda, K.: Inductive learning of answer set programs from noisy examples. arXiv preprint arXiv:1808.08441 (2018)
Law, M., Russo, A., Broda, K.: The ILASP system for inductive learning of answer set programs. arXiv preprint arXiv:2005.00904 (2020)
Law, M., Russo, A., Broda, K., Bertino, E.: Scalable non-observational predicate learning in ASP. In: Zhou, Z.H. (ed.) Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pp. 1936–1943. International Joint Conferences on Artificial Intelligence Organization, August 2021. https://doi.org/10.24963/ijcai.2021/267, main track
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
LeCun, Y., Cortes, C., Burges, C.: MNIST handwritten digit database. ATT Labs. http://yann.lecun.com/exdb/mnist (2010)
Li, D., Li, J., Le, H., Wang, G., Savarese, S., Hoi, S.C.: LAVIS: a one-stop library for language-vision intelligence. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pp. 31–41. Association for Computational Linguistics, Toronto, Canada, July 2023. https://aclanthology.org/2023.acl-demo.3
Li, J., Li, D., Savarese, S., Hoi, S.: BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models. In: ICML (2023)
Li, Z., et al: Neuro-symbolic learning yielding logical constraints. In: Thirty-seventh Conference on Neural Information Processing Systems (2023)
Manhaeve, R., Dumančić, S., Kimmig, A., Demeester, T., De Raedt, L.: Neural probabilistic logic programming in DeepProbLog. Artificial Intelligence 298, 103504 (2021). https://doi.org/10.1016/j.artint.2021.103504, https://www.sciencedirect.com/science/article/pii/S0004370221000552
Manhaeve, R., Marra, G., De Raedt, L.: Approximate inference for neural probabilistic logic programming. In: Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning, pp. 475–486, November 2021. https://doi.org/10.24963/kr.2021/45
Muggleton, S.: Inductive logic programming. New Generation Comput. 8, 295–318 (1991)
Muggleton, S.H., Lin, D., Tamaddoni-Nezhad, A.: Meta-interpretive learning of higher-order dyadic datalog: predicate invention revisited. Mach. Learn. 100(1), 49–73 (2015)
Pryor, C., Dickens, C., Augustine, E., Albalak, A., Wang, W.Y., Getoor, L.: NeuPSL: neural probabilistic soft logic. In: Elkind, E. (ed.) Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pp. 4145–4153. International Joint Conferences on Artificial Intelligence Organization, August 2023. https://doi.org/10.24963/ijcai.2023/461, main track
Riegel, R., et al.: Logical neural networks. arXiv preprint arXiv:2006.13155 (2020)
Sen, P., de Carvalho, B.W., Riegel, R., Gray, A.: Neuro-symbolic inductive logic programming with logical neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 8212–8219 (2022)
Shindo, H., Pfanschilling, V., Dhami, D.S., Kersting, K.: \(\alpha \)ILP: thinking visual scenes as differentiable logic programs. Mach. Learn. 112(5), 1465–1497 (2023)
Skryagin, A., Ochs, D., Dhami, D.S., Kersting, K.: Scalable neural-probabilistic answer set programming. J. Artif. Int. Res. 78 (dec 2023). https://doi.org/10.1613/jair.1.15027, https://doi.org/10.1613/jair.1.15027
Tsamoura, E., Hospedales, T., Michael, L.: Neural-symbolic integration: a compositional perspective. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 5051–5060 (2021)
Vaswani, A., et al.: Attention is all you need. Advances in neural information processing systems 30 (2017)
Wang, P.W., Donti, P.L., Wilder, B., Kolter, Z.: SATNet: bridging deep learning and logical reasoning using a differentiable satisfiability solver. In: International Conference on Machine Learning (2019). https://api.semanticscholar.org/CorpusID:168170169
Winters, T., Marra, G., Manhaeve, R., De Raedt, L.: DeepStochLog: neural stochastic logic programming. In: Proceedings of the AAAI Conference on Artificial Intelligence 36(9), 10090–10100, June 2022. https://doi.org/10.1609/aaai.v36i9.21248
Xu, J., Zhang, Z., Friedman, T., Liang, Y., Van den Broeck, G.: A semantic loss function for deep learning with symbolic knowledge. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 5502–5511. PMLR (10–15 Jul 2018). https://proceedings.mlr.press/v80/xu18h.html
Yang, Y., Song, L.: Learn to explain efficiently via neural logic inductive learning. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=SJlh8CEYDB
Yang, Z., Ishay, A., Lee, J.: NeurASP: embracing neural networks into answer set programming. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pp. 1755–1762. International Joint Conferences on Artificial Intelligence Organization, July 2020. https://doi.org/10.24963/ijcai.2020/243, main track
Yin, S., et al.: A survey on multimodal large language models (2023)
Acknowledgments
This work was partly supported by UKRI grant EP/X040518/1.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Appendices
A Datasets
In this appendix, we describe the datasets used in the experiments described in the main paper.
A.1 Raw Datasets
For raw data (images), we use the standard MNIST [26] dataset as provided by PyTorch's torchvision package, and the Cards dataset by Cunnington et al. [5]. The Cards dataset is publicly available at: https://github.com/DanCunnington/FFNSL. For both datasets, we split the training images into a training set and a validation set, and use the test images as the test set.
Hitting Sets. For this task, we generated, for each case, a dataset of 200K training samples, 1K validation samples and 1K test samples. The generation process was as follows. We randomly generated collections of sets of a given size (4-6) over a given number of elements (5 or 10) and checked whether a hitting set of size at most 2 exists. We split the generated collections into training, validation and test sets with an equal number of positive and negative samples, ensuring there is no overlap between the sets. Then, we assigned each element in a sample to a random MNIST image whose label corresponds to that element, drawn from the appropriate split (train/validation/test). Note that MNIST images may repeat across different samples from the same split.
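The sketch below illustrates this generation check under stated assumptions: `sample_instance`, `has_small_hitting_set` and the per-set size bound `max_set_size` are our own names and are not specified above, and the brute-force search simply enumerates all candidate hitting sets of size at most 2.

```python
# Hedged sketch of the Hitting Sets instance generation described above.
import random
from itertools import combinations

def has_small_hitting_set(collection, universe, max_size=2):
    """True iff some subset of `universe` of size <= max_size hits every set."""
    for k in range(1, max_size + 1):
        for candidate in combinations(universe, k):
            if all(any(e in s for e in candidate) for s in collection):
                return True
    return False

def sample_instance(num_sets=4, num_elements=5, max_set_size=3, rng=None):
    # Sample a collection of random sets over a fixed universe of elements.
    rng = rng or random.Random()
    universe = list(range(1, num_elements + 1))
    collection = [set(rng.sample(universe, rng.randint(1, max_set_size)))
                  for _ in range(num_sets)]
    return collection, has_small_hitting_set(collection, universe)
```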
Visual Sudoku Classification. For the 4 \(\times \) 4 cases (Fixed and Random), we generated 50K training samples, 5K validation samples and 5K test samples. For the 9 \(\times \) 9 case, we generated 200K training samples, 20K validation samples and 20K test samples. When generating samples, we followed a process similar to Augustine et al. [2]. We began by generating valid boards and then corrupted half of them to produce negative samples. There are two types of corruption: (1) replacement, where a cell is randomly chosen and its content is replaced with a different digit; and (2) substitution, where two cells are randomly chosen and their contents are swapped. We apply at least one corruption (chosen at random) to each board and then flip a (biased) coin to decide whether further corruptions are to be applied: on heads, we apply another random corruption and flip again, until we get tails. We then checked that the multiple corruptions did not accidentally recreate a valid board. In the 4 \(\times \) 4 cases, the coin has a bias of 0 (always landing on tails), so each board contains a single corruption. For the 9 \(\times \) 9 Fixed case, we used a bias of 0.75, so a board contains, on average, 4 corruptions. The generated (valid and invalid) puzzles are then split into training, validation and test sets, with an equal number of positive and negative samples and no overlap between them. Assigning MNIST images to cells follows the same procedure as for Hitting Sets.
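A minimal sketch of the corruption procedure described above, with assumed helper names (`corrupt_once`, `corrupt_board`); the authors' generator may differ in details such as how the corruption type is chosen.

```python
# Hedged sketch of the board-corruption procedure described above.
import random

def corrupt_once(board, rng):
    n = len(board)
    if rng.random() < 0.5:
        # Replacement: overwrite one cell with a different digit.
        r, c = rng.randrange(n), rng.randrange(n)
        board[r][c] = rng.choice([d for d in range(1, n + 1) if d != board[r][c]])
    else:
        # Substitution: swap the contents of two randomly chosen cells.
        cells = [(r, c) for r in range(n) for c in range(n)]
        (r1, c1), (r2, c2) = rng.sample(cells, 2)
        board[r1][c1], board[r2][c2] = board[r2][c2], board[r1][c1]

def corrupt_board(board, coin_bias, rng=None):
    """Apply one corruption, then keep corrupting while a biased coin lands heads."""
    rng = rng or random.Random()
    board = [row[:] for row in board]
    corrupt_once(board, rng)
    while rng.random() < coin_bias:  # bias 0 -> exactly one corruption (4x4 cases)
        corrupt_once(board, rng)
    return board  # the caller must still discard boards that remain valid
```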
Follow Suit Winner. We generated 200K training samples, 1K validation samples and 1K test samples. Samples are generated by repeatedly simulating games and recording the winner. Images are assigned to cards following the same procedure as in the other tasks.
B Training Details
In this appendix, we provide further details on the training of the Embed2Rule model and the baselines used in the experiments described in the evaluation section of the paper.
B.1 Training Hardware
We trained on a system with an Intel Core i7-12700K CPU @ 3.6 GHz, 32 GB of RAM and an Nvidia GeForce RTX 3090 GPU.
B.2 Embed2Rule Hyperparameters
In all tasks, K-Means is trained with 1000 random samples from the training set, and another 1024 random samples are given to BLIP-2 for weak-labelling.
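To make the weak-labelling step concrete, here is a hedged sketch of how clusters could be matched to symbolic labels: K-Means is fitted on embeddings of the clustering samples, BLIP-2 predictions on the weak-labelling samples provide per-image label votes, and the Hungarian method [32] assigns one label per cluster by maximising agreement. Function and variable names are ours, and the weak labels are assumed to be integers in `[0, num_concepts)`; this is a simplified sketch, not the authors' exact pipeline.

```python
# Hedged sketch of cluster labelling via weak labels and the Hungarian method.
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def label_clusters(fit_embeddings, weak_embeddings, weak_labels, num_concepts):
    # Fit K-Means on the clustering sample, then assign the weak-labelled
    # sample's embeddings to clusters.
    kmeans = KMeans(n_clusters=num_concepts, n_init=10).fit(fit_embeddings)
    clusters = kmeans.predict(weak_embeddings)

    # Count co-occurrences of (cluster, weak label) pairs.
    votes = np.zeros((num_concepts, num_concepts))
    for c, y in zip(clusters, weak_labels):
        votes[c, y] += 1

    # Optimal one-to-one assignment of symbolic labels to clusters
    # (maximising agreement = minimising negated votes).
    cluster_ids, label_ids = linear_sum_assignment(-votes)
    return kmeans, dict(zip(cluster_ids, label_ids))
```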
Hitting Sets. We employ LeNet-5 [25] as the perception network, with the output layer changed to size 32 and a ReLU activation. The output embeddings are concatenated and fed into the reasoning network, an MLP with residual connections. This MLP consists of a stack of residual blocks; each block is a fully connected layer with an output size of 128, followed by a GELU activation and batch normalisation, with the block's input added back via a skip connection. We use 4 such residual blocks. The network is trained for 100 epochs (50 in the 5-4 case) with a learning rate of 0.0009 and a batch size of 256 using the Adam optimiser [18]. The prompt for BLIP-2 was simply the digit in numerical form. We assign a uniform penalty of 100 to the ILASP examples and set a maximum rule length of 10 (to allow learning long choice rules). We provide ILASP with 50 examples from the training set.
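For reference, a hedged PyTorch sketch of the residual-MLP reasoning network described above (fully connected layer of width 128, GELU, batch normalisation, skip connection, 4 blocks); the initial projection of the concatenated embeddings and the class names are our own assumptions.

```python
# Hedged PyTorch sketch of the residual-MLP reasoning network.
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.act = nn.GELU()
        self.norm = nn.BatchNorm1d(dim)

    def forward(self, x):
        # Fully connected layer, GELU, batch norm, then add the block input.
        return x + self.norm(self.act(self.fc(x)))

class ReasoningMLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=128, num_blocks=4):
        super().__init__()
        self.proj = nn.Linear(in_dim, hidden)  # project concatenated embeddings (assumed)
        self.blocks = nn.Sequential(*[ResidualBlock(hidden) for _ in range(num_blocks)])
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x):
        return self.head(self.blocks(self.proj(x)))
```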
Visual Sudoku Classification. We utilise a CNN similar to that of SATNet [41] for the perception network, containing 2 convolutional and max-pooling layers followed by 2 feed-forward layers with ReLU activation functions. For the reasoning network, we use a transformer encoder [40] employing multi-headed self-attention with learnable positional embeddings. We set the number of attention heads to 4, the number of encoder blocks to 1, the feed-forward layer embedding dimension to 512, the activation functions to GELU, and the dropout value to 0.1. The inputs and outputs of the transformer are embeddings of dimensions \(batch \times seq \times emb\), where batch is the batch size fixed to 512, seq is the sequence length set to 16 and 81 for the 4 \(\times \) 4 and 9 \(\times \) 9 Sudoku boards, respectively, and emb is the dimension of the individual latent concept embeddings, set to 128. We apply a single-layer classification head with a Sigmoid activation function to the layer-normalised outputs of the transformer to obtain the neural predictions. We train the network for 50 epochs using an AdamW optimiser with a linear warm-up phase from 0.0 to a base learning rate of 0.003 over 30% and 20% of the total number of iterations for the 4 \(\times \) 4 and 9 \(\times \) 9 boards, respectively, followed by a cosine annealing learning rate schedule between the base learning rate and 0.0001. We clip the gradient norm of each layer at 1.0. The prompt for BLIP-2 is again the digit in numerical form. We give ILASP 100 training examples with a uniform penalty of 1.
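The learning-rate schedule above can be sketched as follows, a simple reimplementation under stated assumptions: the function and parameter names are ours, `warmup_frac=0.3` corresponds to the 4 \(\times \) 4 boards (0.2 for 9 \(\times \) 9), and the authors may instead compose PyTorch's built-in schedulers.

```python
# Hedged sketch of the warm-up + cosine-annealing learning-rate schedule.
import math

def lr_at(step, total_steps, warmup_frac=0.3, base_lr=3e-3, min_lr=1e-4):
    warmup_steps = int(warmup_frac * total_steps)
    if step < warmup_steps:
        # Linear warm-up from 0.0 to the base learning rate.
        return base_lr * step / max(1, warmup_steps)
    # Cosine annealing from the base learning rate down to min_lr.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```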
Follow Suit Winner. We use a ResNet-18 [13] as the perception network, with an output layer of size 128. As the reasoning network, we use an MLP architecture similar to that of the Hitting Sets task, with 4 residual blocks and a hidden size of 256. We train for 50 epochs with a learning rate of 0.0001 and a batch size of 128. For BLIP-2, we use a prompt of the form "The playing card rank of suit", where rank and suit are replaced with the card's rank and suit. We provide ILASP with 100 examples with a uniform penalty of 1.
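As an illustration of the weak-labelling prompts for this task, the snippet below builds one prompt per card and picks the highest-scoring one; `score_image_text` is a hypothetical stand-in for querying the vision-language model, and the exact rank wording is an assumption on our part.

```python
# Hypothetical prompt construction for BLIP-2 weak-labelling of playing cards.
RANKS = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10", "jack", "queen", "king"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]

def card_prompts():
    # One prompt of the form "The playing card <rank> of <suit>" per card.
    return [f"The playing card {rank} of {suit}" for rank in RANKS for suit in SUITS]

def weak_label(image, score_image_text):
    """Return the prompt the vision-language model scores highest for `image`."""
    prompts = card_prompts()
    scores = [score_image_text(image, p) for p in prompts]
    return prompts[scores.index(max(scores))]
```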
B.3 Baseline Hyperparameters
We use the official implementations of NSIL and SLASH in our experiments, using the same perception networks as Embed2Rule. As both NSIL and SLASH are more data efficient than Embed2Rule, we used smaller training sets of 10K samples to avoid artificially increasing training time. We found no gain from using additional data.
NSIL. For Hitting Sets, we follow the hyperparameters used in the official implementation. NSIL is trained for 20 iterations using the SGD optimiser with a learning rate of 0.0008, a momentum of 0.7643 and a batch size of 64. The \(\lambda \) value (which controls the weight given to the neural network's confidence during symbolic learning) is set to 1. For Visual Sudoku Classification and Follow Suit Winner, no choice of hyperparameters affected the result, as NSIL could not solve either task. For Sudoku, since the number of iterations did not change NSIL's performance, we report training-time results using 5 iterations.
SLASH. For Visual Sudoku Classification, we trained for 20 epochs using the Adam optimiser with a learning rate of 0.001 and a batch size of 100. For the 4 \(\times \) 4 Random and 9 \(\times \) 9 cases, we attempted batch sizes as small as 8 to overcome memory issues, but found it did not help; due to a bug in the implementation, batch sizes smaller than 8 are not possible. For the Hitting Sets task, we found no set of hyperparameters that would allow SLASH to learn from positive samples alone, and so we report results after 5 epochs of training.
C Symbolic Learning Tasks
We include here the background knowledge and mode declarations given to ILASP in each task.
D Induced Programs
We include here some of the answer set programs induced by Embed2Rule (using ILASP) to solve the reasoning tasks in the paper.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Aspis, Y., Albinhassan, M., Lobo, J., Russo, A. (2024). Embed2Rule: Scalable Neuro-Symbolic Learning via Latent Space Weak-Labelling. In: Besold, T.R., d'Avila Garcez, A., Jimenez-Ruiz, E., Confalonieri, R., Madhyastha, P., Wagner, B. (eds) Neural-Symbolic Learning and Reasoning. NeSy 2024. Lecture Notes in Computer Science, vol 14979. Springer, Cham. https://doi.org/10.1007/978-3-031-71167-1_11
DOI: https://doi.org/10.1007/978-3-031-71167-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71166-4
Online ISBN: 978-3-031-71167-1