Abstract
This paper investigates the potential of integrating visual object states into methods for complex visual intelligence tasks such as Long-Term Action Anticipation (LTAA), and proposes achieving this with the aid of a Neurosymbolic (NeSy) framework. We consider that this approach could offer significant advancements in applications requiring nuanced understanding and anticipation of future scenarios, and could serve as an inspiration for the further development of NeSy methods exhibiting Visual Intelligence.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Gouidis, F., Papoutsakis, K., Patkos, T., Argyros, A., Plexousakis, D. (2025). Enabling Visual Intelligence by Leveraging Visual Object States in a Neurosymbolic Framework. In: Gong, M., Song, Y., Koh, Y.S., Xiang, W., Wang, D. (eds) AI 2024: Advances in Artificial Intelligence. AI 2024. Lecture Notes in Computer Science, vol 15443. Springer, Singapore. https://doi.org/10.1007/978-981-96-0351-0_23
DOI: https://doi.org/10.1007/978-981-96-0351-0_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0350-3
Online ISBN: 978-981-96-0351-0
eBook Packages: Computer Science, Computer Science (R0)