
Enabling Visual Intelligence by Leveraging Visual Object States in a Neurosymbolic Framework

A Position Paper

  • Conference paper
  • First Online:
AI 2024: Advances in Artificial Intelligence (AI 2024)

Abstract

This paper investigates the potential of integrating visual object states into methods that address complex visual intelligence tasks, such as Long-Term Action Anticipation (LTAA), and proposes to achieve this with the aid of a Neurosymbolic (NeSy) framework. We consider that this approach could offer significant advances in applications requiring a nuanced understanding and anticipation of future scenarios, and could serve as inspiration for the further development of NeSy methods exhibiting Visual Intelligence.



Author information

Correspondence to Filippos Gouidis.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Gouidis, F., Papoutsakis, K., Patkos, T., Argyros, A., Plexousakis, D. (2025). Enabling Visual Intelligence by Leveraging Visual Object States in a Neurosymbolic Framework. In: Gong, M., Song, Y., Koh, Y.S., Xiang, W., Wang, D. (eds) AI 2024: Advances in Artificial Intelligence. AI 2024. Lecture Notes in Computer Science, vol 15443. Springer, Singapore. https://doi.org/10.1007/978-981-96-0351-0_23


  • DOI: https://doi.org/10.1007/978-981-96-0351-0_23

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0350-3

  • Online ISBN: 978-981-96-0351-0

  • eBook Packages: Computer Science, Computer Science (R0)
