Artificial Visual Intelligence

Perceptual Commonsense for Human-Centred Cognitive Technologies

Chapter in: Human-Centered Artificial Intelligence (ACAI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13500)

Abstract

We address computational cognitive vision and perception at the interface of language, logic, cognition, and artificial intelligence. The chapter presents general methods for the processing and semantic interpretation of dynamic visuospatial imagery, with a particular emphasis on the ability to abstract, learn, and reason with cognitively rooted, structured characterisations of commonsense knowledge pertaining to space and motion. The presented work constitutes a systematic model and methodology integrating diverse, multi-faceted AI methods pertaining to Knowledge Representation and Reasoning, Computer Vision, and Machine Learning towards realising practical, human-centred artificial visual intelligence.


Notes

  1. Multi-domain refers to more than one aspect of space, e.g., topology, orientation, direction, distance, and shape; this requires a mixed-domain ontology involving points, line segments, polygons, and regions of space, time, and space-time [21, 35, 48]. A minimal illustrative sketch follows these notes.

  2. Select publications relevant to the chosen examples include: visuospatial question-answering [37, 39, 40, 41], visuospatial abduction [43, 45, 47, 49], and integration of learning and reasoning [42, 46].

  3. A summary is available in [10].

  4. Select readings are indicated in Appendix A.
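To make the notion of "multi-domain" in Note 1 concrete, the following is a minimal sketch (not from the chapter) of spatial characterisation over a mixed ontology of points, line segments, and polygons, combining relations from several spatial domains (topology, direction, distance). It assumes the shapely geometry library; all function names are illustrative.

```python
# Minimal sketch: multi-domain qualitative spatial characterisation over a
# mixed ontology (points, line segments, polygons). Illustrative only.
import math
from shapely.geometry import Point, LineString, Polygon

def topology(a, b):
    """Coarse, RCC-style topological relation between two regions."""
    if not a.intersects(b):
        return "disconnected"
    if a.touches(b):
        return "externally_connected"
    if a.within(b) or b.within(a):
        return "part_of"
    return "partially_overlapping"

def orientation(a, b):
    """Qualitative direction from a's centroid to b's centroid."""
    dx = b.centroid.x - a.centroid.x
    dy = b.centroid.y - a.centroid.y
    angle = math.degrees(math.atan2(dy, dx)) % 360
    sectors = ["east", "north-east", "north", "north-west",
               "west", "south-west", "south", "south-east"]
    return sectors[int(((angle + 22.5) % 360) // 45)]

# A mixed-domain scene: a point, a line segment, and two polygonal regions.
person = Point(1.0, 1.0)
path   = LineString([(0, 0.5), (6, 0.5)])
room_a = Polygon([(0, 0), (3, 0), (3, 3), (0, 3)])
room_b = Polygon([(3, 0), (6, 0), (6, 3), (3, 3)])

print(topology(room_a, room_b))     # externally_connected (shared wall)
print(orientation(room_a, room_b))  # east (direction domain)
print(room_a.contains(person))      # True (topology over point/region)
print(path.crosses(room_b))         # True (motion path crossing a region)
print(person.distance(room_b))      # 2.0 (metric distance, another domain)
```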

References

  1. Baltrusaitis, T., Zadeh, A., Lim, Y.C., Morency, L.: OpenFace 2.0: facial behavior analysis toolkit. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 59–66, May 2018. https://doi.org/10.1109/FG.2018.00019

  2. Bergmann, P., Meinhardt, T., Leal-Taixé, L.: Tracking without bells and whistles. In: The IEEE International Conference on Computer Vision (ICCV), October 2019

  3. Bewley, A., Ge, Z., Ott, L., Ramos, F., Upcroft, B.: Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3464–3468 (2016). https://doi.org/10.1109/ICIP.2016.7533003

  4. Bhatt, M.: Reasoning about space, actions and change: a paradigm for applications of spatial reasoning. In: Qualitative Spatial Representation and Reasoning: Trends and Future Directions. IGI Global, USA (2012)

  5. Bhatt, M., Guesgen, H.W., Wölfl, S., Hazarika, S.M.: Qualitative spatial and temporal reasoning: emerging applications, trends, and directions. Spatial Cogn. Comput. 11(1), 1–14 (2011). https://doi.org/10.1080/13875868.2010.548568

  6. Bhatt, M., Kersting, K.: Semantic interpretation of multi-modal human-behaviour data - making sense of events, activities, processes. KI/Artif. Intell. 31(4), 317–320 (2017)

  7. Bhatt, M., Lee, J.H., Schultz, C.: CLP(QS): a declarative spatial reasoning framework. In: Egenhofer, M., Giudice, N., Moratz, R., Worboys, M. (eds.) COSIT 2011. LNCS, vol. 6899, pp. 210–230. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23196-4_12

  8. Bhatt, M., Loke, S.W.: Modelling dynamic spatial systems in the situation calculus. Spatial Cogn. Comput. 8(1–2), 86–130 (2008). https://doi.org/10.1080/13875860801926884

  9. Bhatt, M., Schultz, C., Freksa, C.: The ‘space’ in spatial assistance systems: conception, formalisation and computation. In: Tenbrink, T., Wiener, J., Claramunt, C. (eds.) Representing Space in Cognition: Interrelations of Behavior, Language, and Formal Models. Series: Explorations in Language and Space. Oxford University Press (2013). ISBN 978-0-19-967991-1

  10. Bhatt, M., Suchan, J.: Cognitive vision and perception. In: Giacomo, G.D., Catalá, A., Dilkina, B., Milano, M., Barro, S., Bugarín, A., Lang, J. (eds.) 24th European Conference on Artificial Intelligence, ECAI 2020, Santiago de Compostela, Spain, 29 August–8 September 2020 - Including 10th Conference on Prestigious Applications of Artificial Intelligence (PAIS 2020). Frontiers in Artificial Intelligence and Applications, vol. 325, pp. 2881–2882. IOS Press (2020). https://doi.org/10.3233/FAIA200434

  11. Bochkovskiy, A., Wang, C., Liao, H.M.: YOLOv4: optimal speed and accuracy of object detection. CoRR abs/2004.10934 (2020). https://arxiv.org/abs/2004.10934

  12. Brewka, G., Eiter, T., Truszczyński, M.: Answer set programming at a glance. Commun. ACM 54(12), 92–103 (2011). https://doi.org/10.1145/2043174.2043195

  13. Cao, Z., Hidalgo Martinez, G., Simon, T., Wei, S., Sheikh, Y.A.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43, 172–186 (2019)

  14. Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv:1802.02611 (2018)

  15. Davis, E.: Pouring liquids: a study in commonsense physical reasoning. Artif. Intell. 172(12–13), 1540–1578 (2008)

  16. Davis, E.: How does a box work? A study in the qualitative dynamics of solid objects. Artif. Intell. 175(1), 299–345 (2011)

  17. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR 2009 (2009)

  18. Deng, J., Guo, J., Ververas, E., Kotsia, I., Zafeiriou, S.: RetinaFace: single-shot multi-level face localisation in the wild. In: CVPR (2020)

  19. Dubba, K.S.R., Cohn, A.G., Hogg, D.C., Bhatt, M., Dylla, F.: Learning relational event models from video. J. Artif. Intell. Res. (JAIR) 53, 41–90 (2015). https://doi.org/10.1613/jair.4395

  20. Hampe, B., Grady, J.E.: From Perception to Meaning. De Gruyter Mouton, Berlin (2008). https://www.degruyter.com/view/title/17429

  21. Hazarika, S.M.: Qualitative spatial change : space-time histories and continuity. Ph.D. thesis, The University of Leeds, School of Computing (2005). Supervisor - Anthony Cohn

  22. He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 42(02), 386–397 (2020). https://doi.org/10.1109/TPAMI.2018.2844175

  23. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 770–778. IEEE Computer Society (2016). https://doi.org/10.1109/CVPR.2016.90

  24. Hu, P., Ramanan, D.: Finding tiny faces. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017

  25. Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017). http://lmb.informatik.uni-freiburg.de/Publications/2017/IMSKDB17

  26. Jaffar, J., Maher, M.J.: Constraint logic programming: a survey. J. Logic Program. 19, 503–581 (1994)

  27. Kowalski, R., Sergot, M.: A logic-based calculus of events. In: Schmidt, J.W., Thanos, C. (eds.) Foundations of Knowledge Base Management, pp. 23–51. Springer, Heidelberg (1989). https://doi.org/10.1007/978-3-642-83397-7_2

  28. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Bartlett, P.L., Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a Meeting Held at Lake Tahoe, Nevada, United States, 3–6 December 2012, pp. 1106–1114 (2012). https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html

  29. Mani, I., Pustejovsky, J.: Interpreting Motion - Grounded Representations for Spatial Language, Explorations in Language and Space, vol. 5. Oxford University Press, Oxford (2012)

  30. Muggleton, S., Raedt, L.D.: Inductive logic programming: theory and methods. J. Log. Program. 19(20), 629–679 (1994)

  31. Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 779–788. IEEE Computer Society (2016). https://doi.org/10.1109/CVPR.2016.91

  32. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. CoRR abs/1804.02767 (2018). http://arxiv.org/abs/1804.02767

  33. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). https://doi.org/10.1109/TPAMI.2016.2577031

  34. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

  35. Schultz, C., Bhatt, M., Suchan, J., Wałęga, P.A.: Answer set programming modulo ‘space-time’. In: Benzmüller, C., Ricca, F., Parent, X., Roman, D. (eds.) RuleML+RR 2018. LNCS, vol. 11092, pp. 318–326. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99906-7_24

  36. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)

  37. Spranger, M., Suchan, J., Bhatt, M.: Robust natural language processing - combining reasoning, cognitive semantics and construction grammar for spatial language. In: 25th International Joint Conference on Artificial Intelligence, IJCAI 2016. AAAI Press, July 2016

  38. Srinivasan, A.: The Aleph Manual (2001). http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/

  39. Suchan, J., Bhatt, M.: The geometry of a scene: on deep semantics for visual perception driven cognitive film studies. In: 2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016, Lake Placid, NY, USA, 7–10 March 2016, pp. 1–9. IEEE Computer Society (2016). https://doi.org/10.1109/WACV.2016.7477712

  40. Suchan, J., Bhatt, M.: Semantic question-answering with video and eye-tracking data: AI foundations for human visual perception driven cognitive film studies. In: Kambhampati, S. (ed.) Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9–15 July 2016, pp. 2633–2639. IJCAI/AAAI Press (2016). http://www.ijcai.org/Abstract/16/374

  41. Suchan, J., Bhatt, M.: Deep semantic abstractions of everyday human activities: on commonsense representations of human interactions. In: ROBOT 2017: Third Iberian Robotics Conference, Advances in Intelligent Systems and Computing 693 (2017)

  42. Suchan, J., Bhatt, M., Schultz, C.P.L.: Deeply semantic inductive spatio-temporal learning. In: Cussens, J., Russo, A. (eds.) Proceedings of the 26th International Conference on Inductive Logic Programming (Short Papers), London, UK, vol. 1865, pp. 73–80. CEUR-WS.org (2016)

  43. Suchan, J., Bhatt, M., Varadarajan, S.: Out of sight but not out of mind: an answer set programming based online abduction framework for visual sensemaking in autonomous driving. In: Kraus, S. (ed.) Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, 10–16 August 2019, pp. 1879–1885. ijcai.org (2019). https://doi.org/10.24963/ijcai.2019/260

  44. Suchan, J., Bhatt, M., Varadarajan, S.: Driven by commonsense. In: Giacomo, G.D., et al. (eds.) ECAI 2020–24th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 29 August–8 September 2020 - Including 10th Conference on Prestigious Applications of Artificial Intelligence (PAIS 2020). Frontiers in Artificial Intelligence and Applications, vol. 325, pp. 2939–2940. IOS Press (2020). https://doi.org/10.3233/FAIA200463

  45. Suchan, J., Bhatt, M., Varadarajan, S.: Commonsense visual sensemaking for autonomous driving - on generalised neurosymbolic online abduction integrating vision and semantics. Artif. Intell. 299, 103522 (2021). https://doi.org/10.1016/j.artint.2021.103522

  46. Suchan, J., Bhatt, M., Varadarajan, S., Amirshahi, S.A., Yu, S.: Semantic analysis of (reflectional) visual symmetry: a human-centred computational model for declarative explainability. Adv. Cogn. Syst. 6, 65–84 (2018). http://www.cogsys.org/journal

  47. Suchan, J., Bhatt, M., Walega, P.A., Schultz, C.P.L.: Visual explanation by high-level abduction: on answer-set programming driven reasoning about moving objects. In: 32nd AAAI Conference on Artificial Intelligence (AAAI-2018), USA, pp. 1965–1972. AAAI Press (2018)

  48. Wałęga, P.A., Bhatt, M., Schultz, C.: ASPMT(QS): non-monotonic spatial reasoning with answer set programming modulo theories. In: Calimeri, F., Ianni, G., Truszczynski, M. (eds.) LPNMR 2015. LNCS (LNAI), vol. 9345, pp. 488–501. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23264-5_41

  49. Walega, P.A., Schultz, C.P.L., Bhatt, M.: Non-monotonic spatial reasoning with answer set programming modulo theories. Theory Pract. Log. Program. 17(2), 205–225 (2017). https://doi.org/10.1017/S1471068416000193

Author information

Correspondence to Mehul Bhatt.

Appendices

A Select Further Readings

Select readings pertaining to cognitive vision and perception are as follows:

  • Visuospatial Question-Answering: [40, 39, 41, 37]

  • Visuospatial Abduction: [43, 45, 47, 49]

  • Relational Visuospatial Learning: [42, 46, 19]

Select readings pertaining to foundational aspects of commonsense spatial reasoning (within a KR setting) are as follows:

  • Theory (Space, Action, Change): [4, 5, 8, 9]

  • Declarative Spatial Reasoning (CLP, ASP, ILP): [7, 35, 42, 48]

B Visual Computing Foundations

A robust low-level visual computing foundation, driven by state-of-the-art computer vision techniques (e.g., for visual feature detection and tracking), is necessary for realising explainable visual intelligence in the manner described in this chapter. The examples of this chapter (in Sect. 4), for instance, require extracting and analysing scene elements (i.e., people, body structure, and objects in the scene) and motion (i.e., object motion and scene motion), encompassing methods for the following (a minimal pipeline sketch follows the list):

  • Image Classification and Feature Learning – based on Big Data (e.g., ImageNet [17, 34]), using neural network architectures such as AlexNet [28], VGG [36], or ResNet [23].

  • Detection, i.e., of people and objects [11, 31, 32, 33], and faces [18, 24].

  • Pose Estimation, i.e., of body pose [13] (including fine-grained hand pose), and face and gaze analysis [1].

  • Segmentation, i.e., semantic segmentation [14] and instance segmentation [22].

  • Motion Analysis, i.e., optical flow based motion estimation [25] and movement tracking [2, 3].
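As a concrete illustration of such a foundation, the following is a minimal sketch (not the chapter's implementation) of the detection step feeding a symbolic abstraction layer: an off-the-shelf torchvision Faster R-CNN [33] produces (label, bounding-box) detections, which are abstracted into simple facts of the kind a declarative reasoner could consume. The function names (e.g., extract_scene_facts, left_of) and the score threshold are illustrative assumptions.

```python
# Minimal sketch: object/people detection feeding symbolic abstraction.
# Assumes torchvision >= 0.13 (for the `weights="DEFAULT"` argument).
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor
from PIL import Image

COCO_PERSON = 1  # COCO class id for "person" in torchvision detection models

def extract_scene_facts(image_path, score_threshold=0.8):
    """Detect scene elements and abstract them into (label, box) facts."""
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        out = model([image])[0]  # dict with "boxes", "labels", "scores"
    facts = []
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if score >= score_threshold:
            facts.append((int(label), [round(v, 1) for v in box.tolist()]))
    return facts

def left_of(box_a, box_b):
    """A coarse qualitative spatial relation over detected boxes."""
    return box_a[2] < box_b[0]  # a's right edge is left of b's left edge

facts = extract_scene_facts("scene.jpg")
people = [f for f in facts if f[0] == COCO_PERSON]
print(people)  # e.g. [(1, [12.0, 30.5, 88.2, 200.1]), ...]
```

Such (label, box) facts are exactly the kind of low-level input that the declarative layers discussed in this chapter (e.g., ASP- or CLP-based reasoning [7, 35]) would further abstract into relational, commonsense characterisations of space and motion.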

Copyright information

© 2023 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Bhatt, M., Suchan, J. (2023). Artificial Visual Intelligence. In: Chetouani, M., Dignum, V., Lukowicz, P., Sierra, C. (eds.) Human-Centered Artificial Intelligence. ACAI 2021. Lecture Notes in Computer Science, vol. 13500. Springer, Cham. https://doi.org/10.1007/978-3-031-24349-3_12

  • DOI: https://doi.org/10.1007/978-3-031-24349-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24348-6

  • Online ISBN: 978-3-031-24349-3

  • eBook Packages: Computer Science, Computer Science (R0)
