
Declarative Reasoning about Space and Motion with Video

  • Technical Contribution
KI - Künstliche Intelligenz

Abstract

We present a commonsense theory of space and motion for representing and reasoning about motion patterns in video data. The theory supports declarative (deep) semantic interpretation of visuo-spatial sensor data, e.g., data from object tracking, eye tracking, and movement trajectories. It has been implemented within constraint logic programming to support integration into large-scale AI projects. The theory is domain-independent and has been applied in a range of domains in which the capability to semantically interpret motion in visuo-spatial data is central. In this paper, we demonstrate its capabilities in the context of cognitive film studies, analysing the visual perception of spectators by integrating the visual structure of a scene with spectators' gaze acquired from eye-tracking experiments.
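
To give a rough feel for what qualitative interpretation of tracking and gaze data can look like, the following is a minimal Python sketch. It is not the theory or the constraint logic programming implementation described in the paper. The `Box` type, the relation names (`approaching`, `receding`, `stable`), the threshold `eps`, and the sample data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a tracked object in one frame (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float

    def centre(self):
        return (self.x + self.w / 2, self.y + self.h / 2)

    def contains(self, point):
        px, py = point
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def distance(a, b):
    """Euclidean distance between the centres of two boxes."""
    (ax, ay), (bx, by) = a.centre(), b.centre()
    return hypot(ax - bx, ay - by)


def motion_relation(track_a, track_b, eps=1.0):
    """Label each frame-to-frame step by how the centre distance changes.

    eps is an assumed tolerance (in pixels) below which a change counts as 'stable'.
    """
    labels = []
    for i in range(len(track_a) - 1):
        delta = distance(track_a[i + 1], track_b[i + 1]) - distance(track_a[i], track_b[i])
        if delta < -eps:
            labels.append("approaching")
        elif delta > eps:
            labels.append("receding")
        else:
            labels.append("stable")
    return labels


def attended_frames(track, gaze):
    """Indices of frames in which the gaze point lies inside the object's box."""
    return [i for i, (box, point) in enumerate(zip(track, gaze)) if box.contains(point)]


# Made-up sample data: object A moves right towards a static object B.
track_a = [Box(0, 0, 10, 10), Box(10, 0, 10, 10), Box(20, 0, 10, 10)]
track_b = [Box(50, 0, 10, 10)] * 3
gaze = [(5, 5), (100, 100), (25, 5)]

print(motion_relation(track_a, track_b))  # ['approaching', 'approaching']
print(attended_frames(track_a, gaze))     # [0, 2]
```

The sketch computes relations procedurally from numeric data. A declarative approach of the kind the abstract describes would instead state such relations as constraints and let a solver derive them.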

Author information

Correspondence to Jakob Suchan.

About this article

Cite this article

Suchan, J. Declarative Reasoning about Space and Motion with Video. Künstl Intell 31, 321–330 (2017). https://doi.org/10.1007/s13218-017-0504-x

  • DOI: https://doi.org/10.1007/s13218-017-0504-x
