Efficient extraction of spatial relations for extended objects vis-à-vis human activity recognition in video

Published in: Applied Intelligence

Abstract

Human activity recognition (HAR) deals with recognizing activities or interactions involving humans in a video. Entities occurring in a video frame can be abstracted in a variety of ways, ranging from the detailed silhouette of the entity to the very basic axis-aligned minimum bounding rectangle (MBR). At one end of the spectrum, a detailed silhouette is not only demanding in terms of storage and computational resources but also highly susceptible to noise. At the other end, MBRs require little storage and computation and abstract away noise and video-specific details. However, for abstracting human bodies in a video, a single MBR is inadequate: in addition to abstracting away noise, it also abstracts away important details such as the posture of the body. For a more precise description, one that offers a reasonable trade-off between efficiency and noise elimination, a human body can be abstracted as a set of MBRs corresponding to different body parts. However, when activities are represented as relations between interacting objects, the simplistic approximation of treating each MBR as an independent entity leads to the computation of redundant relations. In this paper, we explore a representation schema for interactions between entities that are modeled as sets of rectangles, also referred to as extended objects. We further show that, given this representation schema, a simple recursive algorithm can opportunistically extract topological, directional and distance information in O(n log n) time. We evaluate our representation schema for HAR on the Mind’s Eye dataset (http://www.visint.org), the UT-Interaction dataset (Ryoo and Aggarwal 2010) and the SBU Kinect Interaction dataset (Yun et al. 2012).
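As context for the MBR abstraction the abstract discusses, the following is a minimal illustrative sketch, not the paper's algorithm or code: it computes an axis-aligned minimum bounding rectangle for a point set and reads off a coarse topological relation between two MBRs. All function names are hypothetical.

```python
# Illustrative sketch only (not the paper's method): MBR abstraction and a
# coarse topological relation between two axis-aligned rectangles.

def mbr(points):
    """Axis-aligned minimum bounding rectangle of a 2D point set,
    returned as (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def relation(a, b):
    """Coarse topological relation between MBRs a and b.
    Boundary contact counts as 'overlaps' in this simplified sketch."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Separated along either axis -> no common point.
    if ax2 < bx1 or bx2 < ax1 or ay2 < by1 or by2 < ay1:
        return "disjoint"
    # a entirely within b.
    if ax1 >= bx1 and ay1 >= by1 and ax2 <= bx2 and ay2 <= by2:
        return "inside"
    # b entirely within a.
    if bx1 >= ax1 and by1 >= ay1 and bx2 <= ax2 and by2 <= ay2:
        return "contains"
    return "overlaps"
```

A full qualitative calculus (such as the rectangle-algebra-based states referenced in the notes below) distinguishes many more relations per axis; this sketch only conveys why pairwise MBR comparisons are cheap.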


Notes

  1. The sphere primitive defines a region that is a sphere.

  2. A depiction of the states and their correspondence with the rectangle algebra (RA) relations can be found at www.comp.leeds.ac.uk/qsr/cores.

  3. x, y, z, w are the indices of the cores, such that a_i σ_{xy} and b_j σ_{zw}.

  4. We use I-frames, obtained using the ffmpeg tool (www.ffmpeg.org), as keyframes.
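The I-frame extraction mentioned in note 4 can be reproduced with ffmpeg's `select` filter. The sketch below is self-contained: it first synthesizes a short test clip (the file names `clip.mp4` and the `keyframes/` output pattern are placeholders, not from the paper).

```shell
# Generate a 2-second synthetic clip so the example needs no external input.
ffmpeg -y -loglevel error -f lavfi -i testsrc=duration=2:size=128x72:rate=25 clip.mp4

# Keep only intra-coded (I) frames and write each as a numbered PNG keyframe.
# -vsync vfr stops ffmpeg from duplicating frames to hold a constant rate.
mkdir -p keyframes
ffmpeg -y -loglevel error -i clip.mp4 \
  -vf "select='eq(pict_type,I)'" -vsync vfr keyframes/frame_%04d.png
```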

References

  1. Aggarwal J, Ryoo M (2011) Human activity analysis: a review. ACM Comput Surv 43(3):16:1–16:43


  2. Bittner T, Donnelly M (2007) A formal theory of qualitative size and distance relations between regions. In: Proceedings of the 21st annual workshop on qualitative reasoning (QR07)

  3. Clementini E, Felice PD, Califano G (1995) Composite regions in topological queries. Inf Syst 20(7):579–594


  4. Cohn AG, Hazarika SM (2001) Qualitative spatial representation and reasoning: an overview. Fundam Inform 46(1–2):1–29


  5. Cohn AG, Magee DR, Galata A, Hogg D, Hazarika SM (2003) Towards an architecture for cognitive vision using qualitative spatio-temporal representations and abduction. In: Spatial cognition, pp 232–248

  6. Cohn AG, Renz J, Sridhar M (2012) Thinking inside the box: a comprehensive spatial representation for video analysis. In: Proceedings of the 13th international conference on principles of knowledge representation and reasoning (KR2012). AAAI Press, pp 588–592

  7. Dubba KSR, Bhatt M, Dylla F, Hogg DC, Cohn AG (2012) Interleaved inductive-abductive reasoning for learning complex event models. In: ILP. Lecture notes in computer science. Springer, vol 7207, pp 113–129

  8. Egenhofer MJ, Clementini E, Felice PD (1994) Topological relations between regions with holes. Int J Geogr Inf Syst 8:129–142


  9. Falomir Z, Jiménez-Ruiz E, Museros L, Escrig MT (2009) An ontology for qualitative description of images. In: Spatial and temporal reasoning for ambient intelligence systems. Springer

  10. Kalita S, Karmakar A, Hazarika SM (2016) Comprehensive representation and efficient extraction of spatial information for human activity recognition from video data. In: Proceedings of international conference on computer vision and image processing (CVIP2016). Springer

  11. Kusumam K (2012) Relational learning using body parts for human activity recognition in videos. Master’s thesis, University of Leeds

  12. Park S, Park J, Al-masni M, Al-antari M, Uddin M, Kim TS (2016) A depth camera-based human activity recognition via deep learning recurrent neural network for health and social care services. Procedia Computer Science 100:78–84


  13. Randell DA, Cui Z, Cohn A (1992) A spatial logic based on regions and connection. In: Nebel B, Rich C, Swartout W (eds) Proceedings of the 3rd international conference on principles of knowledge representation and reasoning. KR’92. Morgan Kaufmann, pp 165–176

  14. Ryoo MS, Aggarwal JK (2010) UT-interaction dataset, ICPR contest on semantic description of human activities (SDHA). http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html

  15. Schneider M, Behr T (2006) Topological relationships between complex spatial objects. ACM Trans Database Syst 31(1):39–81


  16. Skiadopoulos S, Koubarakis M (2005) On the consistency of cardinal directions constraints. Artif Intell 163:91–135


  17. Sokeh HS, Gould S, Renz J (2013) Efficient extraction and representation of spatial information from video data. In: Proceedings of the 23rd international joint conference on artificial intelligence (IJCAI’13). AAAI Press/IJCAI, pp 1076–1082

  18. Sridhar M, Cohn AG, Hogg DC (2011) Benchmarking qualitative spatial calculi for video activity analysis. In: Proceedings of the IJCAI workshop benchmarks and applications of spatial reasoning, pp 15–20

  19. Yun K, Honorio J, Chattopadhyay D, Berg TL, Samaras D (2012) Two-person interaction detection using body-pose features and multiple instance learning. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops (CVPRW). IEEE

  20. Zelnik-Manor L, Irani M (2001) Event-based analysis of video. In: 2001 IEEE conference on computer vision and pattern recognition (CVPR 2001). IEEE Computer Society, pp 123–130

  21. Zhang Y, Liu X, Chang MC, Ge W, Chen T (2012) Spatio-temporal phrases for activity recognition. Springer, Berlin Heidelberg, pp 707–721


  22. Zhao Y, Holtzen S, Gao T, Zhu SC (2015) Represent and infer human theory of mind for human-robot interaction. In: 2015 AAAI fall symposium series

  23. Zhu W, Lan C, Xing J, Zeng W, Li Y, Shen L, Xie X (2016) Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. In: Proceedings of the 30th AAAI conference on artificial intelligence, pp 3697–3704

Corresponding author

Correspondence to Shobhanjana Kalita.


Cite this article

Kalita, S., Karmakar, A. & Hazarika, S.M. Efficient extraction of spatial relations for extended objects vis-à-vis human activity recognition in video. Appl Intell 48, 204–219 (2018). https://doi.org/10.1007/s10489-017-0970-8
