Efficient extraction of spatial relations for extended objects vis-à-vis human activity recognition in video

Published in: Applied Intelligence

Abstract

Human activity recognition (HAR) deals with recognizing activities or interactions involving humans in a video. Entities occurring in a video frame can be abstracted in a variety of ways, ranging from the detailed silhouette of the entity to the very basic axis-aligned minimum bounding rectangle (MBR). At one end of the spectrum, a detailed silhouette is not only demanding in terms of storage and computational resources but also highly susceptible to noise. At the other end, MBRs require little storage and computation and abstract away noise and video-specific details. However, for abstracting human bodies in a video, a single MBR is inadequate: in addition to abstracting away noise, it also abstracts away important details such as the posture of the body. For a more precise description, one that offers a reasonable trade-off between efficiency and noise elimination, a human body can be abstracted as a set of MBRs corresponding to different body parts. However, when activities are represented as relations between interacting objects, the simplistic approximation of treating each MBR as an independent entity leads to the computation of redundant relations. In this paper, we explore a representation schema for interactions between entities that are modeled as sets of rectangles, also referred to as extended objects. We further show that, given this representation schema, a simple recursive algorithm can opportunistically extract topological, directional and distance information in O(n log n) time. We evaluate our representation schema for HAR on the Mind’s Eye dataset (http://www.visint.org), the UT-Interaction dataset (Ryoo and Aggarwal 2010) and the SBU Kinect Interaction dataset (Yun et al. 2012).
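As context for the MBR abstraction the abstract discusses, the following is a minimal illustrative sketch, not the paper's algorithm or code: it computes an axis-aligned minimum bounding rectangle for a point set and reads off a coarse topological relation between two MBRs. All function names are hypothetical.

```python
# Illustrative sketch only (not the paper's method): MBR abstraction and a
# coarse topological relation between two axis-aligned rectangles.

def mbr(points):
    """Axis-aligned minimum bounding rectangle of a 2D point set,
    returned as (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def relation(a, b):
    """Coarse topological relation between MBRs a and b.
    Boundary contact counts as 'overlaps' in this simplified sketch."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Separated along either axis -> no common point.
    if ax2 < bx1 or bx2 < ax1 or ay2 < by1 or by2 < ay1:
        return "disjoint"
    # a entirely within b.
    if ax1 >= bx1 and ay1 >= by1 and ax2 <= bx2 and ay2 <= by2:
        return "inside"
    # b entirely within a.
    if bx1 >= ax1 and by1 >= ay1 and bx2 <= ax2 and by2 <= ay2:
        return "contains"
    return "overlaps"
```

A full qualitative calculus (such as the rectangle-algebra-based states referenced in the notes below) distinguishes many more relations per axis; this sketch only conveys why pairwise MBR comparisons are cheap.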


Notes

  1. The sphere primitive defines a region that is a sphere.

  2. A depiction of the states and their correspondence with the rectangle algebra (RA) relations can be found at www.comp.leeds.ac.uk/qsr/cores.

  3. x, y, z, w are the indices of the cores, such that a_i σ_{xy} and b_j σ_{zw}.

  4. We use I-frames, obtained using the ffmpeg tool (www.ffmpeg.org), as keyframes.
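The I-frame extraction mentioned in note 4 can be reproduced with ffmpeg's `select` filter. The sketch below is self-contained: it first synthesizes a short test clip (the file names `clip.mp4` and the `keyframes/` output pattern are placeholders, not from the paper).

```shell
# Generate a 2-second synthetic clip so the example needs no external input.
ffmpeg -y -loglevel error -f lavfi -i testsrc=duration=2:size=128x72:rate=25 clip.mp4

# Keep only intra-coded (I) frames and write each as a numbered PNG keyframe.
# -vsync vfr stops ffmpeg from duplicating frames to hold a constant rate.
mkdir -p keyframes
ffmpeg -y -loglevel error -i clip.mp4 \
  -vf "select='eq(pict_type,I)'" -vsync vfr keyframes/frame_%04d.png
```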

References

  1. Aggarwal J, Ryoo M (2011) Human activity analysis: a review. ACM Comput Surv 43(3):16:1–16:43


  2. Bittner T, Donnelly M (2007) A formal theory of qualitative size and distance relations between regions. In: Proceedings of the 21st annual workshop on qualitative reasoning (QR07)

  3. Clementini E, Felice PD, Califano G (1995) Composite regions in topological queries. Inf Syst 20(7):579–594


  4. Cohn AG, Hazarika SM (2001) Qualitative spatial representation and reasoning: an overview. Fundam Inform 46(1–2):1–29


  5. Cohn AG, Magee DR, Galata A, Hogg D, Hazarika SM (2003) Towards an architecture for cognitive vision using qualitative spatio-temporal representations and abduction. In: Spatial cognition, pp 232–248

  6. Cohn AG, Renz J, Sridhar M (2012) Thinking inside the box: a comprehensive spatial representation for video analysis. In: Proceedings of the 13th international conference on principles of knowledge representation and reasoning (KR2012). AAAI Press, pp 588–592

  7. Dubba KSR, Bhatt M, Dylla F, Hogg DC, Cohn AG (2012) Interleaved inductive-abductive reasoning for learning complex event models. In: ILP. Lecture notes in computer science. Springer, vol 7207, pp 113–129

  8. Egenhofer MJ, Clementini E, Felice PD (1994) Topological relations between regions with holes. Int J Geogr Inf Syst 8:129–142


  9. Falomir Z, Jiménez-Ruiz E, Museros L, Escrig MT (2009) An ontology for qualitative description of images. In: Spatial and temporal reasoning for ambient intelligence systems. Springer

  10. Kalita S, Karmakar A, Hazarika SM (2016) Comprehensive representation and efficient extraction of spatial information for human activity recognition from video data. In: Proceedings of international conference on computer vision and image processing (CVIP2016). Springer

  11. Kusumam K (2012) Relational learning using body parts for human activity recognition in videos. Master’s thesis, University of Leeds

  12. Park S, Park J, Al-masni M, Al-antari M, Uddin M, Kim TS (2016) A depth camera-based human activity recognition via deep learning recurrent neural network for health and social care services. Procedia Computer Science 100:78–84


  13. Randell DA, Cui Z, Cohn A (1992) A spatial logic based on regions and connection. In: Nebel B, Rich C, Swartout W (eds) Proceedings of the 3rd international conference on principles of knowledge representation and reasoning. KR’92. Morgan Kaufmann, pp 165–176

  14. Ryoo MS, Aggarwal JK (2010) UT-interaction dataset, ICPR contest on semantic description of human activities (SDHA). http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html

  15. Schneider M, Behr T (2006) Topological relationships between complex spatial objects. ACM Trans Database Syst 31(1):39–81


  16. Skiadopoulos S, Koubarakis M (2005) On the consistency of cardinal directions constraints. Artif Intell 163:91–135


  17. Sokeh HS, Gould S, Renz J (2013) Efficient extraction and representation of spatial information from video data. In: Proceedings of the 23rd international joint conference on artificial intelligence (IJCAI’13). AAAI Press/IJCAI, pp 1076–1082

  18. Sridhar M, Cohn AG, Hogg DC (2011) Benchmarking qualitative spatial calculi for video activity analysis. In: Proceedings of the IJCAI workshop benchmarks and applications of spatial reasoning, pp 15–20

  19. Yun K, Honorio J, Chattopadhyay D, Berg TL, Samaras D (2012) Two-person interaction detection using body-pose features and multiple instance learning. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops (CVPRW). IEEE

  20. Zelnik-Manor L, Irani M (2001) Event-based analysis of video. In: 2001 IEEE conference on computer vision and pattern recognition (CVPR 2001). IEEE Computer Society, pp 123–130

  21. Zhang Y, Liu X, Chang MC, Ge W, Chen T (2012) Spatio-temporal phrases for activity recognition. Springer, Berlin Heidelberg, pp 707–721


  22. Zhao Y, Holtzen S, Gao T, Zhu SC (2015) Represent and infer human theory of mind for human-robot interaction. In: 2015 AAAI fall symposium series

  23. Zhu W, Lan C, Xing J, Zeng W, Li Y, Shen L, Xie X (2016) Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. In: Proceedings of the 30th AAAI conference on artificial intelligence, pp 3697–3704

Corresponding author

Correspondence to Shobhanjana Kalita.


Cite this article

Kalita, S., Karmakar, A. & Hazarika, S.M. Efficient extraction of spatial relations for extended objects vis-à-vis human activity recognition in video. Appl Intell 48, 204–219 (2018). https://doi.org/10.1007/s10489-017-0970-8
