Abstract
This paper describes a methodology for the automated recognition of complex human activities. The paper proposes a general framework that reliably recognizes high-level human actions and human-human interactions. Our approach is a description-based approach, which enables a user to encode the structure of a high-level human activity as a formal representation; recognition is then performed by semantically matching the constructed representations against actual observations. The methodology uses a context-free grammar (CFG) based representation scheme as a formal syntax for representing composite activities. Our CFG-based representation enables us to define complex human activities in terms of simpler activities or movements. Our system takes advantage of both statistical recognition techniques from computer vision and knowledge representation concepts from traditional artificial intelligence. At the low level of the system, image sequences are processed to extract poses and gestures. Building on the recognized gestures, the high level of the system hierarchically recognizes composite actions and interactions occurring in a sequence of image frames. The concept of hallucinations and a probabilistic semantic-level recognition algorithm are introduced to cope with imperfect lower layers. As a result, the system recognizes human activities including ‘fighting’ and ‘assault’, high-level activities that previous systems had difficulty recognizing. The experimental results show that our system reliably recognizes sequences of complex human activities with a high recognition rate.
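To make the CFG-based idea concrete, the following is a minimal illustrative sketch of how composite activities can be defined in terms of simpler gestures through grammar productions. The activity names, productions, and matching procedure here are hypothetical examples for exposition only, not the paper's actual representation or recognition algorithm (which is probabilistic and semantic-level).

```python
# Toy context-free grammar: each nonterminal (a composite activity) maps to
# alternative productions, and each production is a sequence of nonterminals
# and/or terminal gestures. These rules are invented for illustration.
GRAMMAR = {
    "fight":   [["punch", "punch"], ["punch", "fight"]],  # recursive: repeated punching
    "assault": [["approach", "fight"]],
}

def derives(symbol, gestures):
    """Return True if `symbol` can derive exactly the observed gesture sequence."""
    if symbol not in GRAMMAR:                 # terminal gesture: must match literally
        return gestures == [symbol]
    return any(matches(p, gestures) for p in GRAMMAR[symbol])

def matches(production, gestures):
    """Check whether a production body derives the gesture sequence."""
    if not production:
        return not gestures
    head, rest = production[0], production[1:]
    # Try every split point: the first symbol derives a prefix, the rest the suffix.
    return any(
        derives(head, gestures[:i]) and matches(rest, gestures[i:])
        for i in range(len(gestures) + 1)
    )
```

For example, `derives("assault", ["approach", "punch", "punch"])` returns `True`, since the recursive `fight` rule lets the grammar absorb an arbitrary number of `punch` gestures; a deterministic exact match like this would be replaced in practice by probabilistic matching against imperfect gesture detections.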
Cite this article
Ryoo, M.S., Aggarwal, J.K. Semantic Representation and Recognition of Continued and Recursive Human Activities. Int J Comput Vis 82, 1–24 (2009). https://doi.org/10.1007/s11263-008-0181-1