Semantic Representation and Recognition of Continued and Recursive Human Activities

Published in: International Journal of Computer Vision

Abstract

This paper describes a methodology for automated recognition of complex human activities. The paper proposes a general framework which reliably recognizes high-level human actions and human-human interactions. Our approach is a description-based approach, which enables a user to encode the structure of a high-level human activity as a formal representation. Recognition of human activities is done by semantically matching constructed representations with actual observations. The methodology uses a context-free grammar (CFG) based representation scheme as a formal syntax for representing composite activities. Our CFG-based representation enables us to define complex human activities in terms of simpler activities or movements. Our system takes advantage of both statistical recognition techniques from computer vision and knowledge representation concepts from traditional artificial intelligence. At the low level of the system, image sequences are processed to extract poses and gestures. Based on the recognized gestures, the high level of the system hierarchically recognizes composite actions and interactions occurring in a sequence of image frames. The concept of hallucinations and a probabilistic semantic-level recognition algorithm are introduced to cope with imperfect lower layers. As a result, the system recognizes human activities including ‘fighting’ and ‘assault’, high-level activities that previous systems had difficulty recognizing. The experimental results show that our system reliably recognizes sequences of complex human activities with a high recognition rate.
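The core idea — defining composite activities as productions over simpler sub-activities and recognizing them by matching against an observed gesture stream — can be sketched in miniature. The grammar below is a hypothetical illustration, not the paper's actual representation: the activity names (`fight`, `punch`, `approach`) and gesture labels are invented for the example, and the matcher is a naive recursive parser rather than the paper's probabilistic semantic-level algorithm.

```python
# Hypothetical CFG-style activity grammar: each composite activity maps to
# one or more productions, each an ordered list of sub-activities. Symbols
# absent from the grammar are terminal gestures from the low-level layer.
GRAMMAR = {
    "approach": [["walk_toward"]],
    "punch":    [["raise_arm", "extend_arm"]],
    "fight":    [["approach", "punch", "punch"],  # composite built from composites
                 ["punch", "punch"]],
}

def matches(symbol, stream, i):
    """Return the set of stream indices reachable after recognizing `symbol`
    starting at index i (empty set means no match)."""
    if symbol not in GRAMMAR:  # terminal gesture: consume one observation
        return {i + 1} if i < len(stream) and stream[i] == symbol else set()
    ends = set()
    for production in GRAMMAR[symbol]:
        positions = {i}
        for sub in production:  # thread each sub-activity left to right
            positions = {e for p in positions for e in matches(sub, stream, p)}
        ends |= positions
    return ends

def recognize(activity, gesture_stream):
    """True if the entire stream parses as one instance of `activity`."""
    return len(gesture_stream) in matches(activity, gesture_stream, 0)

observed = ["walk_toward", "raise_arm", "extend_arm", "raise_arm", "extend_arm"]
print(recognize("fight", observed))  # → True
```

Because composites may reference other composites, the same mechanism scales hierarchically: gestures compose into actions, actions into interactions, mirroring the layered recognition the abstract describes.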



Author information

Correspondence to M. S. Ryoo.

Cite this article

Ryoo, M.S., Aggarwal, J.K. Semantic Representation and Recognition of Continued and Recursive Human Activities. Int J Comput Vis 82, 1–24 (2009). https://doi.org/10.1007/s11263-008-0181-1
