A hierarchical Bayesian network for event recognition of human actions and interactions

  • Special Issue on Video Surveillance
  • Published: 25 October 2004
Multimedia Systems

Abstract.

Recognizing human interactions is a challenging task due to the multiple body parts of interacting persons and the concomitant occlusions. This paper presents a method for the recognition of two-person interactions using a hierarchical Bayesian network (BN). The poses of simultaneously tracked body parts are estimated at the low level of the BN, and the overall body pose is estimated at the high level of the BN. The evolution of the poses of the multiple body parts is processed by a dynamic Bayesian network (DBN). The recognition of two-person interactions is expressed in terms of semantic verbal descriptions at multiple levels: individual body-part motions at the low level, single-person actions at the middle level, and two-person interactions at the high level. Example sequences of interacting persons illustrate the success of the proposed framework.
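
The two-level idea in the abstract can be illustrated with a toy sketch. It is not the authors' model: the state names, the body parts, every probability, and the function names below are invented for illustration, and the network is reduced to one hidden overall body pose with observed body-part poses as its children. The per-frame step combines the body-part evidence into a belief over the overall pose, and a simple forward filter stands in for the temporal (DBN) step.

```python
# Toy illustration only: all states and probabilities are assumptions,
# not values or structures taken from the paper.
STATES = ("standing", "leaning")  # hypothetical overall body poses

PRIOR = {"standing": 0.5, "leaning": 0.5}

# P(body-part pose | overall body pose), one table per hypothetical body part.
LIKELIHOOD = {
    "torso": {"standing": {"upright": 0.9, "bent": 0.1},
              "leaning": {"upright": 0.2, "bent": 0.8}},
    "arm": {"standing": {"down": 0.7, "raised": 0.3},
            "leaning": {"down": 0.4, "raised": 0.6}},
}

# P(overall pose at frame t | overall pose at frame t-1).
TRANSITION = {"standing": {"standing": 0.8, "leaning": 0.2},
              "leaning": {"standing": 0.3, "leaning": 0.7}}


def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {state: value / total for state, value in scores.items()}


def body_pose_posterior(belief, observed_parts):
    """Update the belief over the overall pose with one frame of part poses."""
    scores = {}
    for state in STATES:
        p = belief[state]
        for part, pose in observed_parts.items():
            p *= LIKELIHOOD[part][state][pose]
        scores[state] = p
    return normalize(scores)


def predict(belief):
    """Propagate the belief one frame forward through the transition table."""
    return {nxt: sum(belief[cur] * TRANSITION[cur][nxt] for cur in STATES)
            for nxt in STATES}


def forward_filter(frames):
    """Return the filtered belief over the overall pose for every frame."""
    belief = dict(PRIOR)
    history = []
    for index, observed_parts in enumerate(frames):
        if index > 0:
            belief = predict(belief)
        belief = body_pose_posterior(belief, observed_parts)
        history.append(belief)
    return history


frames = [
    {"torso": "upright", "arm": "down"},
    {"torso": "bent", "arm": "raised"},
]
history = forward_filter(frames)
for index, belief in enumerate(history):
    best = max(belief, key=belief.get)
    print(index, best, round(belief[best], 3))
```

In this sketch the most probable overall pose per frame plays the role of a low-level verbal label; mapping such labels to single-person actions and two-person interactions would need further layers that the abstract only names.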

Author information

Corresponding author

Correspondence to Sangho Park.

Additional information

Published online: 25 October 2004

About this article

Cite this article

Park, S., Aggarwal, J.K. A hierarchical Bayesian network for event recognition of human actions and interactions. Multimedia Systems 10, 164–179 (2004). https://doi.org/10.1007/s00530-004-0148-1

  • DOI: https://doi.org/10.1007/s00530-004-0148-1
