Abstract.
Recognizing human interactions is a challenging task because interacting persons present multiple body parts that frequently occlude one another. This paper presents a method for recognizing two-person interactions using a hierarchical Bayesian network (BN). The poses of simultaneously tracked body parts are estimated at the low level of the BN, and the overall body pose is estimated at the high level of the BN. The evolution of the poses of the multiple body parts is processed by a dynamic Bayesian network (DBN). The recognition of two-person interactions is expressed in terms of semantic verbal descriptions at multiple levels: individual body-part motions at the low level, single-person actions at the middle level, and two-person interactions at the high level. Example sequences of interacting persons illustrate the success of the proposed framework.
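The two-level idea in the abstract can be illustrated with a toy sketch. This is not the authors' network: the state names, part names, and every probability below are invented for illustration, the body-part poses are treated as already-discretized observations, and the parts are assumed conditionally independent given the overall body pose. The sketch only shows how per-part evidence (low level) can update a belief over an overall body pose (high level) as it evolves frame by frame, in the manner of forward filtering in a simple DBN.

```python
# Toy sketch only: states, part names, and all probabilities are hypothetical.
BODY_POSES = ("upright", "leaning")

# Temporal model: P(body pose at frame t | body pose at frame t-1).
TRANSITION = {
    "upright": {"upright": 0.8, "leaning": 0.2},
    "leaning": {"upright": 0.3, "leaning": 0.7},
}

# Low level: P(observed part pose | overall body pose) for each tracked part.
PART_MODEL = {
    "torso": {
        "upright": {"vertical": 0.9, "tilted": 0.1},
        "leaning": {"vertical": 0.2, "tilted": 0.8},
    },
    "arm": {
        "upright": {"down": 0.7, "extended": 0.3},
        "leaning": {"down": 0.25, "extended": 0.75},
    },
}


def likelihood(body_pose, part_obs):
    """P(part observations | body pose), assuming parts are conditionally independent."""
    p = 1.0
    for part, pose in part_obs.items():
        p *= PART_MODEL[part][body_pose][pose]
    return p


def filter_step(belief, part_obs):
    """One forward-filtering step: predict with TRANSITION, then weight by the evidence."""
    predicted = {
        s: sum(belief[prev] * TRANSITION[prev][s] for prev in BODY_POSES)
        for s in BODY_POSES
    }
    unnormalized = {s: predicted[s] * likelihood(s, part_obs) for s in BODY_POSES}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}


def run_filter(prior, sequence):
    """Return the belief over body poses after each frame of part observations."""
    belief = dict(prior)
    history = []
    for part_obs in sequence:
        belief = filter_step(belief, part_obs)
        history.append(belief)
    return history


# Example: two frames in which the torso tilts and the arm extends.
frames = [
    {"torso": "vertical", "arm": "down"},
    {"torso": "tilted", "arm": "extended"},
]
for t, belief in enumerate(run_filter({"upright": 0.5, "leaning": 0.5}, frames)):
    print(t, max(belief, key=belief.get), belief)
```

A full system along the lines described in the abstract would also infer the part poses themselves from image features and would add further layers for single-person actions and two-person interactions; the sketch stops at a single person's overall pose.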
Published online: 25 October 2004
Cite this article
Park, S., Aggarwal, J.K. A hierarchical Bayesian network for event recognition of human actions and interactions. Multimedia Systems 10, 164–179 (2004). https://doi.org/10.1007/s00530-004-0148-1
DOI: https://doi.org/10.1007/s00530-004-0148-1