A hierarchical framework for understanding human-human interactions in video surveillance
17 January 2005
Abstract
Understanding human behavior in video is essential in numerous applications, including smart surveillance, video annotation/retrieval, and human-computer interaction. However, recognizing human interactions is a challenging task due to ambiguity in body articulation, variations in body size and appearance, loose clothing, mutual occlusion, and shadows. In this paper we present a framework for recognizing human actions and interactions in color video, together with a hierarchical graphical model that unifies multiple levels of processing in video computing: the pixel level, blob level, object level, and event level. A mixture-of-Gaussians (MOG) model is used at the pixel level to train and classify individual pixel colors. Relaxation labeling with an attribute relational graph (ARG) is used at the blob level to merge the pixels into coherent blobs and to register inter-blob relations. At the object level, the poses of individual body parts are recognized using Bayesian networks (BNs). At the event level, the actions of a single person are modeled using a dynamic Bayesian network (DBN). The results of the object-level descriptions for each person are juxtaposed along a common timeline to identify an interaction between two persons. The linguistic 'verb argument structure' is used to represent human action in terms of triplets, and a meaningful semantic description is thereby obtained. Our system produces semantic descriptions of positive, neutral, and negative interactions between two persons: hand-shaking, standing hand-in-hand, and hugging as the positive interactions; approaching, departing, and pointing as the neutral interactions; and pushing, punching, and kicking as the negative interactions.
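As a rough illustration of the pixel-level step only, the sketch below classifies a pixel color by comparing its likelihood under one Gaussian mixture per class. It is not the authors' implementation: the class names (`skin`, `background`), the diagonal covariances, and every numeric parameter are hypothetical values chosen for the example, and the training stage that would estimate them from data is omitted.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at color vector x."""
    density = 1.0
    for xi, mi, vi in zip(x, mean, var):
        density *= math.exp(-((xi - mi) ** 2) / (2.0 * vi)) / math.sqrt(2.0 * math.pi * vi)
    return density


def mixture_likelihood(x, components):
    """Weighted sum of component densities; components = [(weight, mean, var), ...]."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def classify_pixel(x, class_models):
    """Return the class label whose mixture assigns x the highest likelihood."""
    return max(class_models, key=lambda label: mixture_likelihood(x, class_models[label]))


# Hypothetical, hand-picked parameters (not taken from the paper); RGB in [0, 255].
CLASS_MODELS = {
    "skin": [
        (0.6, (200.0, 150.0, 130.0), (400.0, 400.0, 400.0)),
        (0.4, (170.0, 120.0, 100.0), (400.0, 400.0, 400.0)),
    ],
    "background": [
        (1.0, (60.0, 60.0, 60.0), (900.0, 900.0, 900.0)),
    ],
}

if __name__ == "__main__":
    print(classify_pixel((195.0, 148.0, 128.0), CLASS_MODELS))
    print(classify_pixel((55.0, 65.0, 58.0), CLASS_MODELS))
```

In a fuller pipeline the per-pixel labels produced this way would feed the blob-level grouping; that stage, and the Bayesian-network stages above it, are outside the scope of this sketch.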
© (2005) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Sangho Park and J. K. Aggarwal "A hierarchical framework for understanding human-human interactions in video surveillance", Proc. SPIE 5682, Storage and Retrieval Methods and Applications for Multimedia 2005, (17 January 2005); https://doi.org/10.1117/12.587211
KEYWORDS
Video surveillance
Video
Head
Video processing
Abdomen
Analytical research
Chest