Abstract
Recognition of human interactions in video is useful for video annotation, automated surveillance, and content-based video retrieval. This paper presents a model-based approach to motion tracking and recognition of human interactions using multi-layer finite state automata (FA). The system targets widely available monocular surveillance videos with a static background. A three-dimensional human body model is built from a sphere and cylinders and is projected onto the two-dimensional image plane to fit the foreground image silhouette. We convert the human motion tracking problem into a parameter optimization problem without the need to compute inverse kinematics. A cost functional estimates the degree of overlap between the foreground input image silhouette and the projected silhouette of the three-dimensional body model. A behavior recognition system analyzes the motion data obtained from the tracker in terms of the feet, torso, and hands. The recognition model represents human behavior as a sequence of states that register the configuration of individual body parts in space and time. To overcome the exponential growth in the number of states that usually occurs in a single-level FA, we propose a multi-layer FA that abstracts states and events from motion data at multiple levels: the low-level FA analyzes body parts only, and the high-level FA analyzes the human interaction. Motion tracking results from video sequences are presented. Our recognition framework successfully recognizes various human interactions such as approaching, departing, pushing, pointing, and handshaking.
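The two ideas in the abstract, fitting a projected body model by minimizing a silhouette-overlap cost and splitting recognition into low-level (body part) and high-level (interaction) automata, can be illustrated with a small sketch. This is not the authors' implementation. Several pieces are assumptions chosen for illustration: the cost is taken to be 1 minus the intersection-over-union of two pixel sets, the "projection" is a toy 2D disk (head) above a rectangle (torso), a brute-force search over the model's image position stands in for the paper's parameter optimizer, and all names (`overlap_cost`, `render_model`, `fit_translation`, `FA`, `low_level`, `handshake_fa`, `recognize`), states, symbols, and the `reach` threshold are hypothetical.

```python
# Hypothetical sketch: silhouette-overlap fitting plus a two-layer finite state automaton.
# Masks are sets of (x, y) pixel coordinates; all names and thresholds are illustrative.

def overlap_cost(foreground, model):
    """Assumed cost: 1 - IoU of two pixel sets (0.0 = identical, 1.0 = disjoint)."""
    union = foreground | model
    if not union:
        return 0.0
    return len(foreground ^ model) / len(union)


def render_model(cx, cy, shape=(32, 32), head_r=3, torso_half_w=3, torso_h=12):
    """Toy 2D projection: a disk (projected sphere, head) above a rectangle (projected cylinder, torso)."""
    h, w = shape
    half_w = max(head_r, torso_half_w)
    pixels = set()
    for y in range(max(0, cy - head_r), min(h, cy + head_r + torso_h + 1)):
        for x in range(max(0, cx - half_w), min(w, cx + half_w + 1)):
            in_head = (x - cx) ** 2 + (y - cy) ** 2 <= head_r ** 2
            in_torso = abs(x - cx) <= torso_half_w and cy + head_r < y <= cy + head_r + torso_h
            if in_head or in_torso:
                pixels.add((x, y))
    return pixels


def fit_translation(foreground, shape=(32, 32)):
    """Brute-force search over the head centre; returns (cost, cx, cy) with the lowest cost."""
    h, w = shape
    return min((overlap_cost(foreground, render_model(x, y, shape)), x, y)
               for y in range(h) for x in range(w))


class FA:
    """Deterministic finite automaton; an unlisted (state, symbol) pair keeps the current state."""

    def __init__(self, start, transitions, accepting=()):
        self.state = start
        self.transitions = transitions
        self.accepting = set(accepting)

    def step(self, symbol):
        self.state = self.transitions.get((self.state, symbol), self.state)
        return self.state


# Low level: one small FA per body part, driven by per-frame symbols.
TORSO = {"closer": "approaching", "farther": "departing", "same": "still"}
ARM = {"out": "extended", "in": "rest"}


def low_level(table, start):
    # Every state jumps to the state named by the incoming symbol.
    return FA(start, {(s, sym): nxt for s in table.values() for sym, nxt in table.items()})


def handshake_fa():
    # High level: consumes (torso state, arm state) pairs, not raw motion data.
    return FA("start", {
        ("start", ("approaching", "rest")): "approached",
        ("approached", ("still", "extended")): "handshake",
    }, accepting={"handshake"})


def recognize(distances, hand_offsets, reach=5.0):
    """distances: inter-person distance per frame; hand_offsets: hand-to-torso distance per frame."""
    torso, arm, high = low_level(TORSO, "still"), low_level(ARM, "rest"), handshake_fa()
    for i in range(1, len(distances)):
        change = distances[i] - distances[i - 1]
        t_sym = "closer" if change < 0 else "farther" if change > 0 else "same"
        a_sym = "out" if hand_offsets[i] > reach else "in"
        high.step((torso.step(t_sym), arm.step(a_sym)))
    return high.state in high.accepting
```

For example, `fit_translation(render_model(16, 8))` recovers `(0.0, 16, 8)`, and `recognize([10, 8, 6, 4, 4, 4], [0, 0, 0, 0, 6, 6])` returns `True` because the torso automaton reports approaching and then still while the arm automaton reports the arm extended. The design point the sketch tries to mirror is that the high-level automaton only sees a handful of abstract body-part states (here 3 torso states times 2 arm states) rather than raw coordinates, which keeps its state space small.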
This work was partially supported by grant No. 2000-2-30400-011-1 from the Korea Science and Engineering Foundation.
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Park, S., Park, J., Aggarwal, J.K. (2003). Video Retrieval of Human Interactions Using Model-Based Motion Tracking and Multi-layer Finite State Automata. In: Bakker, E.M., Lew, M.S., Huang, T.S., Sebe, N., Zhou, X.S. (eds) Image and Video Retrieval. CIVR 2003. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45113-7_39
DOI: https://doi.org/10.1007/3-540-45113-7_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40634-1
Online ISBN: 978-3-540-45113-6
eBook Packages: Springer Book Archive