Video Retrieval of Human Interactions Using Model-Based Motion Tracking and Multi-layer Finite State Automata

Park, Sangho; Park, Jihun; Aggarwal, Jake K.

doi:10.1007/3-540-45113-7_39

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2728)

Included in the following conference series:

International Conference on Image and Video Retrieval

Abstract

Recognition of human interactions in a video is useful for video annotation, automated surveillance, and content-based video retrieval. This paper presents a model-based approach to motion tracking and recognition of human interactions using multi-layer finite state automata (FA). The system is designed for widely available, static-background monocular surveillance videos. A three-dimensional human body model is built using a sphere and cylinders and is projected onto a two-dimensional image plane to fit the foreground image silhouette. We convert the human motion tracking problem into a parameter optimization problem without the need to compute inverse kinematics. A cost functional is used to estimate the degree of overlap between the foreground input image silhouette and the projected three-dimensional body model silhouette. Motion data obtained from the tracker are analyzed in terms of feet, torso, and hands by a behavior recognition system. The recognition model represents human behavior as a sequence of states that register the configuration of individual body parts in space and time. To overcome the exponential growth in the number of states that usually occurs in a single-level FA, we propose a multi-layer FA that abstracts states and events from motion data at multiple levels: the low-level FA analyzes body parts only, and the high-level FA analyzes the human interaction. Motion tracking results from video sequences are presented. Our recognition framework successfully recognizes various human interactions such as approaching, departing, pushing, pointing, and handshaking.
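
The layering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the state names, event names, transition tables, and the `abstract_event` helper are all hypothetical, chosen only to show how low-level automata for body parts can feed a single high-level automaton for the interaction. The input is assumed to be a per-frame stream of already-quantized body-part events for two people.

```python
class FA:
    """Deterministic finite automaton.

    A (state, event) pair missing from the table leaves the state unchanged.
    """

    def __init__(self, start, transitions):
        self.state = start
        self.transitions = transitions

    def step(self, event):
        self.state = self.transitions.get((self.state, event), self.state)
        return self.state


# Hypothetical low-level tables: one FA per body part, per person.
TORSO = {
    ("still", "forward"): "advancing",
    ("still", "backward"): "retreating",
    ("advancing", "stop"): "still",
    ("advancing", "backward"): "retreating",
    ("retreating", "stop"): "still",
    ("retreating", "forward"): "advancing",
}
HAND = {
    ("down", "extend"): "extended",
    ("extended", "retract"): "down",
}

# Hypothetical high-level table: consumes events abstracted from low-level states.
INTERACTION = {
    ("idle", "both_advancing"): "approaching",
    ("idle", "both_retreating"): "departing",
    ("approaching", "both_still"): "facing",
    ("approaching", "both_retreating"): "departing",
    ("facing", "both_hands_extended"): "handshaking",
    ("facing", "both_retreating"): "departing",
    ("handshaking", "both_still"): "facing",
}


def abstract_event(a_torso, a_hand, b_torso, b_hand):
    """Collapse the four low-level states into one high-level event."""
    if a_hand == "extended" and b_hand == "extended":
        return "both_hands_extended"
    if a_torso == b_torso == "advancing":
        return "both_advancing"
    if a_torso == b_torso == "retreating":
        return "both_retreating"
    if a_torso == b_torso == "still":
        return "both_still"
    return "other"  # no entry in INTERACTION, so the high-level state is kept


def recognize(frames):
    """Run both layers over per-frame events.

    Each frame is (a_torso_event, a_hand_event, b_torso_event, b_hand_event);
    None means no event for that body part in that frame.
    Returns the high-level states visited, with consecutive repeats collapsed.
    """
    a_torso, b_torso = FA("still", TORSO), FA("still", TORSO)
    a_hand, b_hand = FA("down", HAND), FA("down", HAND)
    top = FA("idle", INTERACTION)
    visited = []
    for at, ah, bt, bh in frames:
        event = abstract_event(
            a_torso.step(at), a_hand.step(ah), b_torso.step(bt), b_hand.step(bh)
        )
        state = top.step(event)
        if not visited or visited[-1] != state:
            visited.append(state)
    return visited


frames = [
    ("forward", None, "forward", None),
    ("stop", None, "stop", None),
    (None, "extend", None, "extend"),
    (None, "retract", None, "retract"),
    ("backward", None, "backward", None),
]
print(recognize(frames))
```

In this toy setup the two people have 3 × 2 × 3 × 2 = 36 joint body-part configurations, yet the high-level automaton only sees five abstracted events, which is the kind of state reduction the multi-layer design aims for.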

This work was partially supported by grant No. 2000-2-30400-011-1 from the Korea Science and Engineering Foundation.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, 78712
Sangho Park & Jake K. Aggarwal
Department of Computer Engineering, Hongik University, Seoul, Korea
Jihun Park

Editor information

Editors and Affiliations

LIACS Media Lab, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
Erwin M. Bakker & Michael S. Lew
Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 N. Mathews Avenue, Urbana, IL, 61801, USA
Thomas S. Huang
University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, The Netherlands
Nicu Sebe
Siemens Corporate Research, 755 College Road East, Princeton, NJ, 08540, USA
Xiang Sean Zhou

About this paper

Cite this paper

Park, S., Park, J., Aggarwal, J.K. (2003). Video Retrieval of Human Interactions Using Model-Based Motion Tracking and Multi-layer Finite State Automata. In: Bakker, E.M., Lew, M.S., Huang, T.S., Sebe, N., Zhou, X.S. (eds) Image and Video Retrieval. CIVR 2003. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45113-7_39

DOI: https://doi.org/10.1007/3-540-45113-7_39
Published: 24 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40634-1
Online ISBN: 978-3-540-45113-6
eBook Packages: Springer Book Archive
