
Video Retrieval of Human Interactions Using Model-Based Motion Tracking and Multi-layer Finite State Automata

  • Conference paper
Image and Video Retrieval (CIVR 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2728)

Abstract

Recognition of human interactions in video is useful for video annotation, automated surveillance, and content-based video retrieval. This paper presents a model-based approach to motion tracking and recognition of human interactions using multi-layer finite state automata (FA). The system targets widely available monocular surveillance videos with a static background. A three-dimensional human body model is built from a sphere and cylinders and is projected onto the two-dimensional image plane to fit the foreground image silhouette. We convert the human motion tracking problem into a parameter optimization problem, which avoids the need to compute inverse kinematics. A cost functional estimates the degree of overlap between the foreground silhouette of the input image and the silhouette of the projected three-dimensional body model. The motion data obtained from the tracker are analyzed in terms of feet, torso, and hands by a behavior recognition system. The recognition model represents human behavior as a sequence of states that register the configuration of individual body parts in space and time. To overcome the exponential growth in the number of states that usually occurs in a single-level FA, we propose a multi-layer FA that abstracts states and events from motion data at multiple levels: the low-level FA analyzes body parts only, and the high-level FA analyzes the human interaction. Motion tracking results from video sequences are presented. Our recognition framework successfully recognizes various human interactions such as approaching, departing, pushing, pointing, and handshaking.
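The layering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Frame` fields, the thresholds, the state names, and the event names are all hypothetical, chosen only to show how low-level automata for body parts can feed a high-level automaton for the interaction.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical tracker output for one person in one frame."""
    torso_x: float     # horizontal torso position (arbitrary units)
    hand_reach: float  # distance of the hand from the torso


class FA:
    """Deterministic automaton; events with no matching transition keep the state."""

    def __init__(self, start, transitions):
        self.state = start
        self.transitions = transitions

    def step(self, event):
        self.state = self.transitions.get((self.state, event), self.state)
        return self.state


def make_torso_fa():
    # Low-level FA for the torso (assumed states, not from the paper).
    t = {}
    for s in ("stationary", "approaching", "departing"):
        t[(s, "toward")] = "approaching"
        t[(s, "away")] = "departing"
        t[(s, "still")] = "stationary"
    return FA("stationary", t)


def make_hand_fa():
    # Low-level FA for the hand (assumed states, not from the paper).
    return FA("lowered", {("lowered", "raise"): "extended",
                          ("extended", "lower"): "lowered"})


def make_interaction_fa():
    # High-level FA: sees only abstracted events, never raw motion data.
    return FA("idle", {("idle", "closing"): "approach",
                       ("approach", "hands_meet"): "handshake",
                       ("approach", "separating"): "depart",
                       ("handshake", "separating"): "depart"})


def torso_event(prev, curr, other_x, eps=0.1):
    # Discretize torso motion relative to the other person's position.
    before = abs(prev.torso_x - other_x)
    after = abs(curr.torso_x - other_x)
    if after < before - eps:
        return "toward"
    if after > before + eps:
        return "away"
    return "still"


def hand_event(curr, reach=0.5):
    return "raise" if curr.hand_reach > reach else "lower"


def interaction_event(torso_states, hand_states):
    # Abstract the low-level states of both people into one high-level event.
    if all(h == "extended" for h in hand_states) and \
            all(t == "stationary" for t in torso_states):
        return "hands_meet"
    if "departing" in torso_states:
        return "separating"
    if "approaching" in torso_states:
        return "closing"
    return "none"


def recognize(track_a, track_b):
    """Return the sequence of distinct high-level states visited."""
    torso = [make_torso_fa(), make_torso_fa()]
    hand = [make_hand_fa(), make_hand_fa()]
    high = make_interaction_fa()
    history = [high.state]
    for i in range(1, len(track_a)):
        prev = (track_a[i - 1], track_b[i - 1])
        curr = (track_a[i], track_b[i])
        for p in (0, 1):
            other_x = prev[1 - p].torso_x
            torso[p].step(torso_event(prev[p], curr[p], other_x))
            hand[p].step(hand_event(curr[p]))
        event = interaction_event([f.state for f in torso],
                                  [f.state for f in hand])
        if high.step(event) != history[-1]:
            history.append(high.state)
    return history
```

In this sketch each person needs only a three-state torso automaton and a two-state hand automaton, and the interaction automaton has four states. A single-level automaton over the same information would have to enumerate combinations of every body-part state for both people, which is the growth the multi-layer design is meant to avoid.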

This work was partially supported by grant No. 2000-2-30400-011-1 from the Korea Science and Engineering Foundation.




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Park, S., Park, J., Aggarwal, J.K. (2003). Video Retrieval of Human Interactions Using Model-Based Motion Tracking and Multi-layer Finite State Automata. In: Bakker, E.M., Lew, M.S., Huang, T.S., Sebe, N., Zhou, X.S. (eds) Image and Video Retrieval. CIVR 2003. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45113-7_39

  • DOI: https://doi.org/10.1007/3-540-45113-7_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40634-1

  • Online ISBN: 978-3-540-45113-6

  • eBook Packages: Springer Book Archive
