A Self-Training Approach for Visual Tracking and Recognition of Complex Human Activity Patterns

Bandouch, Jan; Jenkins, Odest Chadwicke; Beetz, Michael

doi:10.1007/s11263-012-0522-y

Published: 23 March 2012

Volume 99, pages 166–189 (2012)
International Journal of Computer Vision

Abstract

Automatically observing and understanding human activities is one of the big challenges in computer vision research. Potential fields of application include robotics, human-computer interaction and medical research. In this article we present our work on unobtrusive observation and interpretation of human activities for the precise recognition of human full-body motions. The presented system requires no more than three cameras and is capable of tracking a large spectrum of motions in a wide variety of scenarios. These include scenarios where the subject is partially occluded, manipulates objects as part of its activities, or interacts with the environment or with other humans. Our system is self-training, i.e. it is capable of learning models of human motion over time. These models are used both to improve the prediction of human dynamics and to provide the basis for the recognition and interpretation of observed activities. The accuracy and robustness of our system are the combined result of several contributions. By taking an anthropometric human model and optimizing it for use in a probabilistic tracking framework, we obtain a detailed biomechanical representation of human shape, posture and motion. Furthermore, we introduce a hierarchical sampling strategy for tracking that is embedded in a probabilistic framework and outperforms state-of-the-art Bayesian methods. We then show how to track complex manipulation activities in everyday environments using a combination of learned human appearance models and implicit environment models. Finally, we discuss a locally consistent representation of human motion that we use as a basis for learning environment- and task-specific motion models.
All methods presented in this article have been subject to extensive experimental evaluation on current benchmarks and on several challenging sequences, ranging from athletic exercises to ergonomic case studies to everyday manipulation tasks in a kitchen environment.
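The abstract does not spell out the hierarchical sampling strategy itself. Purely as an illustration of the general idea of sampling an articulated state one partition at a time inside a particle filter, the Python sketch below implements a toy version of partitioned sampling in the style of MacCormick and Isard. After each joint is diffused, a weighted resampling step concentrates particles where a partial likelihood over the already-sampled links is high. The final measurement update then applies the full likelihood. This is not the authors' algorithm. The planar three-link chain, the Gaussian observation model, the constants (`LINK_LEN`, `OBS_SIGMA`, `DIFFUSION`) and all function names (`joint_positions`, `partial_log_likelihood`, `weighted_resample`, `partitioned_sampling_step`, `run_demo`) are hypothetical assumptions made for this example.

```python
# Illustrative sketch only: a toy partitioned-sampling particle filter for a
# hypothetical planar kinematic chain. NOT the method of Bandouch et al.
import math
import random

# All constants are hypothetical values chosen for the toy example.
LINK_LEN = 1.0    # length of every link in the planar chain
OBS_SIGMA = 0.1   # std. dev. of the Gaussian observation model
DIFFUSION = 0.2   # std. dev. of the per-joint random-walk dynamics (radians)


def joint_positions(angles):
    """Forward kinematics: end point of each link of a chain rooted at the origin."""
    x = y = theta = 0.0
    points = []
    for a in angles:
        theta += a  # each joint angle is relative to its parent link
        x += LINK_LEN * math.cos(theta)
        y += LINK_LEN * math.sin(theta)
        points.append((x, y))
    return points


def partial_log_likelihood(angles, observed, k):
    """Log-likelihood of the observed end points of links 0..k (deeper links ignored)."""
    predicted = joint_positions(angles[: k + 1])
    sq = sum((px - ox) ** 2 + (py - oy) ** 2
             for (px, py), (ox, oy) in zip(predicted, observed[: k + 1]))
    return -sq / (2.0 * OBS_SIGMA ** 2)


def normalized(log_values):
    """Turn log-weights into normalized weights without underflowing to all zeros."""
    m = max(log_values)
    w = [math.exp(v - m) for v in log_values]
    total = sum(w)
    return [v / total for v in w]


def weighted_resample(particles, log_w, log_g, rng):
    """Draw particles with probability proportional to w * g, then reweight by 1/g.
    The weighted set represents the same distribution as before, but its samples
    are concentrated where the weighting function g is large."""
    rho = normalized([lw + lg for lw, lg in zip(log_w, log_g)])
    idx = rng.choices(range(len(particles)), weights=rho, k=len(particles))
    return [list(particles[j]) for j in idx], [-log_g[j] for j in idx]


def partitioned_sampling_step(particles, log_w, observed, rng):
    """One filter update; the state is partitioned joint by joint along the chain."""
    n_parts = len(observed)
    for k in range(n_parts):
        for p in particles:
            p[k] += rng.gauss(0.0, DIFFUSION)  # dynamics for partition k only
        if k < n_parts - 1:
            log_g = [partial_log_likelihood(p, observed, k) for p in particles]
            particles, log_w = weighted_resample(particles, log_w, log_g, rng)
    # Measurement update with the full likelihood over all links.
    log_w = [lw + partial_log_likelihood(p, observed, n_parts - 1)
             for p, lw in zip(particles, log_w)]
    return particles, log_w


def estimate(particles, log_w):
    """Weighted mean of each joint angle (adequate here because angles stay small)."""
    w = normalized(log_w)
    return [sum(wi * p[j] for wi, p in zip(w, particles))
            for j in range(len(particles[0]))]


def run_demo(seed=0, n_particles=500, n_iters=10):
    """Track a static, noise-free toy pose; returns (true angles, estimated angles)."""
    rng = random.Random(seed)
    true_angles = [0.3, -0.5, 0.4]
    observed = joint_positions(true_angles)
    particles = [[0.0] * len(true_angles) for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    for _ in range(n_iters):  # repeated updates on one static frame
        particles, log_w = partitioned_sampling_step(particles, log_w, observed, rng)
    return true_angles, estimate(particles, log_w)


if __name__ == "__main__":
    truth, est = run_demo()
    print("true:", truth)
    print("estimate:", [round(a, 3) for a in est])
```

In this toy setting, sampling the chain link by link means that a particle is only spent on exploring a lower joint after the links above it already agree with the observation. This is the usual motivation for partitioned and hierarchical schemes in high-dimensional articulated tracking. A real body tracker would instead use a 3D kinematic model, image-based likelihoods and a problem-specific partition order.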

Author information

Authors and Affiliations

Intelligent Autonomous Systems Group, Technische Universität München, Boltzmannstr. 3, 85748, Garching bei München, Germany
Jan Bandouch & Michael Beetz
Department of Computer Science, Brown University, 115 Waterman St., Providence, RI, 02912-1910, USA
Odest Chadwicke Jenkins

Corresponding author

Correspondence to Jan Bandouch.

About this article

Cite this article

Bandouch, J., Jenkins, O.C. & Beetz, M. A Self-Training Approach for Visual Tracking and Recognition of Complex Human Activity Patterns. Int J Comput Vis 99, 166–189 (2012). https://doi.org/10.1007/s11263-012-0522-y

Received: 18 January 2011
Accepted: 06 March 2012
Published: 23 March 2012
Issue Date: September 2012
DOI: https://doi.org/10.1007/s11263-012-0522-y
