Human Action Segmentation and Recognition Using Discriminative Semi-Markov Models

Shi, Qinfeng; Cheng, Li; Wang, Li; Smola, Alex

doi:10.1007/s11263-010-0384-0

Published: 14 October 2010

Volume 93, pages 22–32 (2011)

International Journal of Computer Vision

Qinfeng Shi¹, Li Cheng², Li Wang³ & Alex Smola⁴


Abstract

A challenging problem in human action understanding is to jointly segment and recognize human actions from an unseen video sequence, where one person performs a sequence of continuous actions.

In this paper, we propose a discriminative semi-Markov model approach and define a set of features over boundary frames, segments, and neighboring segments. This enables us to conveniently capture a combination of local and global features that best represent each specific action type. To efficiently solve the inference problem of simultaneous segmentation and recognition, we use a Viterbi-like dynamic programming algorithm, which in practice is able to process 20 frames per second. Moreover, the model is learned discriminatively using the large-margin principle and is formulated as an optimization problem with exponentially many constraints. To solve it efficiently, we present two different optimization algorithms, namely the cutting plane method and the bundle method, and demonstrate that either can be deployed in a “plug and play” fashion. On the theoretical side, we also analyze the generalization error of the proposed approach and provide a PAC-Bayes bound.

The proposed approach is evaluated on a variety of datasets and is shown to perform competitively with state-of-the-art methods. For example, on the KTH dataset it achieves 95.0% recognition accuracy, whereas the best previously known result on this dataset is 93.4% (Reddy and Shah in ICCV, 2009).
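The Viterbi-like dynamic program for simultaneous segmentation and recognition can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a segmentation is scored as a sum of per-segment scores plus scores for neighboring segment pairs, standing in for a learned linear score w·φ. Boundary-frame features could be folded into the segment score. It also assumes a maximum segment length `max_len`. All names (`semi_markov_viterbi`, `segment_score`, `transition_score`) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A segment is (start_frame, end_frame_exclusive, action_label).
Segment = Tuple[int, int, int]


def semi_markov_viterbi(
    n_frames: int,
    n_labels: int,
    max_len: int,
    segment_score: Callable[[int, int, int], float],
    transition_score: Callable[[int, int], float],
) -> Tuple[float, List[Segment]]:
    """Best joint segmentation and labeling of frames 0..n_frames-1.

    segment_score(label, start, end) scores one segment covering frames
    start..end-1; transition_score(prev_label, label) scores two
    neighboring segments. Both are hypothetical stand-ins for w . phi(...).
    """
    if n_frames < 1 or n_labels < 1 or max_len < 1:
        raise ValueError("n_frames, n_labels and max_len must be positive")
    neg_inf = float("-inf")
    # best[t][y]: best score of frames [0, t) whose last segment has label y.
    best = [[neg_inf] * n_labels for _ in range(n_frames + 1)]
    # back[t][y]: (start of that last segment, label of the segment before it).
    back: List[List[Optional[Tuple[int, Optional[int]]]]] = [
        [None] * n_labels for _ in range(n_frames + 1)
    ]

    for t in range(1, n_frames + 1):
        for y in range(n_labels):
            # Try every admissible duration d of the last segment [t-d, t).
            for d in range(1, min(max_len, t) + 1):
                s = t - d
                seg = segment_score(y, s, t)
                if s == 0:
                    # First segment of the sequence: no transition term.
                    if seg > best[t][y]:
                        best[t][y], back[t][y] = seg, (0, None)
                    continue
                for prev in range(n_labels):
                    if best[s][prev] == neg_inf:
                        continue
                    cand = best[s][prev] + transition_score(prev, y) + seg
                    if cand > best[t][y]:
                        best[t][y], back[t][y] = cand, (s, prev)

    # Pick the best final label, then follow back-pointers to frame 0.
    label = max(range(n_labels), key=lambda k: best[n_frames][k])
    score = best[n_frames][label]
    segments: List[Segment] = []
    t = n_frames
    while t > 0:
        start, prev_label = back[t][label]
        segments.append((start, t, label))
        t, label = start, prev_label
    segments.reverse()
    return score, segments


if __name__ == "__main__":
    # Toy example with made-up scores: 6 frames, 2 actions; frames 0-2
    # look like action 0 and frames 3-5 look like action 1.
    frame_scores = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]

    def seg_score(label: int, start: int, end: int) -> float:
        # Sum of per-frame evidence minus a fixed per-segment cost.
        return sum(frame_scores[i][label] for i in range(start, end)) - 0.5

    def trans_score(prev: int, label: int) -> float:
        # Discourage splitting one action into two same-label segments.
        return -10.0 if prev == label else 0.0

    print(semi_markov_viterbi(6, 2, 4, seg_score, trans_score))
```

On the toy input this prints `(5.0, [(0, 3, 0), (3, 6, 1)])`: two segments, with the boundary at frame 3. Each table cell tries up to `max_len` durations and K previous labels. For T frames the cost is therefore O(T · max_len · K²), which is linear in video length for a fixed `max_len`. Adding a frame-decomposable loss term to `segment_score` would turn the same routine into the loss-augmented inference step that cutting plane and bundle solvers for large-margin training typically call at each iteration. That pairing is an assumption about how the pieces could fit together, not a description of the paper.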

References

Belongie, S., Malik, J., & Puzicha, J. (2002). Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4), 509–522.
Brand, M., Oliver, N., & Pentland, A. (1997). Coupled hidden Markov models for complex action recognition. In Proc. IEEE conf. computer vision and pattern recognition (p. 994). Washington: IEEE Comput. Soc.
Cheng, L., Wang, S., Schuurmans, D., Caelli, T., & Vishwanathan, S. (2006). An online discriminative approach to background subtraction. In IEEE international conference on advanced video and signal based surveillance (AVSS).
Dollar, P., Rabaud, V., Cottrell, G., & Belongie, S. (2005). Behavior recognition via sparse spatio-temporal features. In VS-PETS workshop.
Ferguson, J. (1980). Variable duration models for speech. In Symposium on the application of hidden Markov models to text and speech (pp. 143–179).
Fox, E., Sudderth, E., Jordan, M., & Willsky, A. (2009). Sharing features among dynamical systems with beta processes. In NIPS.
Gavrila, D. (1999). The visual analysis of human movement: a survey. Computer Vision and Image Understanding, 73(1), 82–98.
Germain, P., Lacasse, A., Laviolette, F., & Marchand, M. (2009). PAC-Bayesian learning of linear classifiers. In ICML (pp. 353–360). New York: ACM.
Gross, R., & Shi, J. (2001). The CMU motion of body (MoBo) database (Tech. Rep. CMU-RI-TR-01-18). Robotics Institute, Carnegie Mellon University.
Jhuang, H., Serre, T., Wolf, L., & Poggio, T. (2007). A biologically inspired system for action recognition. In ICCV.
Kale, A., Sundaresan, A., Rajagopalan, A., Cuntoor, N., RoyChowdhury, A., Kruger, V., & Chellappa, R. (2004). Identification of humans using gait. In IEEE Transactions on Image Processing (pp. 1163–1173).
Ke, Y., Sukthankar, R., & Hebert, M. (2005). Efficient visual event detection using volumetric features. In ICCV (Vol. 1, pp. 166–173).
Kimeldorf, G., & Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33, 82–95.
Langford, J., & McAllester, D. (2004). Computable shell decomposition bounds. Journal of Machine Learning Research, 5, 529–547.
Langford, J., & Shawe-Taylor, J. (2002). PAC-Bayes and margins. In NIPS (pp. 439–446). Cambridge: MIT Press.
Langford, J., Seeger, M., & Megiddo, N. (2001). An improved predictive accuracy bound for averaging classifiers. In ICML (pp. 290–297).
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Lv, F., & Nevatia, R. (2006). Recognition and segmentation of 3-d human action using HMM and multi-class adaboost. In European conference on computer vision (Vol. IV, pp. 359–372).
McAllester, D. (1998). Some PAC-Bayesian theorems. In COLT (pp. 230–234). New York: ACM.
McAllester, D. (2003a). PAC-Bayesian stochastic model selection. Machine Learning, 51(1), 5–21.
McAllester, D. (2003b). Simplified PAC-Bayesian margin bounds. In COLT (pp. 203–215). New York: ACM.
Moeslund, T., Hilton, A., & Krüger, V. (2006). A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104(2), 90–126.
Niebles, J., & Fei-Fei, L. (2007). A hierarchical model of shape and appearance for human action classification. In Proc. IEEE conf. computer vision and pattern recognition (pp. 1–8).
Nowozin, S., Bakir, G., & Tsuda, K. (2007). Discriminative subsequence mining for action classification. In ICCV.
Ostendorf, M., Digalakis, V., & Kimball, O. (1996). From HMMs to segment models: a unified view of stochastic modeling for speech recognition. IEEE Transactions on Speech and Audio Processing, 4(5), 360–378.
Phillips, J., Humphreys, G., Noppeney, U., & Price, C. (2002). The neural substrates of action retrieval: an examination of semantic and visual routes to action. Visual Cognition, 9(4–5), 662–685.
Rätsch, G., & Sonnenburg, S. (2006). Large scale hidden semi-Markov SVMs. In B. Schölkopf, J. Platt, & T. Hoffman (Eds.), NIPS (pp. 1161–1168). Cambridge: MIT Press.
Reddy, K., Liu, J., & Shah, M. (2009). Incremental action recognition using feature tree. In ICCV.
Sarawagi, S., & Cohen, W. (2004). Semi-Markov conditional random fields for information extraction. In NIPS.
Schindler, K., & van Gool, L. (2008). Action snippets: how many frames does human action recognition require? In Computer vision and pattern recognition (CVPR). New York: IEEE Press.
Schuldt, C., Laptev, I., & Caputo, B. (2004). Recognizing human actions: a local SVM approach. In Proc intl conf pattern recognition (pp. 32–36). Washington: IEEE Comput. Soc.
Shi, Q., Wang, L., Cheng, L., & Smola, A. (2008). Discriminative human action segmentation and recognition using semi-Markov model. In CVPR.
Sivic, J., & Zisserman, A. (2003). Video Google: a text retrieval approach to object matching in videos. In Proceedings of the international conference on computer vision (Vol. 2, pp. 1470–1477).
Sminchisescu, C., Kanaujia, A., Li, Z., & Metaxas, D. (2005). Conditional models for contextual human motion recognition. In IEEE international conference on computer vision (pp. 1808–1815).
Smola, A., Vishwanathan, S., & Le, Q. (2007). Bundle methods for machine learning. In NIPS.
Taskar, B., Guestrin, C., & Koller, D. (2004). Max-margin Markov networks. In S. Thrun, L. Saul, & B. Schölkopf (Eds.), NIPS (pp. 25–32). Cambridge: MIT Press.
Teo, C., Le, Q., Smola, A., & Vishwanathan, S. (2007). A scalable modular convex solver for regularized risk minimization. In KDD.
Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.
Wang, L., & Suter, D. (2007). Recognizing human activities from silhouettes: motion subspace and factorial discriminative graphical model. In Proc. IEEE conf. computer vision and pattern recognition (pp. 1–8).
Wong, S., Kim, T., & Cipolla, R. (2007). Learning motion categories using both semantic and structural information. In CVPR (pp. 1–6).
Yamato, J., Ohya, J., & Ishii, K. (1992). Recognizing human action in time-sequential images using hidden Markov model. In Proc. IEEE conf. computer vision and pattern recognition (pp. 379–385).

Author information

Authors and Affiliations

University of Adelaide, Adelaide, Australia
Qinfeng Shi
Bioinformatics Institute, A*STAR, Singapore, Singapore
Li Cheng
Nanjing Forestry University, Nanjing, China
Li Wang
Yahoo! Research, Santa Clara, USA
Alex Smola

Corresponding author

Correspondence to Li Cheng.

Additional information

A preliminary version was published as Shi et al. (2008).

About this article

Cite this article

Shi, Q., Cheng, L., Wang, L. et al. Human Action Segmentation and Recognition Using Discriminative Semi-Markov Models. Int J Comput Vis 93, 22–32 (2011). https://doi.org/10.1007/s11263-010-0384-0

Received: 12 August 2008
Accepted: 13 September 2010
Published: 14 October 2010
Issue Date: May 2011
DOI: https://doi.org/10.1007/s11263-010-0384-0
