Action Recognition Based on Learnt Motion Semantic Vocabulary

Zhao, Qiong; Lu, Zhiwu; Ip, Horace H. S.

doi:10.1007/978-3-642-15702-8_18

Qiong Zhao²²,
Zhiwu Lu²² &
Horace H. S. Ip²²

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6297))

Included in the following conference series:

Pacific-Rim Conference on Multimedia

1469 Accesses
2 Citations

Abstract

This paper presents a novel contextual spectral embedding (CSE) framework for human action recognition, which automatically learns the high-level features (motion semantic vocabulary) from a large vocabulary of abundant mid-level features (i.e. visual words). Our novelty is to exploit the inter-video context between mid-level features for spectral embedding, while the context is captured by the Pearson product moment correlation between mid-level features instead of Gaussian function computed over the vectors of point-wise information as mid-level feature representation. Our goal is to embed the mid-level features into a semantic low-dimensional space, and learn a much compact semantic vocabulary upon the CSE framework. Experiments on two action datasets demonstrate that our approach can achieve significantly improved results with respect to the state of the arts.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Liu, J., Yang, Y., Shah, M.: Learning semantic visual vocabularies using diffusion distance. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 461–468 (2009)
Google Scholar
Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos ”in the wild”. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 1996–2003 (2009)
Google Scholar
Wang, L., Lu, Z., Ip, H.H.S.: Image categorization based on a hierarchical spatial markov model. In: Jiang, X., Petkov, N. (eds.) CAIP 2009. LNCS, vol. 5702, pp. 766–773. Springer, Heidelberg (2009)
Google Scholar
Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local svm approach. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, vol. 3, pp. 32–36 (2004)
Google Scholar
Liu, J., Shah, M.: Learning human actions via information maximization. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8 (2008)
Google Scholar
Savarese, S., DelPozo, A., Niebles, J., Fei-Fei, L.: Spatial-temporal correlatons for unsupervised action classification. In: IEEE Workshop on Motion and video Computing, WMVC 2008, pp. 1–8 (2008)
Google Scholar
Wong, S.F., Kim, T.K., Cipolla, R.: Learning motion categories using both semantic and structural information. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2007, pp. 1–6 (2007)
Google Scholar
Lu, Z., Ip, H.H.S.: Image categorization with spatial mismatch kernels. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 397–404 (2009)
Google Scholar
Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vision 79, 299–318 (2008)
Article Google Scholar
Dollar, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72 (2005)
Google Scholar
Ballan, L., Bertini, M., Del Bimbo, A., Seidenari, L., Serra, G.: Effective codebooks for human action categorization (2009)
Google Scholar
Fei-Fei, L., Perona, P.: A bayesian hierarchical model for learning natural scene categories. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 2, pp. 524–531 (2005)
Google Scholar
Lafon, S., Lee, A.: Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 1393–1403 (2006)
Article Google Scholar
Yan, S., Xu, D., Zhang, B., Zhang, H.J., Yang, Q., Lin, S.: Graph embedding and extensions: A general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 40–51 (2007)
Article Google Scholar
Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: The Tenth IEEE International Conference on Computer Vision (ICCV 2005), pp. 1395–1402 (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

Centre for Innovative Applications of Internet And Multimedia Technologies (AIMtech), Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
Qiong Zhao, Zhiwu Lu & Horace H. S. Ip

Authors

Qiong Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Zhiwu Lu
View author publications
You can also search for this author in PubMed Google Scholar
Horace H. S. Ip
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Computer Science, University of Nottingham, Jubilee Campus, NG8 1BB, Nottingham, UK
Guoping Qiu
The Centre for Multimedia Signal Processing, The Hong Kong Polytechnic University, Hong Kong, China
Kin Man Lam
Faculty of System Design, Tokyo Metropolitan University, 6-6, Asahigaoka, 191-0065, Hino-city, Tokyo
Hitoshi Kiya
Shanghai Key Laboratory of Intelligent Information Processing, Department of Computer Science & Engineering, Fudan University, Shanghai, China
Xiang-Yang Xue
Department of Electrical Engineering, University of Southern California, 90089-2564, Los Angeles, CA
C.-C. Jay Kuo
LIACS Media Lab, Leiden University,
Michael S. Lew

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zhao, Q., Lu, Z., Ip, H.H.S. (2010). Action Recognition Based on Learnt Motion Semantic Vocabulary. In: Qiu, G., Lam, K.M., Kiya, H., Xue, XY., Kuo, CC.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. PCM 2010. Lecture Notes in Computer Science, vol 6297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15702-8_18

Download citation

DOI: https://doi.org/10.1007/978-3-642-15702-8_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15701-1
Online ISBN: 978-3-642-15702-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics