Modeling spatio-temporal layout with Lie Algebrized Gaussians for action recognition

Chen, Meng; Gong, Liyu; Wang, Tianjiang; Liu, Fang; Feng, Qi

doi:10.1007/s11042-015-3008-4

Multimedia Tools and Applications, Volume 75, pages 10335–10355 (2016)

Published: 26 October 2015

Abstract

We propose a novel approach to modeling the spatio-temporal distribution of local features for action recognition in videos. The approach is based on Lie Algebrized Gaussians (LAG), a feature aggregation method that yields a high-dimensional video signature. In the LAG framework, local features extracted from a video are aggregated to train a video-specific Gaussian Mixture Model (GMM). The video-specific GMM is then encoded as a vector based on Lie group theory; this step is also referred to as GMM vectorization. Because the video-specific GMM gives a soft partition of the feature space, for each cell of the feature space (i.e. each Gaussian component) we use a GMM to model the spatio-temporal locations of the local features assigned to that component. The location GMMs are encoded as vectors in the same way as the local feature GMM. We term these location-GMM vectors spatio-temporal LAG (STLAG). In addition, although LAG and the popular Fisher Vector (FV) are derived from distinct theoretical perspectives, we find that they are closely related. Hence the power and ℓ₂ normalization proposed for the FV is also beneficial to LAG. Experimental results show that STLAG models spatio-temporal layout more effectively than other techniques such as spatio-temporal pyramids and feature augmentation. Using state-of-the-art dense trajectory features, our approach achieves state-of-the-art performance on two challenging datasets: Hollywood2 and HMDB51.
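The abstract describes three steps (fitting a video-specific GMM, vectorizing each Gaussian through Lie group theory, and modeling per-cell spatio-temporal locations) plus FV-style power and ℓ₂ normalization. Below is a minimal, stdlib-only Python sketch of that pipeline under several assumptions that the abstract does not state: diagonal covariances; a video-specific GMM obtained by MAP-adapting only the means of a universal GMM (Reynolds-style relevance MAP); each Gaussian embedded as the affine matrix [[diag(σ), μ], [0, 1]], whose matrix logarithm has a closed form in the diagonal case; a single responsibility-weighted Gaussian per cell instead of a location GMM; component weights ignored in the vectors; and α = 0.5. All function names are hypothetical, and the paper's exact LAG formulation may differ.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a diagonal Gaussian as (means, standard deviations).
Gaussian = Tuple[List[float], List[float]]


def log_gauss_diag(x: Sequence[float], g: Gaussian) -> float:
    """Log-density of x under a diagonal-covariance Gaussian."""
    mu, sigma = g
    return sum(-0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
               for xi, m, s in zip(x, mu, sigma))


def responsibilities(x: Sequence[float], weights: Sequence[float],
                     comps: Sequence[Gaussian]) -> List[float]:
    """Soft assignment of one local feature to every Gaussian component (a cell)."""
    logp = [math.log(w) + log_gauss_diag(x, g) for w, g in zip(weights, comps)]
    top = max(logp)
    ex = [math.exp(lp - top) for lp in logp]  # shift by the max for numerical stability
    z = sum(ex)
    return [e / z for e in ex]


def map_adapt_means(feats, weights, comps, relevance=16.0) -> List[Gaussian]:
    """Assumption: video-specific GMM = universal GMM with MAP-adapted means."""
    resp = [responsibilities(f, weights, comps) for f in feats]
    adapted = []
    for k, (mu, sigma) in enumerate(comps):
        n_k = sum(r[k] for r in resp)
        if n_k > 0:
            ex = [sum(r[k] * f[d] for r, f in zip(resp, feats)) / n_k for d in range(len(mu))]
        else:
            ex = list(mu)
        a = n_k / (n_k + relevance) if n_k > 0 else 0.0  # data-dependent adaptation weight
        adapted.append(([a * e + (1 - a) * m for e, m in zip(ex, mu)], list(sigma)))
    return adapted


def lie_algebra_vector(g: Gaussian) -> List[float]:
    """Assumed embedding: log of the affine matrix [[diag(sigma), mu], [0, 1]].

    Per dimension, log([[s, m], [0, 1]]) = [[ln s, m ln s / (s - 1)], [0, 0]],
    and the translation term tends to m as s -> 1.
    """
    mu, sigma = g
    logs = [math.log(s) for s in sigma]
    trans = [m if abs(s - 1.0) < 1e-12 else m * ls / (s - 1.0)
             for m, s, ls in zip(mu, sigma, logs)]
    return logs + trans


def power_l2_normalize(vec: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Signed power normalization followed by l2 normalization (as used for the FV)."""
    powered = [math.copysign(abs(v) ** alpha, v) for v in vec]
    norm = math.sqrt(sum(p * p for p in powered))
    return [p / norm for p in powered] if norm > 0 else powered


def lag_vector(comps: Sequence[Gaussian], alpha: float = 0.5) -> List[float]:
    """Concatenate per-component Lie algebra vectors, then normalize."""
    return power_l2_normalize([v for g in comps for v in lie_algebra_vector(g)], alpha)


def location_gaussian(locs, resp_k, var_floor=1e-3) -> Gaussian:
    """Simplification: one weighted Gaussian over (x, y, t) per cell, not a location GMM."""
    total = sum(resp_k)
    dim = len(locs[0])
    if total <= 0:
        return [0.0] * dim, [1.0] * dim  # empty cell: identity affine map, zero vector
    mean = [sum(r * loc[d] for r, loc in zip(resp_k, locs)) / total for d in range(dim)]
    var = [sum(r * (loc[d] - mean[d]) ** 2 for r, loc in zip(resp_k, locs)) / total
           for d in range(dim)]
    return mean, [math.sqrt(max(v, var_floor)) for v in var]


def stlag(feats, locs, weights, comps, alpha=0.5) -> List[float]:
    """STLAG sketch: vectorize one location model per feature-space cell, then normalize."""
    resp = [responsibilities(f, weights, comps) for f in feats]
    parts: List[float] = []
    for k in range(len(comps)):
        parts.extend(lie_algebra_vector(location_gaussian(locs, [r[k] for r in resp])))
    return power_l2_normalize(parts, alpha)
```

In this sketch the two signatures could simply be concatenated, e.g. `lag_vector(video_gmm) + stlag(feats, locs, weights, video_gmm)`; how the paper combines or weights them is not specified in the abstract. Replacing `location_gaussian` with a small EM-fitted GMM per cell would bring the sketch closer to the described method.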

Similar content being viewed by others

Human action recognition using fusion of multiview and deep features: an application to video surveillance

Article 14 March 2020

Toward human activity recognition: a survey

Article 20 October 2022

PTDS CenterTrack: pedestrian tracking in dense scenes with re-identification and feature enhancement

Article 15 April 2024

Notes

Software available at http://lear.inrialpes.fr/~wang/improved_trajectories

Acknowledgements

This work was supported by grants from the National Natural Science Foundation of China (No. U1233119) and the Wuhan Key Science and Technology Project (No. 2014010202010110).

Author information

Authors and Affiliations

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Meng Chen, Tianjiang Wang, Fang Liu & Qi Feng
Eedoo Inc, Beijing, 100085, China
Liyu Gong

Corresponding author

Correspondence to Qi Feng.

About this article

Cite this article

Chen, M., Gong, L., Wang, T. et al. Modeling spatio-temporal layout with Lie Algebrized Gaussians for action recognition. Multimed Tools Appl 75, 10335–10355 (2016). https://doi.org/10.1007/s11042-015-3008-4

Received: 16 March 2015
Revised: 18 August 2015
Accepted: 14 October 2015
Published: 26 October 2015
Issue Date: September 2016
DOI: https://doi.org/10.1007/s11042-015-3008-4
