Modeling spatio-temporal layout with Lie Algebrized Gaussians for action recognition

Multimedia Tools and Applications

Abstract

We propose a novel approach to modeling the spatio-temporal distribution of local features for action recognition in videos. The approach builds on Lie Algebrized Gaussians (LAG), a feature aggregation method that yields a high-dimensional video signature. In the LAG framework, local features extracted from a video are aggregated to train a video-specific Gaussian Mixture Model (GMM). The video-specific GMM is then encoded as a vector based on Lie group theory, a step also referred to as GMM vectorization. Because the video-specific GMM gives a soft partition of the feature space, for each cell of the feature space (i.e. each Gaussian component) we use a GMM to model the spatio-temporal locations of the local features assigned to that component. The location GMMs are encoded as vectors in the same way as the local feature GMM. We term these vectors of location GMMs spatio-temporal LAG (STLAG). In addition, although LAG and the popular Fisher Vector (FV) are derived from distinct theoretical perspectives, we find that they are closely related. Hence the power and ℓ2 normalization proposed for the FV are also beneficial to LAG. Experimental results show that STLAG is very effective at modeling spatio-temporal layout compared with other techniques such as the spatio-temporal pyramid and feature augmentation. Using the state-of-the-art dense trajectory features, our approach achieves state-of-the-art performance on two challenging datasets: Hollywood2 and HMDB51.
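The two ideas above that lend themselves to a small illustration are the soft partition (each local feature contributes to every Gaussian component in proportion to its posterior) and the power and ℓ2 normalization borrowed from the FV. The Python sketch below is a deliberately simplified stand-in, not the method of the paper: it summarizes the locations assigned to each component with a single diagonal Gaussian (posterior-weighted mean and variance) rather than a location GMM, and it skips the Lie-group-based vectorization entirely. The function names, the toy posteriors, and the exponent `alpha=0.5` are assumptions made for illustration.

```python
import math


def location_stats(posteriors, locations, eps=1e-9):
    """Posterior-weighted mean and diagonal variance of (x, y, t) per component.

    posteriors[n][k]: soft assignment of local feature n to component k
                      (assumed to come from an already fitted feature GMM).
    locations[n]:     normalized (x, y, t) position of local feature n.
    Simplification: one diagonal Gaussian per component, not a location GMM.
    """
    num_components = len(posteriors[0])
    dims = len(locations[0])
    stats = []
    for k in range(num_components):
        weights = [p[k] for p in posteriors]
        total = sum(weights) + eps  # eps guards against empty components
        mean = [
            sum(w * loc[d] for w, loc in zip(weights, locations)) / total
            for d in range(dims)
        ]
        var = [
            sum(w * (loc[d] - mean[d]) ** 2 for w, loc in zip(weights, locations)) / total
            for d in range(dims)
        ]
        stats.append((mean, var))
    return stats


def power_l2_normalize(vec, alpha=0.5):
    """Signed power normalization, sign(z) * |z|**alpha, then L2 normalization."""
    powered = [math.copysign(abs(v) ** alpha, v) for v in vec]
    norm = math.sqrt(sum(p * p for p in powered))
    if norm == 0.0:
        return powered  # all-zero vector: nothing to rescale
    return [p / norm for p in powered]


# Toy example: three local features, two components (hypothetical values).
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
locations = [(0.1, 0.2, 0.0), (0.2, 0.2, 0.1), (0.9, 0.8, 1.0)]

stats = location_stats(posteriors, locations)
# Concatenate the per-component statistics into one vector and normalize it.
signature = power_l2_normalize([v for mean, var in stats for v in mean + var])
```

In this toy setup the first component is dominated by the two features near the start of the clip and the second by the late feature, so the per-component location means differ; the normalization step then rescales the concatenated vector to unit ℓ2 norm.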

Notes

  1. Software available at http://lear.inrialpes.fr/~wang/improved_trajectories

Acknowledgements

This work was supported by grants from the National Natural Science Foundation of China (No. U1233119) and the Wuhan Key Science and Technology Project (No. 2014010202010110).

Author information

Corresponding author

Correspondence to Qi Feng.

About this article

Cite this article

Chen, M., Gong, L., Wang, T. et al. Modeling spatio-temporal layout with Lie Algebrized Gaussians for action recognition. Multimed Tools Appl 75, 10335–10355 (2016). https://doi.org/10.1007/s11042-015-3008-4

  • DOI: https://doi.org/10.1007/s11042-015-3008-4
