
Power difference template for action recognition

  • Original Paper
  • Published in Machine Vision and Applications

Abstract

This paper proposes the power difference template, a new spatial-temporal representation for action recognition. Spatial power features are first extracted by applying Gaussian convolution to image gradients, with a transform between the logarithmic and exponential domains. Using a forward–backward frame power difference method, we then present the normalized projection histogram (NPH), which characterizes the spatial features of segmented actions by normalizing the histograms of their 2D horizontal and vertical projections. Furthermore, from the perspective of energy conservation, motion kinetic velocity (MKV) is introduced as a complementary descriptor of the temporal relationships among power features, under the assumption that variations in power are produced by motion in the form of kinetic energy. The power difference template, fusing NPH and MKV, is then integrated into a bag-of-words model for training and testing under a support vector machine framework. Experiments on the KTH, UCF Sports, UCF101 and HMDB datasets demonstrate the effectiveness of the proposed algorithm.
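Since only the abstract is available here, the Python sketch below gives one plausible reading of the described pipeline, not the authors' implementation: spatial power features from Gaussian-smoothed gradients computed in the logarithmic domain and mapped back through the exponential, a forward–backward frame power difference, an NPH built from normalized horizontal and vertical projections, and an MKV cue derived from the kinetic-energy assumption. All function names, parameters (e.g., sigma, n_bins), and exact formulas are illustrative assumptions.

```python
# Hypothetical sketch of the power-difference-template pipeline described
# in the abstract. Names, parameters, and exact operations are assumptions;
# the authors' actual formulation is in the paper itself.
import numpy as np
from scipy.ndimage import gaussian_filter

def spatial_power(frame, sigma=1.5):
    """Spatial power feature: Gaussian convolution of gradient magnitude
    computed in the logarithmic domain, mapped back via the exponential
    (one assumed reading of the log/exp-domain transform)."""
    log_img = np.log1p(frame.astype(np.float64))
    gy, gx = np.gradient(log_img)
    grad_mag = np.hypot(gx, gy)
    return np.expm1(gaussian_filter(grad_mag, sigma))

def power_difference(prev_frame, frame, next_frame):
    """Forward-backward frame power difference (assumed symmetric form)."""
    p_prev, p_cur, p_next = (spatial_power(f)
                             for f in (prev_frame, frame, next_frame))
    return np.abs(p_cur - p_prev) + np.abs(p_next - p_cur)

def nph(power_diff, n_bins=32):
    """Normalized projection histogram: histograms of the horizontal and
    vertical projections of the power difference map, L1-normalized."""
    proj_h = power_diff.sum(axis=0)  # projection onto the horizontal axis
    proj_v = power_diff.sum(axis=1)  # projection onto the vertical axis
    h_hist, _ = np.histogram(proj_h, bins=n_bins)
    v_hist, _ = np.histogram(proj_v, bins=n_bins)
    feat = np.concatenate([h_hist, v_hist]).astype(np.float64)
    return feat / max(feat.sum(), 1e-12)

def mkv(power_diff):
    """Motion kinetic velocity: treating the power variation as kinetic
    energy E = 0.5 * m * v**2 (unit mass), recover a velocity-like
    temporal cue -- a loose reading of the energy-conservation argument."""
    return np.sqrt(2.0 * np.maximum(power_diff, 0.0)).mean()

# Example usage on three consecutive grayscale frames (H x W arrays):
# pd = power_difference(f0, f1, f2)
# feature = np.concatenate([nph(pd), [mkv(pd)]])
```

Per the abstract, per-frame descriptors of this kind would then be quantized into a bag-of-words vocabulary and classified under a support vector machine framework.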



Acknowledgements

This research is supported by the National Natural Science Foundation of China (Grant No. 661273339). The author would also like to thank Berthold K. P. Horn for his insightful ideas during the author's research visit to MIT CSAIL.

Author information

Corresponding author

Correspondence to Liangliang Wang.


About this article


Cite this article

Wang, L., Li, R. & Fang, Y. Power difference template for action recognition. Machine Vision and Applications 28, 463–473 (2017). https://doi.org/10.1007/s00138-017-0848-0

