Abstract
This paper proposes the power difference template, a new spatial-temporal representation for action recognition. Spatial power features are first extracted by applying Gaussian convolution to image gradients, transforming between the logarithmic and exponential domains. Using a forward–backward frame power difference method, we then present the normalized projection histogram (NPH), which characterizes the spatial features of a segmented action by normalizing the histogram of its 2D horizontal–vertical projections. Furthermore, from the perspective of energy conservation, motion kinetic velocity (MKV) is introduced to complement the NPH by representing the temporal relationships of the power features, under the assumption that variations in power are produced by motion in the form of kinetic energy. The power difference template, fusing NPH and MKV, is then integrated into a bag-of-words model for training and testing under a support vector machine framework. Experiments on the KTH, UCF Sports, UCF101, and HMDB datasets demonstrate the effectiveness of the proposed algorithm.
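To make the pipeline in the abstract concrete, the following is a minimal numpy-only sketch of two of its stages: a spatial power map (gradient magnitude computed in the logarithmic domain, Gaussian-smoothed, then mapped back through the exponential) and a normalized projection histogram built from the absolute frame-to-frame power difference. All function names, the smoothing parameter `sigma`, and the exact form of the log/exp transform are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def gaussian_smooth(img, sigma):
    """Separable 1D Gaussian smoothing along both axes (numpy only)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def power_map(frame, sigma=2.0, eps=1e-6):
    """Hypothetical spatial power feature: gradient magnitude in the
    log domain, Gaussian-smoothed, mapped back via the exponential."""
    log_img = np.log(frame.astype(np.float64) + eps)
    gy, gx = np.gradient(log_img)
    grad_mag = np.hypot(gx, gy)
    return np.exp(gaussian_smooth(grad_mag, sigma)) - 1.0

def normalized_projection_histogram(prev_frame, next_frame, sigma=2.0):
    """Frame power difference reduced to L1-normalized horizontal
    and vertical projection histograms (one sketch of an NPH)."""
    diff = np.abs(power_map(next_frame, sigma) - power_map(prev_frame, sigma))
    h_proj = diff.sum(axis=0)  # projection onto the horizontal axis (per column)
    v_proj = diff.sum(axis=1)  # projection onto the vertical axis (per row)
    hist = np.concatenate([h_proj, v_proj])
    total = hist.sum()
    return hist / total if total > 0 else hist

rng = np.random.default_rng(0)
f0 = rng.random((48, 64))
f1 = rng.random((48, 64))
nph = normalized_projection_histogram(f0, f1)
print(nph.shape)  # one bin per column plus one per row
```

In a full system, such per-frame-pair histograms would be quantized into visual words for the bag-of-words/SVM stage the abstract describes; the temporal MKV component is omitted here.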
Acknowledgements
This research is supported by the National Natural Science Foundation of China (Grant No. 661273339). The authors would also like to thank Berthold K. P. Horn for his insightful ideas during the first author's visiting study at MIT CSAIL.
Cite this article
Wang, L., Li, R. & Fang, Y. Power difference template for action recognition. Machine Vision and Applications 28, 463–473 (2017). https://doi.org/10.1007/s00138-017-0848-0