Automated video analysis for action recognition using descriptors derived from optical acceleration

Original Paper
Signal, Image and Video Processing

Abstract

Velocity descriptors based on optical flow are at the core of most existing video analysis techniques. We hypothesize that acceleration is as crucial as velocity for representing videos and consequently develop a method to compute optical acceleration. To effectively encode the motion information, we develop two acceleration descriptors: the histogram of optical acceleration (HOA) and the histogram of spatial gradient of acceleration (HSGA). To assess the significance of optical acceleration for motion description, we apply it to human action recognition. The action recognition system presented in this paper uses our acceleration descriptor, HSGA, in conjunction with the velocity descriptor, the motion boundary histogram (MBH). Experiments performed on standard action recognition datasets reveal that combining acceleration with velocity yields a superior motion descriptor.
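To make the descriptor pipeline concrete, the sketch below approximates optical acceleration as the temporal difference of successive dense optical flow fields and builds HOA- and HSGA-style histograms from it. This is a minimal illustration, assuming OpenCV's Farnebäck flow estimator and simple global, magnitude-weighted orientation binning; the function names, parameter values, and the plain flow-field differencing are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch, not the authors' implementation: optical acceleration as
# the temporal difference of successive dense flow fields, with HOA- and
# HSGA-style orientation histograms. Parameter values are illustrative.
import cv2
import numpy as np

def dense_flow(prev_gray, next_gray):
    # Dense optical flow between consecutive single-channel 8-bit frames,
    # using Farneback's polynomial-expansion method; returns an (H, W, 2)
    # float32 field of per-pixel (dx, dy) displacements.
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

def optical_acceleration(flow_t, flow_t1):
    # Crude optical acceleration: difference of two successive flow fields
    # (ignores warping by the intermediate displacement).
    return flow_t1 - flow_t

def orientation_histogram(field, n_bins=8):
    # Magnitude-weighted orientation histogram of a 2-D vector field;
    # applied to the acceleration field this is an HOA-style descriptor.
    fx, fy = field[..., 0], field[..., 1]
    mag = np.hypot(fx, fy)
    ang = np.mod(np.arctan2(fy, fx), 2.0 * np.pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def hsga_descriptor(accel, n_bins=8):
    # HSGA-style descriptor: orientation histograms of the spatial gradients
    # of each acceleration component, concatenated (analogous to MBH, which
    # applies the same construction to the flow field itself).
    parts = []
    for c in range(2):                       # x- and y-acceleration components
        gy, gx = np.gradient(accel[..., c])  # gradients along rows, columns
        parts.append(orientation_histogram(np.stack([gx, gy], axis=-1), n_bins))
    return np.concatenate(parts)

# Usage on three consecutive grayscale frames f0, f1, f2:
#   flow01, flow12 = dense_flow(f0, f1), dense_flow(f1, f2)
#   accel = optical_acceleration(flow01, flow12)
#   frame_descriptor = np.concatenate(
#       [orientation_histogram(accel), hsga_descriptor(accel)])
```

In a full recognition system such as the one described here, per-frame histograms like these would typically be computed over local spatio-temporal cells and aggregated (for example, with a bag-of-visual-words encoding) before classification.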



Author information

Correspondence to Anitha Edison.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Edison, A., Jiji, C.V. Automated video analysis for action recognition using descriptors derived from optical acceleration. SIViP 13, 915–922 (2019). https://doi.org/10.1007/s11760-019-01428-1

