
Action recognition using edge trajectories and motion acceleration descriptor

  • Original Paper
  • Published in Machine Vision and Applications

Abstract

This paper presents a method for action recognition based on edge trajectories. First, to exploit long-term motion information for action representation more effectively, we propose to track edge points across video frames to extract spatiotemporal edge trajectories, and we describe actions using the trajectories derived from edge points located on the boundaries of the action-related area. Second, in addition to the existing trajectory shape, histogram of oriented gradients, histogram of optical flow and motion boundary histogram descriptors, a new trajectory descriptor named histogram of motion acceleration is introduced; it is computed from the temporal derivative of the optical flow in the spatiotemporal neighborhood centered along a trajectory and describes the temporal relative motion of actions. Finally, encoding trajectory descriptors with Fisher vectors and predicting action labels with an MKL-based multi-class SVM, we evaluate the proposed approach on seven benchmark datasets: KTH, ADL, UT-Interaction, UCF Sports, YouTube, HMDB51 and UCF101. The experimental results demonstrate the effectiveness of our method.
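The core idea behind the histogram of motion acceleration, quantizing the temporal derivative of the optical flow by orientation and magnitude, can be illustrated with a minimal NumPy sketch. The function name, bin count and normalization below are illustrative assumptions, not the paper's exact implementation, which further aggregates such histograms over spatiotemporal neighborhoods along each trajectory:

```python
import numpy as np

def motion_acceleration_histogram(flow_t, flow_t1, n_bins=8):
    """Sketch of a histogram-of-motion-acceleration descriptor.

    flow_t, flow_t1 : (H, W, 2) dense optical-flow fields at consecutive
    frames. The acceleration field is the temporal derivative of the flow,
    quantized into orientation bins weighted by acceleration magnitude.
    """
    acc = flow_t1 - flow_t                       # temporal derivative of flow
    mag = np.linalg.norm(acc, axis=2)            # acceleration magnitude
    ang = np.arctan2(acc[..., 1], acc[..., 0])   # orientation in [-pi, pi)
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    s = hist.sum()
    return hist / s if s > 0 else hist           # L1-normalize

# Toy example: zero flow followed by uniform rightward flow, i.e. every
# pixel accelerates along +x, so all mass falls into a single bin.
f0 = np.zeros((4, 4, 2))
f1 = np.zeros((4, 4, 2))
f1[..., 0] = 1.0
h = motion_acceleration_histogram(f0, f1)
```

In a full pipeline, such per-frame histograms would be computed in cells around each tracked edge point, concatenated over the trajectory's length, and then encoded with Fisher vectors as the abstract describes.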

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No. 61572395) and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20110201110012).

Corresponding author

Correspondence to Chun Qi.

About this article

Cite this article

Wang, X., Qi, C. Action recognition using edge trajectories and motion acceleration descriptor. Machine Vision and Applications 27, 861–875 (2016). https://doi.org/10.1007/s00138-016-0746-x
