
Action recognition using 3D DAISY descriptor

  • Original Paper
  • Published in Machine Vision and Applications

Abstract

In this paper, we propose a novel spatio-temporal descriptor for action recognition. We extend a recent image local descriptor, DAISY, to three dimensions so that it can capture the additional temporal information in videos. The resulting 3D DAISY descriptor is both discriminative and computationally efficient. We use the bag-of-words framework and a non-linear SVM for classification. Experiments on the public action datasets KTH, WEIZMANN, YouTube, and UT-Interaction demonstrate the promising performance of our method.
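The classification pipeline outlined above (dense spatio-temporal descriptors, a visual vocabulary built by clustering, histograms of visual words, and a non-linear SVM) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the extract_3d_daisy() helper is a hypothetical placeholder for the 3D DAISY descriptor, and the codebook size, descriptor dimension, and RBF kernel are arbitrary choices for the sketch.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def extract_3d_daisy(video):
    """Placeholder for dense 3D DAISY descriptors sampled from a video volume.

    Returns an (n_descriptors, descriptor_dim) array. A real implementation
    would compute oriented-gradient histograms over a 3D DAISY layout.
    """
    rng = np.random.default_rng(0)
    return rng.random((200, 128))

def bow_histogram(descriptors, codebook):
    """Quantize descriptors against the codebook and return an L1-normalized histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy training data: placeholder "videos" and their action labels.
train_videos = [None] * 6
train_labels = [0, 0, 1, 1, 2, 2]

# 1) Build a visual vocabulary by clustering all training descriptors.
all_desc = np.vstack([extract_3d_daisy(v) for v in train_videos])
codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(all_desc)

# 2) Represent each video as a histogram of visual words (bag of words).
X_train = np.array([bow_histogram(extract_3d_daisy(v), codebook) for v in train_videos])

# 3) Train a non-linear (RBF-kernel) SVM on the histograms.
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_train, train_labels)

# Classify a new video.
test_hist = bow_histogram(extract_3d_daisy(None), codebook)
print("predicted action label:", clf.predict([test_hist])[0])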





Acknowledgments

This work was supported by the National Natural Science Foundation of China (61332012), the National Basic Research Program of China (2013CB329305), the 100 Talents Programme of the Chinese Academy of Sciences, and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030601).

Author information


Corresponding author

Correspondence to Xiaochun Cao.


About this article

Cite this article

Cao, X., Zhang, H., Deng, C. et al. Action recognition using 3D DAISY descriptor. Machine Vision and Applications 25, 159–171 (2014). https://doi.org/10.1007/s00138-013-0545-6

