Abstract
In this paper we propose a novel spatio-temporal descriptor for action recognition. We extend DAISY, a recently proposed local image descriptor, to three dimensions so that it captures information along the additional temporal dimension of videos. The resulting 3D DAISY descriptor is both discriminative and computationally efficient. For classification, we use the bag-of-words framework with a non-linear SVM. Experiments on the public action datasets KTH, WEIZMANN, YouTube, and UT-Interaction show promising results for our method.
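The abstract describes the classification stage only at a high level (bag-of-words plus a non-linear SVM). The sketch below illustrates one common way to build that stage, under stated assumptions: each local descriptor of a video is hard-assigned to its nearest codeword, the counts are L1-normalised into a histogram, and histograms are compared with an exponential chi-squared kernel whose scale is the mean pairwise chi-squared distance. The kernel choice, the codebook, the toy 2-D "descriptors", and the function names (`bow_histogram`, `chi2_kernel`, `gram_matrix`) are illustrative assumptions, not details taken from this paper; in practice the codebook would typically be learned (for example with k-means) from real 3D descriptors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword closest to `descriptor` in squared Euclidean distance."""
    best_index, best_dist = 0, math.inf
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Hard-assign every local descriptor of one video; return an L1-normalised histogram."""
    counts = [0.0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def chi2_distance(x: Vector, y: Vector) -> float:
    """D(x, y) = 1/2 * sum_i (x_i - y_i)^2 / (x_i + y_i); bins empty in both are skipped."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


def chi2_kernel(x: Vector, y: Vector, scale: float) -> float:
    """Exponential chi-squared kernel K(x, y) = exp(-D(x, y) / scale)."""
    return math.exp(-chi2_distance(x, y) / scale)


def gram_matrix(histograms: List[List[float]]) -> List[List[float]]:
    """Kernel matrix over training histograms.

    The scale is the mean pairwise chi-squared distance over the training set. In a
    full pipeline the same scale would be reused for test histograms; this sketch
    only builds the training matrix.
    """
    n = len(histograms)
    distances = [chi2_distance(histograms[i], histograms[j])
                 for i in range(n) for j in range(i + 1, n)]
    scale = sum(distances) / len(distances) if distances else 0.0
    if scale <= 0.0:
        scale = 1.0  # degenerate case: fewer than two histograms, or all identical
    return [[chi2_kernel(hi, hj, scale) for hj in histograms] for hi in histograms]


if __name__ == "__main__":
    # Hypothetical toy data: 2-D vectors standing in for real 3D descriptors.
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    video_a = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8]]
    video_b = [[0.0, 0.2], [0.1, 0.1]]
    histograms = [bow_histogram(video_a, codebook), bow_histogram(video_b, codebook)]
    print(histograms)               # approximately [[0.333, 0.667], [1.0, 0.0]]
    print(gram_matrix(histograms))  # approximately [[1.0, 0.368], [0.368, 1.0]]
```

Because the scale is the mean distance, any constant factor in the chi-squared definition cancels out. The resulting matrix can be handed to an SVM that accepts precomputed kernels, for example LIBSVM with `-t 4` or scikit-learn's `SVC(kernel="precomputed")`.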
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61332012), the National Basic Research Program of China (2013CB329305), the 100 Talents Programme of the Chinese Academy of Sciences, and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030601).
Cite this article
Cao, X., Zhang, H., Deng, C. et al. Action recognition using 3D DAISY descriptor. Machine Vision and Applications 25, 159–171 (2014). https://doi.org/10.1007/s00138-013-0545-6
DOI: https://doi.org/10.1007/s00138-013-0545-6