Abstract
This paper presents a method for action recognition based on edge trajectories. First, to exploit long-term motion information for action representation more effectively, we propose to track edge points across video frames to extract spatiotemporal edge trajectories, and we use those derived from edge points on the boundaries of action-related areas to describe actions. Second, in addition to the existing trajectory shape, histogram of oriented gradients, histogram of optical flow and motion boundary histogram descriptors, a new trajectory descriptor named histogram of motion acceleration is introduced; it is computed from the temporal derivative of the optical flow in the spatiotemporal neighborhood centered along a trajectory and describes the temporal relative motion of actions. Finally, encoding the trajectory descriptors with Fisher vectors and predicting action labels with an MKL-based multi-class SVM, we evaluate the proposed approach on seven benchmark datasets, namely KTH, ADL, UT-Interaction, UCF sports, YouTube, HMDB51 and UCF101. The experimental results demonstrate the effectiveness of our method.
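The histogram-of-motion-acceleration idea described above can be illustrated with a short sketch: the acceleration field is the temporal derivative of the optical flow between consecutive frames, quantized by orientation and weighted by magnitude, similar to HOF-style descriptors. This is a hypothetical minimal illustration, not the paper's exact implementation; the bin count, normalization scheme and spatiotemporal cell layout are assumptions.

```python
import numpy as np

def motion_acceleration_histogram(flow_t, flow_t1, num_bins=8):
    """Sketch of a histogram-of-motion-acceleration descriptor.

    flow_t, flow_t1: optical flow fields of shape (H, W, 2) for two
    consecutive frame pairs. Returns an L2-normalized orientation
    histogram of the acceleration (temporal flow derivative) field.
    """
    # Temporal derivative of the optical flow = motion acceleration.
    accel = flow_t1 - flow_t                        # (H, W, 2)
    mag = np.linalg.norm(accel, axis=2)             # per-pixel magnitude
    ang = np.arctan2(accel[..., 1], accel[..., 0])  # orientation in [-pi, pi]

    # Quantize orientation into num_bins bins, weight votes by magnitude.
    bins = ((ang + np.pi) / (2 * np.pi) * num_bins).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=num_bins)

    # L2-normalize, as is common for HOG/HOF-style descriptors.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

In the full pipeline such histograms would be computed per cell of the spatiotemporal volume around each edge trajectory and concatenated before Fisher vector encoding.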
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant No. 61572395) and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20110201110012).
Wang, X., Qi, C. Action recognition using edge trajectories and motion acceleration descriptor. Machine Vision and Applications 27, 861–875 (2016). https://doi.org/10.1007/s00138-016-0746-x