Abstract
Human action recognition from videos is a challenging task in computer vision. In recent years, histogram-based descriptors computed along dense trajectories have shown promising results for human action recognition, but they usually ignore the motion information of the tracked points, and the relationships between different motion variables are not well exploited. To address these issues, we propose a motion keypoint trajectory (MKT) approach and a trajectory-based covariance (TBC) descriptor, which is computed along the motion keypoint trajectories. The proposed MKT approach tracks motion keypoints at multiple spatial scales and employs an optical flow rectification algorithm to reduce the influence of camera motion, thereby achieving better performance than the well-known improved dense trajectory (IDT) approach. Moreover, MKT is faster than IDT, because it does not require human detection and extracts fewer trajectories. Furthermore, the TBC descriptor outperforms classical histogram-based descriptors such as the Histogram of Oriented Gradients (HOG), Histogram of Optical Flow (HOF), and Motion Boundary Histogram (MBH). Experimental results on three challenging datasets (Olympic Sports, HMDB51, and UCF50) demonstrate that our approach achieves better recognition performance than a number of state-of-the-art approaches.
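To make the idea of a trajectory-based covariance descriptor concrete, the following minimal Python sketch shows how per-point motion features along one trajectory could be summarized by a covariance matrix and embedded with a matrix logarithm (a log-Euclidean mapping). The specific feature channels, the regularization constant, and the vectorization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def tbc_descriptor(traj_features, eps=1e-6):
    """Sketch of a trajectory-based covariance (TBC) descriptor.

    traj_features: (T, D) array with one D-dimensional motion feature
    vector (e.g. gradient and optical flow components, assumed channels)
    for each of the T points sampled along a motion keypoint trajectory.
    Returns the vectorized upper triangle of log(cov), a log-Euclidean
    embedding of the symmetric positive-definite covariance matrix.
    """
    # Covariance of the motion variables along the trajectory; the
    # off-diagonal entries capture relationships between variables that
    # independent histogram descriptors would discard.
    cov = np.cov(traj_features, rowvar=False)
    # Small ridge keeps the matrix symmetric positive definite.
    cov += eps * np.eye(cov.shape[0])
    # Matrix logarithm via eigendecomposition, so SPD matrices can be
    # compared with ordinary Euclidean operations afterwards.
    w, v = np.linalg.eigh(cov)
    log_cov = (v * np.log(w)) @ v.T
    # The matrix is symmetric, so keep only the upper triangle.
    iu = np.triu_indices(log_cov.shape[0])
    return log_cov[iu]

# Toy usage: 15 tracked points, 8 assumed motion variables per point.
desc = tbc_descriptor(np.random.randn(15, 8))
```

The resulting fixed-length vectors can then be encoded (e.g. with Fisher vectors) and fed to a linear classifier, as is common for trajectory-based representations.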
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants 61472281 and 61622115, the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. GZ2015005), and the NSF of Jiangxi Province under Grant 20161BAB202069.
Cite this article
Yi, Y., Wang, H. Motion keypoint trajectory and covariance descriptor for human action recognition. Vis Comput 34, 391–403 (2018). https://doi.org/10.1007/s00371-016-1345-6