MCTD: Motion-Coordinate-Time Descriptor for 3D Skeleton-Based Action Recognition

Liang, Qi; Wang, Feng

doi:10.1007/978-3-319-77380-3_55

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10735)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

During the past few years, 3D skeleton-based action recognition has received increasing research attention, and numerous approaches have been proposed. Most existing approaches extract pose features from each frame of the video sequence to recognize different actions. However, the motion information between adjacent poses is missing. In this paper, we propose a new descriptor that employs motion for action recognition. In our approach, the Lie algebra is employed to extract the motion between neighboring poses. The spatial coordinates and the timestamp information are also used to describe the spatio-temporal distribution of motion in the video sequence. For classification, we modify the SVM kernel to measure the distance between different action instances. Our experiments on three common datasets show that the proposed descriptor outperforms the state-of-the-art approaches.
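
As a rough illustration of the idea in the abstract, the sketch below pairs a Lie-algebra motion vector with a joint coordinate and a normalized timestamp. It is not the authors' implementation: the abstract does not give the exact formulation, so the function names `bone_motion` and `mct_tuples`, the use of the minimal rotation between bone directions, and the 7-dimensional tuple layout are all assumptions made for illustration.

```python
import numpy as np

def bone_motion(u, v, eps=1e-8):
    """Axis-angle vector (an element of so(3)) of the minimal rotation
    that takes bone direction u to bone direction v.
    Assumes both vectors have nonzero length."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    axis = np.cross(u, v)
    s = np.linalg.norm(axis)   # sine of the angle between u and v
    c = float(np.dot(u, v))    # cosine of the angle between u and v
    if s < eps:
        # Parallel directions mean no rotation. The antiparallel case has
        # no unique axis and is also mapped to zero in this sketch.
        return np.zeros(3)
    return axis / s * np.arctan2(s, c)

def mct_tuples(frames, bones):
    """Build hypothetical (motion, coordinate, time) tuples.

    frames: array of shape (T, J, 3) with 3D joint positions per frame.
    bones:  list of (parent, child) joint index pairs.
    Returns an array of shape ((T - 1) * len(bones), 7).
    """
    frames = np.asarray(frames, dtype=float)
    T = frames.shape[0]
    rows = []
    for t in range(T - 1):
        for parent, child in bones:
            before = frames[t, child] - frames[t, parent]
            after = frames[t + 1, child] - frames[t + 1, parent]
            motion = bone_motion(before, after)   # 3 values: motion
            coord = frames[t, child]              # 3 values: where it happens
            time = t / max(T - 1, 1)              # 1 value: when it happens
            rows.append(np.concatenate([motion, coord, [time]]))
    return np.array(rows).reshape(-1, 7)
```

Calling `mct_tuples(frames, [(0, 1)])` on a two-frame sequence yields one 7-dimensional row per bone. How such tuples are aggregated and compared by the modified SVM kernel is not specified in the abstract and is left out here.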

Acknowledgments

The work described in this paper was supported by the National Natural Science Foundation of China (No. 61103127 and No. 61375016).

Author information

Authors and Affiliations

Shanghai Key Laboratory of Multidimensional Information Processing, Department of Computer Science and Technology, East China Normal University, Shanghai, China
Qi Liang & Feng Wang

Authors

Qi Liang
Feng Wang

Corresponding author

Correspondence to Feng Wang.

Editor information

Editors and Affiliations

University of Electronic Science and Technology of China, Chengdu, China
Bing Zeng
University of Chinese Academy of Sciences, Beijing, China
Qingming Huang
University of Ottawa, Ottawa, Ontario, Canada
Abdulmotaleb El Saddik
University of Electronic Science and Technology of China, Chengdu, China
Hongliang Li
Chinese Academy of Sciences, Beijing, China
Shuqiang Jiang
Harbin Institute of Technology, Harbin, China
Xiaopeng Fan

About this paper

Cite this paper

Liang, Q., Wang, F. (2018). MCTD: Motion-Coordinate-Time Descriptor for 3D Skeleton-Based Action Recognition. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. Lecture Notes in Computer Science, vol 10735. Springer, Cham. https://doi.org/10.1007/978-3-319-77380-3_55

DOI: https://doi.org/10.1007/978-3-319-77380-3_55
Published: 10 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77379-7
Online ISBN: 978-3-319-77380-3
eBook Packages: Computer Science, Computer Science (R0)
