A Spatio-Temporal Convolutional Neural Network for Skeletal Action Recognition

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10636)

Abstract

Human action recognition from 3D skeleton data is a rapidly growing research area in computer vision. Convolutional Neural Networks (CNNs) have proven highly effective at representation learning in many vision tasks, but there has been little work applying CNNs to skeletal action recognition, owing to the variable length of time sequences and the lack of large skeleton datasets. In this paper, we propose a Spatio-Temporal CNN for skeleton-based action recognition. The architecture has two convolutional layers: the first captures spatial patterns and the second captures spatio-temporal patterns. Techniques including data augmentation and a segment pooling strategy are employed to handle long sequences. Experimental results on MSR Action3D, MSR DailyActivity3D and UT-Kinect show that our approach achieves results comparable to those of state-of-the-art models.
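The pipeline the abstract describes — a spatial convolution over joints within each frame, a temporal convolution across frames, and segment pooling to turn a variable-length sequence into a fixed-size descriptor — can be sketched in plain NumPy. All shapes, filter sizes, and the segment count below are illustrative assumptions for a toy input, not the paper's actual configuration or trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy skeleton sequence: T frames, J joints, 3 coordinates (x, y, z).
# Sizes here are illustrative, not the paper's configuration.
T, J, C = 40, 20, 3
seq = rng.standard_normal((T, J, C))

def conv1d_valid(x, w):
    """Valid 1-D convolution of x (L, Cin) with filters w (K, Cin, Cout), plus ReLU."""
    K, Cin, Cout = w.shape
    L = x.shape[0] - K + 1
    out = np.empty((L, Cout))
    for i in range(L):
        out[i] = np.tensordot(x[i:i + K], w, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)  # ReLU

# Layer 1: convolve across joints within each frame (spatial patterns).
w1 = rng.standard_normal((5, C, 8)) * 0.1
spatial = np.stack([conv1d_valid(seq[t], w1) for t in range(T)])  # (T, J-4, 8)

# Flatten each frame's spatial map, then convolve across time
# (spatio-temporal patterns).
frames = spatial.reshape(T, -1)                      # (T, (J-4)*8)
w2 = rng.standard_normal((3, frames.shape[1], 16)) * 0.01
temporal = conv1d_valid(frames, w2)                  # (T-2, 16)

# Segment pooling: split the (variable-length) time axis into a fixed
# number of segments and max-pool each, yielding a fixed-size descriptor
# regardless of the original sequence length T.
n_segments = 4
segments = np.array_split(temporal, n_segments, axis=0)
descriptor = np.concatenate([s.max(axis=0) for s in segments])   # (n_segments * 16,)
print(descriptor.shape)  # (64,)
```

Because the pooled descriptor has a fixed length, sequences of different durations can all be fed to the same classifier head, which is how segment pooling sidesteps the variable-length problem the abstract mentions.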


Author information

Correspondence to Jinhua Xu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hu, L., Xu, J. (2017). A Spatio-Temporal Convolutional Neural Network for Skeletal Action Recognition. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S. (eds.) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol. 10636. Springer, Cham. https://doi.org/10.1007/978-3-319-70090-8_39

  • DOI: https://doi.org/10.1007/978-3-319-70090-8_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70089-2

  • Online ISBN: 978-3-319-70090-8

  • eBook Packages: Computer Science, Computer Science (R0)
