DOI: 10.1145/2733373.2806296 · ACM MM Conference Proceedings · Short Paper

ConvNets-Based Action Recognition from Depth Maps through Virtual Cameras and Pseudocoloring

Published: 13 October 2015

Abstract

In this paper, we propose to adopt ConvNets to recognize human actions from depth maps on relatively small datasets based on Depth Motion Maps (DMMs). In particular, three strategies are developed to effectively leverage the capability of ConvNets in mining discriminative features for recognition. First, different viewpoints are mimicked by rotating virtual cameras around the subject represented by the 3D points of the captured depth maps. This not only synthesizes more data from the captured samples, but also makes the trained ConvNets view-tolerant. Second, DMMs are constructed and further enhanced for recognition by encoding them into pseudo-RGB images, turning the spatio-temporal motion patterns into textures and edges. Last, by transfer learning from models originally trained on ImageNet for image classification, three ConvNets (one per projection plane) are trained independently on the color-coded DMMs constructed in three orthogonal planes. The proposed algorithm was extensively evaluated on the MSRAction3D, MSRAction3DExt and UTKinect-Action datasets and achieved state-of-the-art results.
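The three steps the abstract describes (virtual-camera rotation of the 3D points, DMM accumulation, pseudocolor encoding) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes depth frames arrive as 2D arrays already projected onto a plane, and the function names, the noise threshold, and the rainbow-style colormap are illustrative stand-ins for the paper's exact encoding.

```python
import numpy as np

def rotate_points(points, yaw):
    """Rotate an Nx3 point cloud about the vertical (y) axis, mimicking a
    virtual camera moved around the subject (illustrative single-axis case)."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return points @ R.T

def depth_motion_map(depth_frames, threshold=0.0):
    """Accumulate absolute inter-frame differences of projected depth maps
    into a single Depth Motion Map (DMM)."""
    dmm = np.zeros_like(depth_frames[0], dtype=np.float64)
    for prev, curr in zip(depth_frames[:-1], depth_frames[1:]):
        diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
        diff[diff <= threshold] = 0.0  # suppress small sensor noise
        dmm += diff
    return dmm

def pseudocolor(dmm):
    """Encode a normalized DMM as a pseudo-RGB image via a simple
    rainbow-style colormap (a hypothetical stand-in for the paper's coding),
    turning motion-energy levels into distinct colors, i.e. textures/edges."""
    norm = (dmm - dmm.min()) / (np.ptp(dmm) + 1e-8)
    r = np.clip(1.5 - np.abs(4.0 * norm - 3.0), 0.0, 1.0)
    g = np.clip(1.5 - np.abs(4.0 * norm - 2.0), 0.0, 1.0)
    b = np.clip(1.5 - np.abs(4.0 * norm - 1.0), 0.0, 1.0)
    return (np.stack([r, g, b], axis=-1) * 255.0).astype(np.uint8)
```

In the paper's pipeline, a DMM like this would be computed for each of the three orthogonal projection planes (front, side, top), color-coded, and fed to a separate ImageNet-pretrained ConvNet.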




    Published In

MM '15: Proceedings of the 23rd ACM International Conference on Multimedia
    October 2015 · 1402 pages
    ISBN: 9781450334594
    DOI: 10.1145/2733373

Publisher

    Association for Computing Machinery, New York, NY, United States



    Author Tags

    1. action recognition
    2. convnets
    3. pseudocoloring
    4. virtual cameras


Conference

    MM '15: ACM Multimedia Conference
    October 26-30, 2015
    Brisbane, Australia

    Acceptance Rates

    MM '15 Paper Acceptance Rate 56 of 252 submissions, 22%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


    Cited By

• (2024) HARNet: design and evaluation of a deep genetic algorithm for recognizing yoga postures. Signal, Image and Video Processing 18(S1):553-564. DOI: 10.1007/s11760-024-03173-6. Published 15 May 2024.
    • (2023) Multi-view Multi-modal Approach Based on 5S-CNN and BiLSTM Using Skeleton, Depth and RGB Data for Human Activity Recognition. Wireless Personal Communications 130(2):1141-1159. DOI: 10.1007/s11277-023-10324-4. Published 12 March 2023.
    • (2022) Analysis of Behavioral Image Recognition of Pan-Entertainment of Contemporary College Students' Network. Scientific Programming 2022. DOI: 10.1155/2022/1176279. Published 1 January 2022.
    • (2022) Discriminative Multi-View Dynamic Image Fusion for Cross-View 3-D Action Recognition. IEEE Transactions on Neural Networks and Learning Systems 33(10):5332-5345. DOI: 10.1109/TNNLS.2021.3070179. Published October 2022.
    • (2022) Heterogeneous Network Building and Embedding for Efficient Skeleton-Based Human Action Recognition. In 2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), 364-369. DOI: 10.1109/PRAI55851.2022.9904108. Published 19 August 2022.
    • (2022) Human Action Recognition using Skeleton features. In 2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), 289-296. DOI: 10.1109/ISMAR-Adjunct57072.2022.00066. Published October 2022.
    • (2022) Deep learning and RGB-D based human action, human-human and human-object interaction recognition. Journal of Visual Communication and Image Representation 86(C). DOI: 10.1016/j.jvcir.2022.103531. Published 1 July 2022.
    • (2022) Real-time human action recognition using raw depth video-based recurrent neural networks. Multimedia Tools and Applications 82(11):16213-16235. DOI: 10.1007/s11042-022-14075-5. Published 28 October 2022.
    • (2022) Online suspicious event detection in a constrained environment with RGB+D camera using multi-stream CNNs and SVM. Multimedia Tools and Applications 81(23):32857-32881. DOI: 10.1007/s11042-022-12656-y. Published 15 April 2022.
    • (2022) 3DFCNN: real-time action recognition using 3D deep neural networks with raw depth information. Multimedia Tools and Applications 81(17):24119-24143. DOI: 10.1007/s11042-022-12091-z. Published 19 March 2022.
