DOI: 10.1145/2733373.2806296 · ACM MM Conference Proceedings · Short Paper

ConvNets-Based Action Recognition from Depth Maps through Virtual Cameras and Pseudocoloring

Published: 13 October 2015

Abstract

In this paper, we propose to adopt ConvNets to recognize human actions from depth maps on relatively small datasets based on Depth Motion Maps (DMMs). In particular, three strategies are developed to effectively leverage the capability of ConvNets in mining discriminative features for recognition. First, different viewpoints are mimicked by rotating virtual cameras around the subject represented by the 3D points of the captured depth maps. This not only synthesizes more data from the captured samples, but also makes the trained ConvNets view-tolerant. Second, DMMs are constructed and further enhanced for recognition by encoding them into pseudo-RGB images, turning the spatio-temporal motion patterns into textures and edges. Last, by transfer learning from models originally trained on ImageNet for image classification, three ConvNets (one per projection plane) are trained independently on the color-coded DMMs constructed in three orthogonal planes. The proposed algorithm was extensively evaluated on the MSRAction3D, MSRAction3DExt and UTKinect-Action datasets and achieved state-of-the-art results.
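The three steps the abstract describes (virtual-camera rotation of the 3D points, DMM accumulation, pseudocolor encoding) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes depth frames arrive as 2D arrays already projected onto a plane, and the function names, the noise threshold, and the rainbow-style colormap are illustrative stand-ins for the paper's exact encoding.

```python
import numpy as np

def rotate_points(points, yaw):
    """Rotate an Nx3 point cloud about the vertical (y) axis, mimicking a
    virtual camera moved around the subject (illustrative single-axis case)."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return points @ R.T

def depth_motion_map(depth_frames, threshold=0.0):
    """Accumulate absolute inter-frame differences of projected depth maps
    into a single Depth Motion Map (DMM)."""
    dmm = np.zeros_like(depth_frames[0], dtype=np.float64)
    for prev, curr in zip(depth_frames[:-1], depth_frames[1:]):
        diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
        diff[diff <= threshold] = 0.0  # suppress small sensor noise
        dmm += diff
    return dmm

def pseudocolor(dmm):
    """Encode a normalized DMM as a pseudo-RGB image via a simple
    rainbow-style colormap (a hypothetical stand-in for the paper's coding),
    turning motion-energy levels into distinct colors, i.e. textures/edges."""
    norm = (dmm - dmm.min()) / (np.ptp(dmm) + 1e-8)
    r = np.clip(1.5 - np.abs(4.0 * norm - 3.0), 0.0, 1.0)
    g = np.clip(1.5 - np.abs(4.0 * norm - 2.0), 0.0, 1.0)
    b = np.clip(1.5 - np.abs(4.0 * norm - 1.0), 0.0, 1.0)
    return (np.stack([r, g, b], axis=-1) * 255.0).astype(np.uint8)
```

In the paper's pipeline, a DMM like this would be computed for each of the three orthogonal projection planes (front, side, top), color-coded, and fed to a separate ImageNet-pretrained ConvNet.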




    Published In

MM '15: Proceedings of the 23rd ACM International Conference on Multimedia
    October 2015 · 1402 pages
    ISBN: 9781450334594
    DOI: 10.1145/2733373

Publisher

    Association for Computing Machinery, New York, NY, United States



    Author Tags

    1. action recognition
    2. convnets
    3. pseudocoloring
    4. virtual cameras


Conference

    MM '15: ACM Multimedia Conference
    October 26-30, 2015
    Brisbane, Australia

    Acceptance Rates

    MM '15 Paper Acceptance Rate 56 of 252 submissions, 22%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


    Cited By

• (2024) HARNet: design and evaluation of a deep genetic algorithm for recognizing yoga postures. Signal, Image and Video Processing 18(S1):553-564. DOI: 10.1007/s11760-024-03173-6. Published 15 May 2024.
    • (2023) Multi-view Multi-modal Approach Based on 5S-CNN and BiLSTM Using Skeleton, Depth and RGB Data for Human Activity Recognition. Wireless Personal Communications 130(2):1141-1159. DOI: 10.1007/s11277-023-10324-4. Published 12 March 2023.
    • (2022) Analysis of Behavioral Image Recognition of Pan-Entertainment of Contemporary College Students' Network. Scientific Programming 2022. DOI: 10.1155/2022/1176279. Published 1 January 2022.
    • (2022) Discriminative Multi-View Dynamic Image Fusion for Cross-View 3-D Action Recognition. IEEE Transactions on Neural Networks and Learning Systems 33(10):5332-5345. DOI: 10.1109/TNNLS.2021.3070179. Published October 2022.
    • (2022) Heterogeneous Network Building and Embedding for Efficient Skeleton-Based Human Action Recognition. In 2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), 364-369. DOI: 10.1109/PRAI55851.2022.9904108. Published 19 August 2022.
    • (2022) Human Action Recognition using Skeleton features. In 2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), 289-296. DOI: 10.1109/ISMAR-Adjunct57072.2022.00066. Published October 2022.
    • (2022) Deep learning and RGB-D based human action, human-human and human-object interaction recognition. Journal of Visual Communication and Image Representation 86(C). DOI: 10.1016/j.jvcir.2022.103531. Published 1 July 2022.
    • (2022) Real-time human action recognition using raw depth video-based recurrent neural networks. Multimedia Tools and Applications 82(11):16213-16235. DOI: 10.1007/s11042-022-14075-5. Published 28 October 2022.
    • (2022) Online suspicious event detection in a constrained environment with RGB+D camera using multi-stream CNNs and SVM. Multimedia Tools and Applications 81(23):32857-32881. DOI: 10.1007/s11042-022-12656-y. Published 15 April 2022.
    • (2022) 3DFCNN: real-time action recognition using 3D deep neural networks with raw depth information. Multimedia Tools and Applications 81(17):24119-24143. DOI: 10.1007/s11042-022-12091-z. Published 19 March 2022.
