Multifeature Selection for 3D Human Action Recognition

Published: 22 May 2018

Abstract

In mainstream approaches to 3D human action recognition, depth and skeleton features are combined to improve recognition accuracy. However, this strategy produces high-dimensional feature vectors with low discrimination, because the combined vectors are redundant. To address this drawback, a multifeature selection approach for 3D human action recognition is proposed in this paper. First, three novel single-modal features are proposed to describe depth appearance, depth motion, and skeleton motion. Second, the classification entropy of a random forest is used to evaluate the discrimination of the depth-appearance-based features. Finally, one of the three features is selected to recognize each sample according to this discrimination evaluation. Experimental results show that the proposed multifeature selection approach significantly outperforms approaches based on single-modal features or feature fusion.
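
The abstract only sketches the selection mechanism, so the following is a minimal, hypothetical illustration of entropy-gated feature selection rather than the authors' implementation. It assumes scikit-learn, invented modality names (depth_appearance, depth_motion, skeleton_motion), an arbitrary entropy threshold, and an assumed fallback rule between the two motion features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def class_entropy(proba):
    """Shannon entropy (in nats) of a class-probability vector."""
    p = proba[proba > 0]
    return float(-np.sum(p * np.log(p)))


class EntropyGatedSelector:
    """Train one random forest per feature modality; at test time, gate on
    the entropy of the depth-appearance forest's prediction to decide
    whether that feature is discriminative enough to classify the sample,
    otherwise fall back to one of the motion-based modalities."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # hypothetical gating threshold
        self.forests = {}

    def fit(self, features, labels):
        # features: dict mapping modality name -> (n_samples, dim) array
        for name, X in features.items():
            rf = RandomForestClassifier(n_estimators=100, random_state=0)
            self.forests[name] = rf.fit(X, labels)
        return self

    def predict(self, sample):
        # sample: dict mapping modality name -> 1-D NumPy feature vector
        p = self.forests["depth_appearance"].predict_proba(
            sample["depth_appearance"].reshape(1, -1))[0]
        if class_entropy(p) <= self.threshold:
            # Low entropy: the depth-appearance feature is discriminative.
            return int(np.argmax(p))
        # High entropy: select the more confident motion-based modality
        # (the paper's exact fallback rule is not stated in the abstract).
        best_label, best_h = None, np.inf
        for m in ("depth_motion", "skeleton_motion"):
            q = self.forests[m].predict_proba(sample[m].reshape(1, -1))[0]
            h = class_entropy(q)
            if h < best_h:
                best_label, best_h = int(np.argmax(q)), h
        return best_label
```

The point the sketch captures is that entropy gating avoids concatenating all modalities into one redundant, high-dimensional vector: each sample is classified by a single, adequately discriminative feature.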

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 14, Issue 2
    May 2018
    208 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3210458

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 May 2018
    Accepted: 01 December 2017
    Revised: 01 December 2017
    Received: 01 July 2017
    Published in TOMM Volume 14, Issue 2


    Author Tags

    1. Feature selection
    2. Action recognition

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Natural Science Foundation of China
    • Natural Science Foundation of Fujian Province
    • Fujian Province 2011 Collaborative Innovation Center of TCM Health Management, Collaborative Innovation Center of Chinese Oolong Tea Industry
    • Fujian Provincial Key Projects of Technology

    Cited By

    • (2025) Individual Contribution-Based Spatial-Temporal Attention on Skeleton Sequences for Human Interaction Recognition. IEEE Access 13, 6463–6474. DOI: 10.1109/ACCESS.2024.3525185. Online publication date: 2025.
    • (2024) Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 5, 1–22. DOI: 10.1145/3639470. Online publication date: 5-Jan-2024.
    • (2023) Egocentric Early Action Prediction via Adversarial Knowledge Distillation. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 2, 1–21. DOI: 10.1145/3544493. Online publication date: 6-Feb-2023.
    • (2023) Egocentric Early Action Prediction via Multimodal Transformer-Based Dual Action Prediction. IEEE Transactions on Circuits and Systems for Video Technology 33, 9, 4472–4483. DOI: 10.1109/TCSVT.2023.3248271. Online publication date: Sep-2023.
    • (2023) Video Insights Application, A Machine Learning Approach. 2023 5th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), 389–394. DOI: 10.1109/ICAC3N60023.2023.10541630. Online publication date: 15-Dec-2023.
    • (2021) Bayesian Covariance Representation with Global Informative Prior for 3D Action Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 17, 4, 1–22. DOI: 10.1145/3460235. Online publication date: 12-Nov-2021.
    • (2021) Surveillance video analysis for student action recognition and localization inside computer laboratories of a smart campus. Multimedia Tools and Applications 80, 2, 2907–2929. DOI: 10.1007/s11042-020-09741-5. Online publication date: 1-Jan-2021.
    • (2020) A Benchmark Dataset and Comparison Study for Multi-modal Human Action Analytics. ACM Transactions on Multimedia Computing, Communications, and Applications 16, 2, 1–24. DOI: 10.1145/3365212. Online publication date: 22-May-2020.
    • (2019) Unsupervised Learning of Human Action Categories in Still Images with Deep Representations. ACM Transactions on Multimedia Computing, Communications, and Applications 15, 4, 1–20. DOI: 10.1145/3362161. Online publication date: 16-Dec-2019.
