Multifeature Selection for 3D Human Action Recognition

Published: 22 May 2018

Abstract

In mainstream approaches to 3D human action recognition, depth and skeleton features are combined to improve recognition accuracy. However, this strategy produces high-dimensional feature vectors with low discrimination, because the combined vectors are redundant. To address this drawback, a multifeature selection approach for 3D human action recognition is proposed in this paper. First, three novel single-modal features are proposed to describe depth appearance, depth motion, and skeleton motion. Second, the classification entropy of a random forest is used to evaluate the discrimination of the depth-appearance-based features. Finally, one of the three features is selected to recognize each sample according to this discrimination evaluation. Experimental results show that the proposed multifeature selection approach significantly outperforms approaches based on single-modal features or feature fusion.
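
The abstract only sketches the selection mechanism, so the following is a minimal, hypothetical illustration of entropy-gated feature selection rather than the authors' implementation. It assumes scikit-learn, invented modality names (depth_appearance, depth_motion, skeleton_motion), an arbitrary entropy threshold, and an assumed fallback rule between the two motion features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def class_entropy(proba):
    """Shannon entropy (in nats) of a class-probability vector."""
    p = proba[proba > 0]
    return float(-np.sum(p * np.log(p)))


class EntropyGatedSelector:
    """Train one random forest per feature modality; at test time, gate on
    the entropy of the depth-appearance forest's prediction to decide
    whether that feature is discriminative enough to classify the sample,
    otherwise fall back to one of the motion-based modalities."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # hypothetical gating threshold
        self.forests = {}

    def fit(self, features, labels):
        # features: dict mapping modality name -> (n_samples, dim) array
        for name, X in features.items():
            rf = RandomForestClassifier(n_estimators=100, random_state=0)
            self.forests[name] = rf.fit(X, labels)
        return self

    def predict(self, sample):
        # sample: dict mapping modality name -> 1-D NumPy feature vector
        p = self.forests["depth_appearance"].predict_proba(
            sample["depth_appearance"].reshape(1, -1))[0]
        if class_entropy(p) <= self.threshold:
            # Low entropy: the depth-appearance feature is discriminative.
            return int(np.argmax(p))
        # High entropy: select the more confident motion-based modality
        # (the paper's exact fallback rule is not stated in the abstract).
        best_label, best_h = None, np.inf
        for m in ("depth_motion", "skeleton_motion"):
            q = self.forests[m].predict_proba(sample[m].reshape(1, -1))[0]
            h = class_entropy(q)
            if h < best_h:
                best_label, best_h = int(np.argmax(q)), h
        return best_label
```

The point the sketch captures is that entropy gating avoids concatenating all modalities into one redundant, high-dimensional vector: each sample is classified by a single, adequately discriminative feature.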

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 14, Issue 2
    May 2018
    208 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3210458

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 May 2018
    Accepted: 01 December 2017
    Revised: 01 December 2017
    Received: 01 July 2017
    Published in TOMM Volume 14, Issue 2


    Author Tags

    1. Feature selection
    2. Action recognition

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Natural Science Foundation of China
    • Natural Science Foundation of Fujian Province
    • Fujian Province 2011 Collaborative Innovation Center of TCM Health Management, Collaborative Innovation Center of Chinese Oolong Tea Industry
    • Fujian Provincial Key Projects of Technology

    Cited By

    • (2025) Individual Contribution-Based Spatial-Temporal Attention on Skeleton Sequences for Human Interaction Recognition. IEEE Access 13, 6463–6474. DOI: 10.1109/ACCESS.2024.3525185. Online publication date: 2025.
    • (2024) Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 5, 1–22. DOI: 10.1145/3639470. Online publication date: 5-Jan-2024.
    • (2023) Egocentric Early Action Prediction via Adversarial Knowledge Distillation. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 2, 1–21. DOI: 10.1145/3544493. Online publication date: 6-Feb-2023.
    • (2023) Egocentric Early Action Prediction via Multimodal Transformer-Based Dual Action Prediction. IEEE Transactions on Circuits and Systems for Video Technology 33, 9, 4472–4483. DOI: 10.1109/TCSVT.2023.3248271. Online publication date: Sep-2023.
    • (2023) Video Insights Application, A Machine Learning Approach. 2023 5th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), 389–394. DOI: 10.1109/ICAC3N60023.2023.10541630. Online publication date: 15-Dec-2023.
    • (2021) Bayesian Covariance Representation with Global Informative Prior for 3D Action Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 17, 4, 1–22. DOI: 10.1145/3460235. Online publication date: 12-Nov-2021.
    • (2021) Surveillance video analysis for student action recognition and localization inside computer laboratories of a smart campus. Multimedia Tools and Applications 80, 2, 2907–2929. DOI: 10.1007/s11042-020-09741-5. Online publication date: 1-Jan-2021.
    • (2020) A Benchmark Dataset and Comparison Study for Multi-modal Human Action Analytics. ACM Transactions on Multimedia Computing, Communications, and Applications 16, 2, 1–24. DOI: 10.1145/3365212. Online publication date: 22-May-2020.
    • (2019) Unsupervised Learning of Human Action Categories in Still Images with Deep Representations. ACM Transactions on Multimedia Computing, Communications, and Applications 15, 4, 1–20. DOI: 10.1145/3362161. Online publication date: 16-Dec-2019.
