Abstract
The intelligent recognition of human behavior and actions in massive video data is a key application direction in artificial intelligence. With the development of intelligent communication networks, multimedia communication has become a research hotspot in video analysis. 3D convolution is an efficient deep learning technique that can learn the temporal and spatial features of target images simultaneously. This paper proposes a 3D max residual feature map convolution network (3D-MRCNN), designed to overcome the network degradation and vanishing-gradient problems caused by convolution computation. The model first applies 2D convolution as a preprocessing step. After the convolution is split, a learning network combining a 3D max feature map (3D-MFM) and a residual structure is built. Finally, the output vectors corresponding to the two different inputs are concatenated and fused, and the result is classified by a support vector machine (SVM). In experiments on the representative UCF101 data set, 3D-MRCNN achieves an accuracy of 85.7%, with higher accuracy and operating efficiency than closely related models.
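The max-feature-map (MFM) operation that 3D-MFM appears to extend was introduced in Light CNN (Wu et al. 2018): a layer with 2n channels is split into two halves, and each output channel keeps the element-wise maximum of channel k and channel k + n. The abstract does not give the exact layer configuration of 3D-MRCNN, so the following is only a minimal sketch under stated assumptions: a 3D feature map is stored as channels of flattened T×H×W voxels, the convolution is a hypothetical pointwise (1×1×1) 3D convolution with hand-set weights, and the residual block is assumed to take the form y = x + MFM(conv(x)). It illustrates how MFM halves the channel count and how the shortcut adds the input back; it does not reproduce the authors' network.

```python
from typing import List

# Assumption: a 3D feature map is stored as a list of channels, each channel a
# flat list of T*H*W voxel values (real frameworks use a 5-D tensor instead).
FeatureMap = List[List[float]]


def conv1x1x1(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Hypothetical pointwise 3D convolution: each output channel is a weighted
    sum of the input channels at the same voxel (kernel size 1x1x1, no bias)."""
    n_vox = len(x[0])
    return [
        [sum(w[c] * x[c][v] for c in range(len(x))) for v in range(n_vox)]
        for w in weights
    ]


def mfm3d(x: FeatureMap) -> FeatureMap:
    """Max-feature-map: split 2n channels into halves and keep, voxel by voxel,
    the larger of channel k and channel k + n. Output has n channels."""
    if len(x) % 2:
        raise ValueError("MFM needs an even number of channels")
    half = len(x) // 2
    return [[max(a, b) for a, b in zip(x[k], x[k + half])] for k in range(half)]


def residual_mfm_block(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Assumed block form y = x + MFM(conv(x)). The convolution must emit twice
    as many channels as the input so that MFM restores the input's shape."""
    if len(weights) != 2 * len(x):
        raise ValueError("conv must produce 2 * len(x) channels")
    f = mfm3d(conv1x1x1(x, weights))
    # Identity shortcut: add the block input back element-wise.
    return [[xi + fi for xi, fi in zip(xc, fc)] for xc, fc in zip(x, f)]


if __name__ == "__main__":
    x = [[1.0, -2.0], [0.5, 3.0]]  # 2 channels, 2 voxels each
    # Illustrative weights: output channels are x0, x1, -x0, -x1.
    weights = [[1, 0], [0, 1], [-1, 0], [0, -1]]
    print(mfm3d(conv1x1x1(x, weights)))  # [[1.0, 2.0], [0.5, 3.0]]
    print(residual_mfm_block(x, weights))  # [[2.0, 0.0], [1.0, 6.0]]
```

With these illustrative weights the convolution outputs x and -x, so MFM returns |x| and the block returns x + |x|. Because the shortcut passes x through unchanged, gradients have a path that bypasses the convolution, which is the usual motivation for residual structures against degradation and vanishing gradients.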
Abbreviations
- 3D-MRCNN: 3D max residual feature map convolution network
- 3D-MFM: 3D max feature map
- SVM: Support vector machine
- IDT: Improved dense trajectories
- TSN: Temporal Segment Networks
- C3D: Convolutional 3D
- P3D: Pseudo-3D Residual Net
References
Pandeya YR, Lee JW (2021) Deep learning-based late fusion of multimodal information for emotion classification of music video. Multimed Tools Appl 80(38):1–19. https://doi.org/10.1007/s11042-020-08836-3
Li H-F, Cryer S, Acharya LP, Raymond J (2020) Video and image classification using atomisation spray image patterns and deep learning. Biosystems Engineering 200:13–22. https://doi.org/10.1016/j.biosystemseng.2020.08.016
Wang H, Schmid C (2013) Action Recognition with Improved Trajectories. IEEE International Conference on Computer Vision:3551–3558. https://doi.org/10.1109/ICCV.2013.441
Simonyan K, Zisserman A (2014) Two-Stream Convolutional Networks for Action Recognition in Videos. Computer Vision and Pattern Recognition.
Ning L, Wang Z, Guo Q (2014) Preferred Route Indoor Mobility Model for Heterogeneous Networks. IEEE Commun Lett 18(5):821–824. https://doi.org/10.1109/LCOMM.2014.033114.140344
Zhang RZ, Zhong H, Zheng TY, Ning L (2021) Trajectory Mining-Based City-Level Mobility Model for 5G NB-IoT Networks. Wirel Commun Mob Comput 2021:12. https://doi.org/10.1155/2021/5356193
Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional Two-Stream Network Fusion for Video Action Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition:1933–1941. https://doi.org/10.1109/CVPR.2016.213
Ng JY-H, Hausknecht M, Vijayanarasimhan S et al (2015) Beyond short snippets: Deep networks for video classification. 2015 IEEE Conference on Computer Vision and Pattern Recognition:4694–4702. https://doi.org/10.1109/CVPR.2015.7299101
Wang LM, Xiong Y, Wang Z et al (2016) Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. Computer Vision–ECCV 2016 9912:20–36. https://doi.org/10.1007/978-3-319-46484-8_2
Varol G, Laptev I, Schmid C (2017) Long-term temporal convolutions for action recognition. IEEE Trans Pattern Anal Mach Intell 40(6):1510–1517. https://doi.org/10.1109/TPAMI.2017.2712608
Ji SW, Xu W, Yang M, Yu K (2010) 3D Convolutional Neural Networks for Human Action Recognition. IEEE Trans Pattern Anal Mach Intell 35(1):495–502. https://doi.org/10.1109/TPAMI.2012.59
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. IEEE International Conference on Computer Vision:4489–4497.
Qiu ZF, Yao T, Mei T (2017) Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. 2017 IEEE International Conference on Computer Vision:5534–5542. https://doi.org/10.1109/ICCV.2017.590
Zhang QW, Huang KQ, Wang X, Jiang B, Gan Y (2019) Efficient multiview video plus depth coding for 3D-HEVC based on complexity classification of the treeblock. J Real-Time Image Process 16(6):1909–1926. https://doi.org/10.1007/s11554-017-0692-5
Wu X, He R, Sun ZN, Tan TN (2018) A Light CNN for Deep Face Representation With Noisy Labels. IEEE Trans Inf Forensics Secur 13(11):2884–2896. https://doi.org/10.1109/TIFS.2018.2833032
Gayathri N, Mahesh K (2020) Improved Fuzzy-Based SVM Classification System Using Feature Extraction for Video Indexing and Retrieval. Int J Fuzzy Syst 22(8):1716–1729. https://doi.org/10.1007/s40815-020-00884-z
Chen YS, Guo B, Wang W, Suo XH, Zhang Z (2020) Using efficient group pseudo-3D network to learn spatio-temporal features. SIViP 15(2):361–369. https://doi.org/10.1007/s11760-020-01758-5
Binol H, Moberly AC, Niazi MKK et al (2020) SelectStitch: Automated Frame Segmentation and Stitching to Create Composite Images from Otoscope Video Clips. https://doi.org/10.1101/2020.08.12.20173765
Yu WY, Zhao M, Xu J et al (2020) Feature extraction of positron image and imaging algorithm based on 3D convolution operation. Optik 217:164952. https://doi.org/10.1016/j.ijleo.2020.164952
Chollet F (2017) Xception: Deep Learning with Depthwise Separable Convolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition:1800–1807. https://doi.org/10.1109/CVPR.2017.195
Lu WD, Gong Y, Liu X et al (2018) Collaborative Energy and Information Transfer in Green Wireless Sensor Networks for Smart Cities. IEEE Trans Industr Inf 14(4):1585–1593. https://doi.org/10.1109/TII.2017.2777846
Jiang YN, Li Y, Zhang HK (2019) Hyperspectral Image Classification Based on 3-D Separable ResNet and Transfer Learning. IEEE Geosci Remote Sens Lett 16(12):1949–1953. https://doi.org/10.1109/LGRS.2019.2913011
Diehl PU, Cook M (2015) Unsupervised Learning of Digit Recognition Using Spike-Timing-Dependent Plasticity. Front Comput Neurosci 9:99. https://doi.org/10.3389/fncom.2015.00099
Soomro K, Zamir AR, Shah M (2012) UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild. arXiv:1212.0402
Funding
This work was supported by a project of the Shenzhen Science and Technology Innovation Committee (JCYJ20190809145407809), the Ministry of Education 1+X Certificate System 2020 Special Research Project Assessment and Evaluation System Construction (HX-0310), a project of the Shenzhen Institute of Information Technology School-level Innovative Scientific Research Team (TD2020E001), the Science and Technology Program of Guangzhou (No. 2019050001), the Program for Guangdong Innovative and Entrepreneurial Teams (No. 2019BT02C241), the Program for Chang Jiang Scholars and Innovative Research Teams in Universities (No. IRT_17R40), the Guangdong Provincial Key Laboratory of Optical Information Materials and Technology (No. 2017B030301007), the Guangzhou Key Laboratory of Electronic Paper Displays Materials and Devices (201705030007), and the 111 Project.
Author information
Contributions
Zhi Yao and Bao Peng designed the research and carried out the experiments. Hailing Sun and Guofu Zhou provided experimental guidance and theoretical analysis. All authors contributed to the literature review and the derivation of conclusions, and edited the manuscript.
Ethics declarations
Conflicts of Interest
The authors declare that they have no competing interests.
Data Availability
The UCF101 data set was used to support this study; details of the data set are available at https://www.crcv.ucf.edu/research/data-sets/ucf101/. It is cited as a reference at the relevant places in the text.
Code Availability
This study used custom code, which is available from the corresponding author on reasonable request.
Ethics Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Peng, B., Yao, Z., Wu, Q. et al. 3D Convolutional Neural Network for Human Behavior Analysis in Intelligent Sensor Network. Mobile Netw Appl 27, 1559–1568 (2022). https://doi.org/10.1007/s11036-021-01873-8
DOI: https://doi.org/10.1007/s11036-021-01873-8