3D Convolutional Neural Network for Human Behavior Analysis in Intelligent Sensor Network

Published in Mobile Networks and Applications

Abstract

The intelligent recognition of human behavior and actions in massive video data is a key application direction in the field of artificial intelligence. With the development of intelligent communication networks, multimedia communication has become a hot spot in video analysis. 3D convolution is an efficient deep learning model that can learn the temporal and spatial features of target images simultaneously. This paper proposes a 3D max residual feature map convolution network (3D-MRCNN), which addresses the network degradation and gradient vanishing caused by deep convolution calculation. The input is first preprocessed by 2D convolution. After the convolution splitting is completed, a learning network combining the 3D max feature map (3D-MFM) and a residual structure is established. Finally, the output vectors corresponding to the two different inputs are concatenated and fed into a support vector machine (SVM) for classification. In experiments on the representative UCF101 data set, 3D-MRCNN achieves an accuracy of 85.7%, with higher accuracy and operating efficiency than closely related models.
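The core of the 3D-MFM idea (adapted from the max-feature-map of Light CNN [15]) is to split the channel axis of a 3D feature volume in half and keep the element-wise maximum, which halves the channels and acts as a competitive activation. A minimal NumPy sketch is given below; the tensor shapes, the dummy channel-doubling transform, and the toy residual step are illustrative assumptions, not the authors' exact architecture.

```python
import numpy as np

def mfm_3d(x):
    """3D max-feature-map: split the channel axis (axis 0) in half and
    take the element-wise maximum, halving the channel count."""
    c = x.shape[0]
    assert c % 2 == 0, "MFM requires an even number of channels"
    a, b = x[: c // 2], x[c // 2 :]
    return np.maximum(a, b)

def residual_mfm_block(x, conv):
    """Toy residual step: `conv` stands in for a 3D convolution that
    doubles the channels; MFM halves them back, then the identity
    shortcut is added, as in a residual structure."""
    y = conv(x)          # (C, D, H, W) -> (2C, D, H, W)
    y = mfm_3d(y)        # back to (C, D, H, W)
    return x + y         # identity shortcut

# Example: 4 channels, 2 frames (temporal depth), 3x3 spatial patch.
x = np.random.randn(4, 2, 3, 3)
double = lambda t: np.concatenate([t, -t], axis=0)  # dummy "conv"
out = residual_mfm_block(x, double)
print(out.shape)  # (4, 2, 3, 3)
```

With the dummy transform above, MFM reduces to max(t, -t) = |t|, so the block returns x + |x|; in the actual network the transform would be a learned 3D convolution over the stacked frames.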



Abbreviations

3D-MRCNN:

3D max residual feature map convolution network

3D-MFM:

3D max feature map

SVM:

Support vector machine

IDT:

Improved dense trajectory

TSN:

Temporal Segment Networks

C3D:

Convolutional 3D

P3D:

Pseudo-3D Residual Net

References

  1. Pandeya YR, Lee JW (2021) Deep learning-based late fusion of multimodal information for emotion classification of music video. Multimedia Tools and Applications 80(38):1–19. https://doi.org/10.1007/s11042-020-08836-3


  2. Li H-F, Cryer S, Acharya LP, Raymond J (2020) Video and image classification using atomisation spray image patterns and deep learning. Biosystems Engineering 200:13–22. https://doi.org/10.1016/j.biosystemseng.2020.08.016


  3. Wang H, Schmid C (2013) Action Recognition with Improved Trajectories. IEEE International Conference on Computer Vision:3551–3558. https://doi.org/10.1109/ICCV.2013.441

  4. Simonyan K, Zisserman A (2014) Two-Stream Convolutional Networks for Action Recognition in Videos. Advances in Neural Information Processing Systems 27:568–576

  5. Ning L, Wang Z, Guo Q (2014) Preferred Route Indoor Mobility Model for Heterogeneous Networks. IEEE Commun Lett 18(5):821–824. https://doi.org/10.1109/LCOMM.2014.033114.140344


  6. Zhang RZ, Zhong H, Zheng TY, Ning L (2021) Trajectory Mining-Based City-Level Mobility Model for 5G NB-IoT Networks. Wirel Commun Mob Comput 2021:12. https://doi.org/10.1155/2021/5356193


  7. Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional Two-Stream Network Fusion for Video Action Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition:1933–1941. https://doi.org/10.1109/CVPR.2016.213

  8. Ng JY-H, Hausknecht M, Vijayanarasimhan S et al (2015) Beyond short snippets: Deep networks for video classification. 2015 IEEE Conference on Computer Vision and Pattern Recognition:4694–4702. https://doi.org/10.1109/CVPR.2015.7299101

  9. Wang LM, Xiong XJ, Wang Z et al (2016) Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. Computer Vision–ECCV 2016 9912:20–36. https://doi.org/10.1007/978-3-319-46484-8_2

  10. Varol G, Laptev I, Schmid C (2017) Long-term temporal convolutions for action recognition. IEEE Trans Pattern Anal Mach Intell 40(6):1510–1517. https://doi.org/10.1109/TPAMI.2017.2712608


  11. Ji SW, Xu W, Yang M, Yu K (2013) 3D Convolutional Neural Networks for Human Action Recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231. https://doi.org/10.1109/TPAMI.2012.59


  12. Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. IEEE International Conference on Computer Vision:4489–4497

  13. Qiu ZF, Yao T, Mei T (2017) Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. 2017 IEEE International Conference on Computer Vision:5534–5542. https://doi.org/10.1109/ICCV.2017.590

  14. Zhang QW, Huang KQ, Wang X, Jiang B, Gan Y (2019) Efficient multiview video plus depth coding for 3D-HEVC based on complexity classification of the treeblock. Real-Time Image Processing 16(6):1909–1926. https://doi.org/10.1007/s11554-017-0692-5


  15. Wu X, He R, Sun ZN, Tan TN (2018) A Light CNN for Deep Face Representation With Noisy Labels. IEEE Transactions on Information Forensics and Security 13(11):2884–2896. https://doi.org/10.1109/TIFS.2018.2833032


  16. Gayathri N, Mahesh K (2020) Improved Fuzzy-Based SVM Classification System Using Feature Extraction for Video Indexing and Retrieval. Int J Fuzzy Syst 22(8):1716–1729. https://doi.org/10.1007/s40815-020-00884-z


  17. Chen YS, Guo B, Wang W, Suo XH, Zhang Z (2020) Using efficient group pseudo-3D network to learn spatio-temporal features. SIViP 15(2):361–369. https://doi.org/10.1007/s11760-020-01758-5


  18. Binol H, Moberly AC, Niazi MKK et al (2020) SelectStitch: Automated Frame Segmentation and Stitching to Create Composite Images from Otoscope Video Clips. medRxiv preprint. https://doi.org/10.1101/2020.08.12.20173765

  19. Yu WY, Zhao M, Xu J et al (2020) Feature extraction of positron image and imaging algorithm based on 3D convolution operation. Optik 217:164952. https://doi.org/10.1016/j.ijleo.2020.164952


  20. Chollet F (2017) Xception: Deep Learning with Depthwise Separable Convolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition:1800–1807. https://doi.org/10.1109/CVPR.2017.195

  21. Lu WD, Gong Y, Liu X et al (2018) Collaborative Energy and Information Transfer in Green Wireless Sensor Networks for Smart Cities. IEEE Trans Industr Inf 14(4):1585–1593. https://doi.org/10.1109/TII.2017.2777846


  22. Jiang YN, Li Y, Zhang HK (2019) Hyperspectral Image Classification Based on 3-D Separable ResNet and Transfer Learning. IEEE Geosci Remote Sens Lett 16(12):1949–1953. https://doi.org/10.1109/LGRS.2019.2913011


  23. Diehl PU, Cook M (2015) Unsupervised Learning of Digit Recognition Using Spike-Timing-Dependent Plasticity. Front Comput Neurosci 9:99. https://doi.org/10.3389/fncom.2015.00099


  24. Soomro K, Zamir AR, Shah M (2012) UCF101: A Dataset of 101 Human Action Classes From Videos in The Wild. arXiv preprint arXiv:1212.0402


Funding

The paper was supported by the Project of the Shenzhen Science and Technology Innovation Committee (JCYJ20190809145407809), the Ministry of Education 1+X Certificate System 2020 Special Research Project "Assessment and Evaluation System Construction" (HX-0310), the Project of the Shenzhen Institute of Information Technology School-level Innovative Scientific Research Team (TD2020E001), the Science and Technology Program of Guangzhou (No. 2019050001), the Program for Guangdong Innovative and Entrepreneurial Teams (No. 2019BT02C241), the Program for Chang Jiang Scholars and Innovative Research Teams in Universities (No. IRT_17R40), the Guangdong Provincial Key Laboratory of Optical Information Materials and Technology (No. 2017B030301007), the Guangzhou Key Laboratory of Electronic Paper Displays Materials and Devices (201705030007), and the 111 Project.

Author information


Contributions

Zhi Yao and Bao Peng designed the research and carried out the experiments. Hailing Sun and Guofu Zhou provided experimental guidance and theoretical analysis. All authors contributed to the literature review, the derivation of the conclusions, and the editing of the manuscript.

Corresponding author

Correspondence to Qibao Wu.

Ethics declarations

Conflicts of Interest

The authors declare that they have no competing interests.

Data Availability

The UCF101 data set was used to support this study; its details are available at https://www.crcv.ucf.edu/research/data-sets/ucf101/. The data set is cited at the relevant places within the text as a reference.

Code Availability

This paper is supported by custom code, which is available from the corresponding author on reasonable request.

Ethics Approval

Not applicable.

Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Peng, B., Yao, Z., Wu, Q. et al. 3D Convolutional Neural Network for Human Behavior Analysis in Intelligent Sensor Network. Mobile Netw Appl 27, 1559–1568 (2022). https://doi.org/10.1007/s11036-021-01873-8

