Abstract
The security of the Internet of Things (IoT) has attracted considerable attention in the artificial intelligence community, and Artificial Intelligence of Things (AIoT) systems are now widely used in intelligent surveillance scenarios. However, because current models are poorly interpretable and data security risks are high, building a robust and explainable deep learning framework for scene understanding in AIoT is very difficult. The fusion of IoT and AI poses further challenges. To address these difficulties, we develop a self-learning and explainable deep learning network for AIoT security. At the embedded device end, the system handles video collection, upload and display, as well as data analysis and early warning, and it automatically recognizes behaviors in the scene using our visual recognition algorithms. The cloud computing platform can also be controlled through the developed network. Our visual recognition algorithms make three contributions. First, we propose a lightweight reinforcement learning network model that extracts spatial–temporal features of different behavior characteristics. Second, we propose a self-paced learning framework that fuses deep reinforcement learning and transfer learning. Third, we propose a multi-perspective deep transfer learning model to address the weak interpretability of the model. Experimental results show that the proposed model provides high interpretability and outperforms state-of-the-art methods.
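The abstract does not spell out the self-paced learning objective. For readers unfamiliar with the idea, the following is a minimal sketch of the classic hard-weighting form of self-paced learning on a toy linear-regression problem; it is not the authors' framework, which fuses deep reinforcement learning and transfer learning. In each round, samples whose current loss is below an "age" threshold `lam` are treated as easy and used for training, and `lam` grows so harder samples are admitted gradually. The function names `spl_weights` and `self_paced_fit`, the toy data, and the values of `lam`, `growth`, `lr` and `steps` are all illustrative assumptions.

```python
def spl_weights(losses, lam):
    """Hard self-paced weights: the closed-form minimizer of v*loss - lam*v
    over v in {0, 1} is v = 1 when loss < lam, else v = 0."""
    return [1.0 if loss < lam else 0.0 for loss in losses]


def self_paced_fit(xs, ys, lam=2.0, growth=1.5, rounds=10, lr=0.05, steps=200):
    """Hypothetical toy example: fit y = w*x + b by alternating between
    (a) selecting 'easy' samples whose squared loss is below lam and
    (b) gradient descent on the selected samples only.
    lam grows every round, so the curriculum widens over time."""
    w, b = 0.0, 0.0
    v = [0.0] * len(xs)
    for _ in range(rounds):
        # (a) Sample selection with the model parameters held fixed.
        losses = [(w * x + b - y) ** 2 for x, y in zip(xs, ys)]
        v = spl_weights(losses, lam)
        n = sum(v)
        if n > 0:
            # (b) Gradient descent on the mean squared loss of selected samples.
            for _ in range(steps):
                residuals = [w * x + b - y for x, y in zip(xs, ys)]
                gw = sum(vi * 2 * r * x for vi, r, x in zip(v, residuals, xs)) / n
                gb = sum(vi * 2 * r for vi, r in zip(v, residuals)) / n
                w -= lr * gw
                b -= lr * gb
        # Raise the age parameter so harder samples can be admitted next round.
        lam *= growth
    return w, b, v


# Five points on y = 2x + 1 plus one corrupted sample at (1.0, 20.0).
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 1.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
w, b, v = self_paced_fit(xs, ys)
print(round(w, 3), round(b, 3), v)  # w close to 2, b close to 1; the outlier's weight stays 0
```

Because the corrupted sample's loss never falls below the growing threshold within the chosen number of rounds, it is never selected, which illustrates why self-paced schedules are often used to reduce the influence of noisy or hard examples early in training.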

Data availability
The data that support the findings of this paper are available from the corresponding author.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61702226); the 111 Project (B12018); and the Open Fund of the Jiangsu Key Laboratory of Image and Video Understanding for Social Safety, Nanjing University of Science and Technology, Nanjing (J2021-7).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, B., He, S. Self-learning and explainable deep learning network toward the security of artificial intelligence of things. J Supercomput 79, 4436–4467 (2023). https://doi.org/10.1007/s11227-022-04818-4
DOI: https://doi.org/10.1007/s11227-022-04818-4