Deep Reinforcement Learning Based on Spatial-Temporal Context for IoT Video Sensors Object Tracking

He, Panbo; Wu, Chunxue; Liu, Kaijun; Xiong, Neal N.

doi:10.1007/978-3-030-74717-6_24


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12608)

Included in the following conference series:

International Conference on Smart Computing and Communication


Abstract

The Internet of Things (IoT) is one of the major emerging networking technologies. Through the IoT, devices can continuously generate, acquire, and exchange information. Video sensor networks have gradually become a research hotspot in the field of wireless sensor networks, and the rich perceptual information they provide is well suited to target localization and tracking. This paper presents a novel model for IoT video sensor object tracking that combines a deep reinforcement learning (RL) algorithm with a spatial-temporal context learning algorithm and directly predicts the bounding box location of the target in every successive frame of a surveillance video. Crucially, the task is tackled end to end. Because tracking can be treated as a sequential decision-making process, and because historical semantic encodings are highly relevant to future decisions, a recurrent convolutional neural network is adopted as the agent, with the key insight that it can interact with the video over time. To maximize tracking performance and make full use of the continuous, long-term inter-frame correlation, the model harnesses deep RL. In addition, a Spatial-Temporal Context (STC) learning algorithm is incorporated to make tracking more efficient. The proposed model demonstrates good performance on an existing tracking benchmark.
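The abstract only names the STC component. As a rough illustration of how spatial-temporal context learning is commonly formulated, the sketch below implements its usual ingredients: a confidence map c(z) = exp(-|(z - x*)/α|^β) around the target centre x*, a context prior I(z)·w_σ(z - x*), a spatial context model h learned by solving c = h ⊗ (I·w_σ) in the Fourier domain, the temporal update H_{t+1} = (1 - ρ)H_t + ρh_t, and localization as the argmax of the next frame's confidence map. This is not the authors' implementation. The naive pure-Python DFT (used only to stay self-contained), the Gaussian window parameterization, the regularizer `eps`, square frames, the values α = 2.25, β = 1, ρ = 0.075, and every function name are assumptions.

```python
import cmath
import math


def dft(seq, inverse=False):
    """Naive O(n^2) 1-D DFT; the inverse includes the 1/n normalisation."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(seq[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def dft2(grid, inverse=False):
    """2-D DFT done separably: transform each row, then each column."""
    rows = [dft(row, inverse) for row in grid]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def confidence_map(size, center, alpha=2.25, beta=1.0):
    """c(z) = exp(-|(z - x*) / alpha|^beta), which peaks at the target centre x*."""
    cy, cx = center
    return [[math.exp(-(math.hypot(y - cy, x - cx) / alpha) ** beta)
             for x in range(size)] for y in range(size)]


def context_prior(frame, center, sigma):
    """I(z) * w_sigma(z - x*): intensities weighted by a Gaussian window (assumed form)."""
    cy, cx = center
    return [[frame[y][x] * math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(len(frame[0]))] for y in range(len(frame))]


def learn_spatial_context(frame, center, sigma, eps=1e-6):
    """Solve c = h (circular convolution) prior for h, returned in the Fourier domain.
    Uses F(c) conj(F(p)) / (|F(p)|^2 + eps); the eps regulariser is an assumption."""
    size = len(frame)
    fc = dft2(confidence_map(size, center))
    fp = dft2(context_prior(frame, center, sigma))
    return [[fc[u][v] * fp[u][v].conjugate() / (abs(fp[u][v]) ** 2 + eps)
             for v in range(size)] for u in range(size)]


def update_model(model, h_sc, rho=0.075):
    """Temporal update H_{t+1} = (1 - rho) H_t + rho h_t (linear, so valid in Fourier domain)."""
    if model is None:
        return h_sc
    return [[(1 - rho) * a + rho * b for a, b in zip(ra, rb)]
            for ra, rb in zip(model, h_sc)]


def locate(model, frame, prev_center, sigma):
    """c_{t+1} = F^-1(F(H) * F(I_{t+1} w(z - x*_t))); the new centre is its argmax."""
    fp = dft2(context_prior(frame, prev_center, sigma))
    n = len(fp)
    conf = dft2([[model[u][v] * fp[u][v] for v in range(n)] for u in range(n)], inverse=True)
    return max((conf[y][x].real, (y, x)) for y in range(n) for x in range(n))[1]
```

In a tracking loop, `locate` would be called on each new frame with the previous centre, after which `learn_spatial_context` and `update_model` refresh H at the new position. A practical version would replace the O(n³) separable DFT with an FFT library.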
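To make the RL training signal concrete, the following sketch applies the REINFORCE policy-gradient estimator (Williams) to a single bounding-box prediction, using intersection-over-union (IoU) with the ground-truth box as the reward. This is an illustrative assumption: the abstract does not state the reward, and the recurrent convolutional agent is replaced by a directly learned Gaussian mean over (cx, cy, w, h), so the example is a one-step (bandit) simplification rather than the authors' model. The names `iou`, `reinforce_step`, and `train` and all hyperparameters are hypothetical.

```python
import random


def iou(a, b):
    """Intersection-over-union of two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    # Negative sampled widths/heights are treated as empty boxes.
    union = max(a[2], 0.0) * max(a[3], 0.0) + max(b[2], 0.0) * max(b[3], 0.0) - inter
    return inter / union if union > 0 else 0.0


def reinforce_step(mean, target, sigma, lr, rng, batch):
    """One REINFORCE update of a Gaussian box policy N(mean, sigma^2 I).

    For a Gaussian, d/d(mean) log pi(a) = (a - mean) / sigma^2. The batch-average
    reward is subtracted as a variance-reducing baseline.
    """
    actions = [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(batch)]
    rewards = [iou(a, target) for a in actions]
    baseline = sum(rewards) / batch
    grad = [0.0] * len(mean)
    for a, r in zip(actions, rewards):
        for d in range(len(mean)):
            grad[d] += (r - baseline) * (a[d] - mean[d]) / sigma ** 2 / batch
    # Gradient ascent on the expected IoU reward.
    return [m + lr * g for m, g in zip(mean, grad)]


def train(start, target, steps=400, sigma=2.0, lr=5.0, batch=32, seed=0):
    """Repeatedly apply REINFORCE updates starting from the box `start`."""
    rng = random.Random(seed)
    mean = list(start)
    for _ in range(steps):
        mean = reinforce_step(mean, target, sigma, lr, rng, batch)
    return mean


if __name__ == "__main__":
    gt = (10.0, 10.0, 20.0, 20.0)
    start = (14.0, 6.0, 16.0, 24.0)
    print("IoU before:", iou(start, gt), "after:", iou(train(start, gt), gt))
```

In the full model the mean would instead be produced by the recurrent network from each frame and its hidden state, and the log-probability gradient would be backpropagated through the network across the whole frame sequence.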



Acknowledgements

This research was supported by the Shanghai Science and Technology Innovation Action Plan Project (16111107502, 17511107203) and the Shanghai Key Laboratory of Modern Optical Systems.

Author information

Authors and Affiliations

University of Shanghai for Science and Technology, Shanghai, 200093, China
Panbo He, Chunxue Wu & Kaijun Liu
Department of Mathematics and Computer Science, Northeastern State University, Tahlequah, OK, 74464, USA
Neal N. Xiong


Corresponding author

Correspondence to Chunxue Wu.

Editor information

Editors and Affiliations

Texas A&M University – Commerce, Commerce, TX, USA
Meikang Qiu


About this paper

Cite this paper

He, P., Wu, C., Liu, K., Xiong, N.N. (2021). Deep Reinforcement Learning Based on Spatial-Temporal Context for IoT Video Sensors Object Tracking. In: Qiu, M. (ed.) Smart Computing and Communication. SmartCom 2020. Lecture Notes in Computer Science, vol 12608. Springer, Cham. https://doi.org/10.1007/978-3-030-74717-6_24


DOI: https://doi.org/10.1007/978-3-030-74717-6_24
Published: 17 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74716-9
Online ISBN: 978-3-030-74717-6
eBook Packages: Computer Science, Computer Science (R0)
