Deep Reinforcement Learning Based on Spatial-Temporal Context for IoT Video Sensors Object Tracking

  • Conference paper
Smart Computing and Communication (SmartCom 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12608)

Abstract

The Internet of Things (IoT) is emerging as one of the major networking technologies. With the IoT, different items or devices can continuously generate, obtain, and exchange information. Video sensor networks have gradually become a research hotspot in the field of wireless sensor networks, and their rich perceptual information is well suited to target positioning and tracking. This paper presents a novel model for object tracking with IoT video sensors that combines a deep reinforcement learning (RL) algorithm with a Spatial-Temporal Context learning (STC) algorithm. The model directly predicts the bounding box location of the target at every successive frame in video surveillance, and the task is tackled end to end. Tracking can be treated as a sequential decision-making process in which the encoded semantic history is highly relevant to future decisions. A recurrent convolutional neural network is therefore adopted as the agent in this model, with the important insight that it can interact with the video over time. To maximize tracking performance and make good use of the continuous inter-frame correlation over the long term, the model harnesses deep RL. The STC algorithm is added to the model to make tracking more efficient. The proposed tracking model demonstrates good performance on an existing tracking benchmark.
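To make the "tracking as sequential decision-making" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian policy over per-frame bounding boxes, an intersection-over-union (IoU) reward against ground-truth boxes, and a REINFORCE-style Monte Carlo gradient with a mean-return baseline. The abstract does not specify the reward, the policy form, or the exact RL algorithm, so all of these, along with the names `iou` and `reinforce_gradient`, are illustrative assumptions.

```python
import random


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # A sampled box can have a negative size; treat it as empty.
    aw, ah, bw, bh = max(aw, 0.0), max(ah, 0.0), max(bw, 0.0), max(bh, 0.0)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def reinforce_gradient(mean_boxes, truth_boxes, sigma=1.0, n_samples=64, seed=0):
    """Estimate d(expected episode return)/d(mean box) for every frame.

    mean_boxes:  per-frame policy means, each [x, y, w, h] (assumed policy form)
    truth_boxes: per-frame ground-truth boxes, each (x, y, w, h)
    The episode return is the sum of per-frame IoU rewards (assumed reward).
    """
    rng = random.Random(seed)
    episodes = []
    for _ in range(n_samples):
        # Sample one box per frame from N(mean, sigma^2) in each coordinate.
        sampled = [[rng.gauss(m, sigma) for m in mean] for mean in mean_boxes]
        ret = sum(iou(s, t) for s, t in zip(sampled, truth_boxes))
        episodes.append((sampled, ret))

    # Mean return as a baseline to reduce the variance of the estimate.
    baseline = sum(ret for _, ret in episodes) / n_samples

    grads = [[0.0] * 4 for _ in mean_boxes]
    for sampled, ret in episodes:
        for t, (s, mean) in enumerate(zip(sampled, mean_boxes)):
            for k in range(4):
                # d/d(mean) of log N(s | mean, sigma^2) is (s - mean) / sigma^2.
                score = (s[k] - mean[k]) / sigma ** 2
                grads[t][k] += (ret - baseline) * score / n_samples
    return grads


if __name__ == "__main__":
    # One frame: the policy mean is shifted 5 pixels right of the true box.
    grads = reinforce_gradient(
        [[5.0, 0.0, 20.0, 20.0]], [(0.0, 0.0, 20.0, 20.0)],
        sigma=2.0, n_samples=2000,
    )
    print(grads[0])
```

A gradient-ascent step (`mean += lr * grad`) moves each predicted box toward higher expected overlap. In a learned tracker the means would come from the network rather than being free parameters, and this gradient would be backpropagated into the network weights.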

Acknowledgements

This research was supported by the Shanghai Science and Technology Innovation Action Plan Project (16111107502, 17511107203) and the Shanghai Key Lab of Modern Optical Systems.

Author information

Corresponding author

Correspondence to Chunxue Wu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

He, P., Wu, C., Liu, K., Xiong, N.N. (2021). Deep Reinforcement Learning Based on Spatial-Temporal Context for IoT Video Sensors Object Tracking. In: Qiu, M. (eds) Smart Computing and Communication. SmartCom 2020. Lecture Notes in Computer Science, vol 12608. Springer, Cham. https://doi.org/10.1007/978-3-030-74717-6_24

  • DOI: https://doi.org/10.1007/978-3-030-74717-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74716-9

  • Online ISBN: 978-3-030-74717-6

  • eBook Packages: Computer Science, Computer Science (R0)
