DRA-ODM: a faster and more accurate deep recurrent attention dynamic model for object detection

Abstract

Traditional object detection models based on convolutional neural networks contain a large number of parameters, so they perform poorly in scenes that demand both high real-time performance and high precision. To address this problem, this paper proposes a deep recurrent attention object detection dynamic model (DRA-ODM). Built on a time-domain attention mechanism, the model combines a recurrent neural network with a dynamic sampling-point mechanism, simulating the way human eyes ignore irrelevant information and attend to key information when observing a scene. DRA-ODM completes object detection while extracting only part of the image features. In addition, this paper visualizes the position of each sampling step, making it convenient to observe where the sampling points fall during each recurrence. Extensive experimental results demonstrate that the DRA-ODM model achieves object detection within 5 time steps using about 20 M parameters, reaching an average accuracy of 87.4%.
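For intuition, the sketch below shows one way such a recurrent attention loop can be organized: at each time step a small glimpse is sampled around the current attention point, encoded by a lightweight network, fed to a recurrent core, and used to predict the next sampling point alongside the detection outputs. This is a minimal illustrative sketch in PyTorch, not the authors' implementation; all module names, layer sizes, the glimpse scale, and the five-step budget are assumptions chosen for readability.

# Minimal illustrative sketch of a recurrent attention detection loop,
# in the spirit of DRA-ODM as described in the abstract. All names,
# sizes, and hyperparameters here are assumptions, not the paper's
# exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentAttentionSketch(nn.Module):
    def __init__(self, glimpse=32, hidden=256, num_classes=20):
        super().__init__()
        self.glimpse = glimpse
        # A small CNN encodes only the sampled patch, never the full image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, hidden), nn.ReLU(),
        )
        self.rnn = nn.GRUCell(hidden, hidden)   # recurrent core carries state across glimpses
        self.loc_head = nn.Linear(hidden, 2)    # next sampling point (x, y) in [-1, 1]
        self.box_head = nn.Linear(hidden, 4)    # bounding-box regression
        self.cls_head = nn.Linear(hidden, num_classes)

    def crop(self, images, loc):
        # Bilinearly sample a glimpse patch centred on loc.
        b, g = images.size(0), self.glimpse
        span = torch.linspace(-0.2, 0.2, g, device=images.device)  # patch covers ~20% of the image
        ys, xs = torch.meshgrid(span, span, indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).expand(b, g, g, 2) + loc.view(b, 1, 1, 2)
        return F.grid_sample(images, grid, align_corners=False)

    def forward(self, images, steps=5):
        b = images.size(0)
        h = images.new_zeros(b, self.rnn.hidden_size)
        loc = images.new_zeros(b, 2)            # start sampling at the image centre
        locations = []
        for _ in range(steps):                  # a handful of glimpses instead of a full pass
            feat = self.encoder(self.crop(images, loc))
            h = self.rnn(feat, h)
            loc = torch.tanh(self.loc_head(h))  # dynamic sampling point for the next step
            locations.append(loc)               # kept so each sampling position can be visualized
        return self.box_head(h), self.cls_head(h), locations


# Example: five glimpses over a batch of two images.
model = RecurrentAttentionSketch()
boxes, logits, locs = model(torch.randn(2, 3, 224, 224))

One design note on this sketch: hard crops make the sampling step non-differentiable, so glimpse-based models of this family often train the location policy with reinforcement learning; the bilinear sampling used above keeps the sketch end-to-end differentiable instead. Collecting the per-step locations mirrors the paper's visualization of sampling positions at each cycle.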

Acknowledgements

This work was supported by Natural Science Project of Shaanxi Education Department (18JK0399).

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Emerging Blockchain Applications and Technology

Guest Editors: Huimin Lu, Xing Xu, Jože Guna, and Gautam Srivastava

Cite this article

Li, G., Xu, F., Li, H. et al. DRA-ODM: a faster and more accurate deep recurrent attention dynamic model for object detection. World Wide Web 25, 1625–1648 (2022). https://doi.org/10.1007/s11280-021-00971-7
