An active object detection model with multi-step prediction based on deep q-learning network and innovative training algorithm

Published in: Applied Intelligence

Abstract

Active Object Detection (AOD) gathers additional information by deliberately adjusting the agent’s viewpoint, ensuring precise detection results in complex environments. Viewpoint planning (VP) is one of the focal points of AOD. Until now, the predominant approach to implementing AOD algorithms has been deep Q-learning networks (DQNs) that output a single discrete action. Nevertheless, these methods exhibit shortcomings in both implementation efficiency and success rate. To address these challenges, this paper proposes an AOD algorithm that allows multi-step prediction and employs a novel training strategy. In more detail, an AOD network using a shared decision-making approach is first constructed, simultaneously outputting the action category and range. Moreover, a novel training method based on Prioritized Experience Replay (PER) is introduced, enhancing the operational success rate of the AOD algorithm. Finally, the reward function is optimized for the designed framework, thereby promoting the convergence of network training. Several comparable methods are tested on a public dataset (the Active Vision Dataset), and the results clearly illustrate the superiority of the proposed approach.
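
To make the shared decision-making idea concrete, the sketch below shows one plausible way to structure a Q-network with a shared visual encoder and two output branches, one scoring action categories and one scoring action ranges (step counts). This is not the authors' implementation: the ResNet-18 backbone, the layer sizes, and the names num_actions, num_ranges, action_head and range_head are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a shared-decision Q-network that
# jointly outputs an action category and an action range (step count), in the
# spirit of the branching architecture described in the abstract. Backbone
# choice, layer sizes, and head names are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class BranchingAODNet(nn.Module):
    def __init__(self, num_actions: int = 6, num_ranges: int = 3):
        super().__init__()
        # Shared visual encoder: ResNet-18 with the classification head removed.
        backbone = models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 512, 1, 1)

        # Shared decision trunk feeding both branches.
        self.trunk = nn.Sequential(nn.Flatten(), nn.Linear(512, 256), nn.ReLU())

        # Branch 1: Q-values over action categories (e.g. move forward, rotate, ...).
        self.action_head = nn.Linear(256, num_actions)
        # Branch 2: Q-values over action ranges (how many unit steps to execute).
        self.range_head = nn.Linear(256, num_ranges)

    def forward(self, obs: torch.Tensor):
        feat = self.trunk(self.encoder(obs))
        return self.action_head(feat), self.range_head(feat)


if __name__ == "__main__":
    net = BranchingAODNet()
    q_action, q_range = net(torch.randn(2, 3, 224, 224))
    # Greedy multi-step decision: pick the best category and the best range.
    action = q_action.argmax(dim=1)
    steps = q_range.argmax(dim=1) + 1
    print(action.shape, steps.shape)  # torch.Size([2]) torch.Size([2])
```

At decision time, this sketch takes the argmax of each branch independently, which is the usual shortcut in branching Q-architectures; the paper's exact selection rule may differ.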

Data Availability

The Active Vision Dataset [16] used in this work is available at https://www.cs.unc.edu/~ammirato/active_vision_dataset_website/index.html.

References

  1. Zou Z, Chen K, Shi Z et al (2023) Object Detection in 20 years: A Survey. Proceedings of the IEEE 111(3):257–276. https://doi.org/10.1109/JPROC.2023.3238524

  2. Pal A, Kumar V (2023) AgriDet: Plant Leaf Disease severity classification using agriculture detection framework. Eng Appl Artif Intell 119:105754. https://doi.org/10.1016/j.engappai.2022.105754

  3. Zhang D, Hao X, Wang D et al (2023) An efficient lightweight convolutional neural network for industrial surface defect detection. Artif Intell Rev 56:10651–10677. https://doi.org/10.1007/s10462-023-10438-y

  4. Jha SB, Babiceanu RF (2023) Deep CNN-based visual defect detection: Survey of current literature. Comput Industry 148:103911. https://doi.org/10.1016/j.compind.2023.103911

  5. Zeng Y, Ma C, Zhu M, et al (2021) Cross-Modal 3D Object Detection and Tracking for Auto-Driving. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, Prague, pp 3850–3857, https://doi.org/10.1109/IROS51168.2021.9636498

  6. Wang L, Zhang X, Song Z et al (2023) Multi-Modal 3D Object Detection in Autonomous Driving: A Survey and Taxonomy. IEEE Trans Intell Vehicles 8(7):3781–3798. https://doi.org/10.1109/TIV.2023.3264658

  7. Zhao ZQ, Zheng P, Xu ST et al (2019) Object Detection With Deep Learning: A Review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865

  8. Lowe DG (2004) Distinctive Image Features from Scale-Invariant Keypoints. Int J Comput Vision 60:91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94

  9. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE Computer society conference on computer vision and pattern recognition (CVPR’05), vol 1. IEEE, San Diego, pp 886–893, https://doi.org/10.1109/CVPR.2005.177

  10. Everingham M, Van Gool L, Williams CKI et al (2010) The PASCAL Visual Object Classes (VOC) Challenge. Int J Comput Vision 88:303–338. https://doi.org/10.1007/s11263-009-0275-4

  11. Lin TY, Maire M, Belongie S, et al (2014) Microsoft COCO: Common objects in context. In: Fleet D, Pajdla T, Schiele B, et al (eds) Computer Vision - ECCV 2014. Springer, Cham, Zurich, pp 740–755, https://doi.org/10.1007/978-3-319-10602-1_48

  12. Deng J, Dong W, Socher R, et al (2009) ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on computer vision and pattern recognition(CVPR). IEEE, Miami, pp 248–255, https://doi.org/10.1109/CVPR.2009.5206848

  13. Yang J, Ren Z, Xu M, et al (2019) Embodied Amodal Recognition: Learning to Move to Perceive Objects. In: 2019 IEEE/CVF international conference on computer vision (ICCV). IEEE, Seoul, pp 2040–2050, https://doi.org/10.1109/ICCV.2019.00213

  14. Ali A, Zhu Y, Zakarya M (2022) Exploiting dynamic spatio-temporal graph convolutional neural networks for citywide traffic flows prediction. Neural Netw 145:233–247. https://doi.org/10.1016/j.neunet.2021.10.021

  15. Kong Y, Fu Y (2022) Human action recognition and prediction: A survey. Int J Comput Vision 130:1366–1401. https://doi.org/10.1007/s11263-022-01594-9

  16. Ammirato P, Poirson P, Park E, et al (2017) A dataset for developing and benchmarking active vision. In: 2017 IEEE International conference on robotics and automation (ICRA). IEEE, Singapore, pp 1378–1385, https://doi.org/10.1109/ICRA.2017.7989164

  17. Mnih V, Kavukcuoglu K, Silver D et al (2015) Human-level control through deep reinforcement learning. Nature 518:529–533. https://doi.org/10.1038/nature14236

  18. Liu S, Tian G, Zhang Y et al (2022) Active Object Detection Based on a Novel Deep Q-Learning Network and Long-Term Learning Strategy for the Service Robot. IEEE Trans Industrial Electron 69(6):5984–5993. https://doi.org/10.1109/TIE.2021.3090707

  19. Ammirato P, Berg AC, Košecká J (2018) Active Vision Dataset Benchmark. In: 2018 IEEE/CVF Conference on computer vision and pattern recognition workshops (CVPRW). IEEE, Anchorage, pp 21270–21273. https://doi.org/10.1109/CVPRW.2018.00277

  20. García-Samartín JF, Ulloa CC, Cerro J, et al (2024) Active robotic search for victims using ensemble deep learning techniques. Machine Learning: Science and Technology 5(2). https://doi.org/10.1088/2632-2153/ad33df

  21. Schaul T, Quan J, Antonoglou I, et al (2016) Prioritized Experience Replay. arXiv:1511.05952

  22. Lv L, Zhang S, Ding D et al (2019) Path Planning via an Improved DQN-Based Learning Policy. IEEE Access 7:67319–67330. https://doi.org/10.1109/ACCESS.2019.2918703

  23. Sharma J, Andersen PA, Granmo OC et al (2021) Deep Q-Learning With Q-Matrix Transfer Learning for Novel Fire Evacuation Environment. IEEE Trans Syst, Man, Cybernetics: Syst 51(12):7363–7381. https://doi.org/10.1109/TSMC.2020.2967936

  24. Lin HY, Liang SC, Chen YK (2021) Robotic Grasping With Multi-View Image Acquisition and Model-Based Pose Estimation. IEEE Sensors J 21(10):11870–11878. https://doi.org/10.1109/JSEN.2020.3030791

  25. Song S, Kim D, Choi S (2022) View Path Planning via Online Multiview Stereo for 3-D Modeling of Large-Scale Structures. IEEE Trans Robotics 38(1):372–390. https://doi.org/10.1109/TRO.2021.3083197

  26. Morrison D, Corke P, Leitner J (2019) Multi-View Picking: Next-best-view Reaching for Improved Grasping in Clutter. In: 2019 International conference on robotics and automation (ICRA). IEEE, Montreal, pp 8762–8768. https://doi.org/10.1109/ICRA.2019.8793805

  27. Lehnert C, Tsai D, Eriksson A, et al (2019) 3D Move to See: Multi-perspective visual servoing towards the next best view within unstructured and occluded environments. In: 2019 IEEE/RSJ International conference on intelligent robots and systems (IROS). IEEE, Macao, pp 3890–3897, https://doi.org/10.1109/IROS40897.2019.8967918

  28. Rapado-Rincón D, van Henten EJ, Kootstra G (2023) Development and evaluation of automated localisation and reconstruction of all fruits on tomato plants in a greenhouse based on multi-view perception and 3D multi-object tracking. Biosyst Eng 231:78–91. https://doi.org/10.1016/j.biosystemseng.2023.06.003

  29. Denzler J, Brown C (2002) Information theoretic sensor data selection for active object recognition and state estimation. IEEE Trans Pattern Anal Mach Intell 24(2):145–157. https://doi.org/10.1109/34.982896

  30. van Hoof H, Kroemer O, Peters J (2014) Probabilistic Segmentation and Targeted Exploration of Objects in Cluttered Environments. IEEE Trans Robot 30(5):1198–1209. https://doi.org/10.1109/TRO.2014.2334912

  31. Yang J, Waslander SL (2022) Next-Best-View Prediction for Active Stereo Cameras and Highly Reflective Objects. In: 2022 International conference on robotics and automation (ICRA). IEEE, Philadelphia, pp 3684–3690. https://doi.org/10.1109/ICRA46639.2022.9811917

  32. Cheng H, Duan F, He M (2023) Spiking Memory Policy with Population-encoding for Partially Observable Markov Decision Process Problems. Cognitive Comput 15:1153–1166. https://doi.org/10.1007/s12559-022-10030-6

  33. Zhang H, Liu H, Guo D, et al (2017) From foot to head: Active face finding using deep q-learning. In: 2017 IEEE International conference on image processing (ICIP). IEEE, Beijing, pp 1862–1866. https://doi.org/10.1109/ICIP.2017.8296604

  34. Han X, Liu H, Sun F, et al (2018) Active Object Detection Using Double DQN and Prioritized Experience Replay. In: 2018 International joint conference on neural networks (IJCNN). IEEE, Rio de Janeiro, pp 1–7. https://doi.org/10.1109/IJCNN.2018.8489296

  35. Van Hasselt H, Guez A, Silver D (2016) Deep Reinforcement Learning with Double Q-Learning. Proceed AAAI Conference Artif Intell 30(1):2094–2100. https://doi.org/10.1609/aaai.v30i1.10295

  36. Han X, Liu H, Sun F et al (2019) Active Object Detection With Multistep Action Prediction Using Deep Q-Network. IEEE Trans Industrial Inf 15(6):3723–3731. https://doi.org/10.1109/TII.2019.2890849

  37. Xu Q, Fang F, Gauthier N, et al (2021) Towards Efficient Multiview Object Detection with Adaptive Action Prediction. In: 2021 IEEE international conference on robotics and automation (ICRA). IEEE, Xi’an, pp 13423–13429. https://doi.org/10.1109/ICRA48506.2021.9561388

  38. Fang F, Xu Q, Gauthier N, et al (2021) Enhancing Multi-Step Action Prediction for Active Object Detection. In: 2021 IEEE International conference on image processing (ICIP). IEEE, Anchorage, pp 2189–2193. https://doi.org/10.1109/ICIP42928.2021.9506078

  39. Schmid JF, Lauri M, Frintrop S (2019) Explore, Approach, and Terminate: Evaluating Subtasks in Active Visual Object Search Based on Deep Reinforcement Learning. In: 2019 IEEE/RSJ International conference on intelligent robots and systems (IROS). IEEE, Macau, pp 5008–5013, https://doi.org/10.1109/IROS40897.2019.8967805

  40. Peng W, Wang W, Wang Y, et al (2024) Key Technologies and Trends of Active Robotic 3-D Measurement in Intelligent Manufacturing. IEEE/ASME Trans Mechatron pp 1–22. https://doi.org/10.1109/TMECH.2024.3396222

  41. Akl J, Alladkani F, Calli B (2024) Feature-Driven Next View Planning for Cutting Path Generation in Robotic Metal Scrap Recycling. IEEE Trans Automation Sci Eng 21(3):3357–3373. https://doi.org/10.1109/TASE.2023.3278994

  42. Wang T, Xi W, Cheng Y et al (2024) RL-NBV: A deep reinforcement learning based next-best-view method for unknown object reconstruction. Pattern Recognition Lett 184:1–6. https://doi.org/10.1016/j.patrec.2024.05.014

  43. Wang A, Chen H, Liu L, et al (2024) YOLOv10: Real-Time End-to-End Object Detection. arXiv:2405.14458

  44. Tavakoli A, Pardo F, Kormushev P (2018) Action Branching Architectures for Deep Reinforcement Learning. Proceed AAAI Conference Artif Intell 32:1–9. https://doi.org/10.1609/aaai.v32i1.11798

  45. He K, Zhang X, Ren S, et al (2016) Deep Residual Learning for Image Recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE, Las Vegas, pp 770–778. https://doi.org/10.1109/CVPR.2016.90

  46. Wang Z, Schaul T, Hessel M, et al (2016) Dueling Network Architectures for Deep Reinforcement Learning. In: Proceedings of The 33rd international conference on machine learning. PMLR, New York, pp 1995–2003

  47. Sun H, Zhu F, Li Y et al (2023) Viewpoint planning with transition management for active object recognition. Front Neurorobot 17:1093132. https://doi.org/10.3389/fnbot.2023.1093132

  48. Sun H, Zhu F, Kong Y, et al (2021) Continuous Viewpoint Planning in Conjunction with Dynamic Exploration for Active Object Recognition. Entropy 23(12). https://doi.org/10.3390/e23121702

  49. Wang N, Gao Y, Zhao H et al (2021) Reinforcement Learning-Based Optimal Tracking Control of an Unknown Unmanned Surface Vehicle. IEEE Trans Neural Netw Learn Syst 32(7):3034–3045. https://doi.org/10.1109/TNNLS.2020.3009214

  50. Liu H, Sun F, Zhang X (2019) Robotic Material Perception Using Active Multimodal Fusion. IEEE Trans Industrial Electron 66(12):9878–9886. https://doi.org/10.1109/TIE.2018.2878157

  51. Singh A, Sha J, Narayan KS, et al (2014) BigBIRD: A large-scale 3D database of object instances. In: 2014 IEEE International conference on robotics and automation (ICRA). IEEE, Miami, pp 509–516. https://doi.org/10.1109/ICRA.2014.6906903

  52. Wang X, Wang S, Liang X et al (2024) Deep Reinforcement Learning: A Survey. IEEE Trans Neural Netw Learn Syst 35(4):5064–5078. https://doi.org/10.1109/TNNLS.2022.3207346

  53. Fährmann D, Jorek N, Damer N et al (2022) Double Deep Q-Learning With Prioritized Experience Replay for Anomaly Detection in Smart Environments. IEEE Access 10:60836–60848. https://doi.org/10.1109/ACCESS.2022.3179720

  54. Chen Y, Liang L (2023) SLP-Improved DDPG Path-Planning Algorithm for Mobile Robot in Large-Scale Dynamic Environment. Sensors 23(7). https://doi.org/10.3390/s23073521

  55. Fang F, Liang W, Wu Y et al (2022) Self-Supervised Reinforcement Learning for Active Object Detection. IEEE Robot Automation Lett 7(4):10224–10231. https://doi.org/10.1109/LRA.2022.3193019

  56. Yang N, Lu F, Yu B, et al (2023) Service Robot Active Object Detection based on Spatial Exploration using Deep Recurrent Q-learning Network. In: 2023 IEEE International conference on robotics and biomimetics (ROBIO), pp 1–6. https://doi.org/10.1109/ROBIO58561.2023.10354931

  57. Xu N, Huo C, Zhang X et al (2021) Dynamic camera configuration learning for high-confidence active object detection. Neurocomputing 466:113–127. https://doi.org/10.1016/j.neucom.2021.09.037

  58. Tian Z, Shen C, Chen H et al (2022) FCOS: A Simple and Strong Anchor-Free Object Detector. IEEE Trans Pattern Anal Mach Intell 44(4):1922–1933. https://doi.org/10.1109/TPAMI.2020.3032166

  59. Abbaszadeh Shahri A, Chunling S, Larsson S (2024) A hybrid ensemble-based automated deep learning approach to generate 3D geo-models and uncertainty analysis. Eng Comput 40:1501–1516. https://doi.org/10.1007/s00366-023-01852-5


Author information

Contributions

Jianyu Wang: Conceptualization, Methodology, Software, Formal analysis, Data curation, and Writing - original draft preparation. Feng Zhu: Methodology, Validation, Resources, and Project administration. Qun Wang and Yunge Cui: Writing - review and editing. Haibo Sun and Pengfei Zhao: Supervision.

Corresponding author

Correspondence to Feng Zhu.

Ethics declarations

Conflict of Interest/Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Mathematical Proof

This appendix proves the convergence and stability of the Q function designed in this paper.

The action-value iteration can be written as the following equation:

$$\begin{aligned} Q_{k+1}(s,a) = \sum _{s^{'}\in S} F(s^{'}\vert s,a)[r(s,a,s^{'})+\gamma \max _{a^{'} \in \mathcal {A}}Q_{k}(s^{'},a^{'})] \end{aligned}$$
(A1)

Then, the Bellman optimality backup operator \(\mathcal {B}_{*}\), which acts greedily and therefore corresponds to the policy \(\pi \), is defined as follows.

$$\begin{aligned} \mathcal {B}_{*}U(s) = \max _{a\in \mathcal {A}}\sum _{s^{'}\in S}F(s^{'}\vert s,a)[r(s,a,s^{'})+\gamma U(s^{'})] \end{aligned}$$
(A2)

It can be easily deduced from (A2),

$$\begin{aligned} \begin{aligned} \Vert \mathcal {B}_{*}U_{1} - \mathcal {B}_{*}U_{2} \Vert _{\infty }&= \max _{s} \bigg \{\bigg \vert \max _{a_1\in \mathcal {A}}\sum _{s^{'}\in S}F(s^{'}\vert s,a_1)\Big [r(s,a_1,s^{'})+\gamma U_1(s^{'})\Big ]\\&-\max _{a_2\in \mathcal {A}}\sum _{s^{'}\in S}F(s^{'}\vert s,a_2)\Big [r(s,a_2,s^{'})+\gamma U_2(s^{'})\Big ]\bigg \vert \bigg \}\\&\le \max _{s} \bigg \{\bigg \vert \max _{a_1\in \mathcal {A}}\sum _{s^{'}\in S}F(s^{'}\vert s,a_1)\Big [r(s,a_1,s^{'})+\gamma U_1(s^{'})\Big ]\\&-\sum _{s^{'}\in S}F(s^{'}\vert s,a_1)\Big [r(s,a_1,s^{'})+\gamma U_2(s^{'})\Big ]\bigg \vert \bigg \}\\&\le \max _{s} \bigg \{ \max _{a_1\in \mathcal {A}}\bigg \vert \sum _{s^{'}\in S}F(s^{'}\vert s,a_1)\Big [r(s,a_1,s^{'})+\gamma U_1(s^{'})\Big ]\\&-\sum _{s^{'}\in S}F(s^{'}\vert s,a_1)\Big [r(s,a_1,s^{'})+\gamma U_2(s^{'})\Big ]\bigg \vert \bigg \}\\&\le \gamma \max _{s} \bigg \{\max _{a\in \mathcal {A}}\Big [\sum _{s^{'}\in S}F(s^{'}\vert s,a)\Big \vert U_{1}(s^{'})- U_{2}(s^{'}) \Big \vert \Big ] \bigg \}\\&\le \gamma \max _{s} \bigg \{\max _{a\in \mathcal {A}, s^{'} \in S}\Big \{\Big \vert U_{1}(s^{'})- U_{2}(s^{'}) \Big \vert \Big \} \bigg \}\\&= \gamma \Vert U_{1} - U_{2} \Vert _{\infty } \end{aligned} \end{aligned}$$
(A3)

So, when \(0<\gamma <1\), \(\mathcal {B}_{*}\) is a strict contraction mapping. Therefore, once the optimal policy \(\pi ^{*}\) is followed, the action value \(Q\) can be regarded as the sequence \(Q_{list}=(U, \mathcal {B}_{\pi ^{*}}U, \mathcal {B}_{\pi ^{*}}^{2}U, \dots )\). As the number of applications of the operator grows, it can be seen from (A4) that the distance between successive elements shrinks geometrically, so the sequence converges to a single value. Therefore, the Q function proposed in this paper is convergent.

$$\begin{aligned} \begin{aligned} \Vert \mathcal {B}_{\pi ^{*}}^{m+1}U - \mathcal {B}_{\pi ^{*}}^{m}U \Vert _{\infty }&\le \gamma \Vert \mathcal {B}_{\pi ^{*}}^{m}U - \mathcal {B}_{\pi ^{*}}^{m-1}U \Vert _{\infty }\\&\le \gamma ^{2} \Vert \mathcal {B}_{\pi ^{*}}^{m-1}U - \mathcal {B}_{\pi ^{*}}^{m-2}U \Vert _{\infty }\\&\dots \\&\le \gamma ^{m} \Vert \mathcal {B}_{\pi ^{*}}U - U \Vert _{\infty } \end{aligned} \end{aligned}$$
(A4)

According to the above proof, \(Q_{list}\) converges. To show that its limit is unique, assume that \(Q_{list}\) has two limit values \(U\) and \(V\) with \(U\ne V\), so that \(\Vert U-V \Vert _{\infty }>0\). Since \(U\) and \(V\) are both limits of the iteration, they are fixed points of \(\mathcal {B}_{\pi ^{*}}\), which gives \(\Vert \mathcal {B}_{\pi ^{*}}U - \mathcal {B}_{\pi ^{*}}V \Vert _{\infty } = \Vert U - V \Vert _{\infty }\). However, this contradicts the contraction mapping condition \(\Vert \mathcal {B}_{\pi ^{*}}U - \mathcal {B}_{\pi ^{*}}V \Vert _{\infty } \le \gamma \Vert U - V \Vert _{\infty } < \Vert U - V \Vert _{\infty }\). Therefore, the fixed point is unique, and the resulting optimal policy remains stable as long as it is not affected by external factors.
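
The contraction argument in (A2)-(A4) can also be illustrated numerically: applying the Bellman optimality backup to two different value functions shrinks their sup-norm distance by at least a factor of \(\gamma \) per iteration, so both iterates approach the same unique fixed point. The small MDP in the sketch below (4 states, 2 actions, random \(F\) and \(r\), \(\gamma =0.9\)) is purely illustrative and is not taken from the paper.

```python
# Numerical illustration of the contraction property (A3)/(A4): applying the
# Bellman optimality backup to two value functions U1, U2 shrinks their
# sup-norm distance by at least gamma each step. The 4-state, 2-action MDP
# below is randomly generated and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# F[s, a, s'] = transition probability, r[s, a, s'] = reward.
F = rng.random((n_states, n_actions, n_states))
F /= F.sum(axis=2, keepdims=True)
r = rng.random((n_states, n_actions, n_states))


def bellman_optimality_backup(U: np.ndarray) -> np.ndarray:
    # (B_* U)(s) = max_a sum_{s'} F(s'|s,a) [ r(s,a,s') + gamma * U(s') ]
    return np.max(np.einsum("sap,sap->sa", F, r + gamma * U[None, None, :]), axis=1)


U1, U2 = rng.normal(size=n_states), rng.normal(size=n_states)
for k in range(10):
    gap = np.max(np.abs(U1 - U2))  # sup-norm distance between the two iterates
    print(f"iter {k}: ||U1 - U2||_inf = {gap:.6f}")
    U1, U2 = bellman_optimality_backup(U1), bellman_optimality_backup(U2)
# The printed gap decays at least geometrically with ratio gamma = 0.9, so both
# sequences converge to the same unique fixed point (the optimal value function).
```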

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Zhu, F., Wang, Q. et al. An active object detection model with multi-step prediction based on deep q-learning network and innovative training algorithm. Appl Intell 55, 185 (2025). https://doi.org/10.1007/s10489-024-05993-y
