An active object detection model with multi-step prediction based on deep q-learning network and innovative training algorithm

Published in: Applied Intelligence

Abstract

Active Object Detection (AOD) gathers additional information by deliberately adjusting the agent’s viewpoint, ensuring precise detection results in complex environments. Viewpoint planning (VP) is one of the focal points of AOD. Until now, the predominant approach to implementing AOD algorithms has been deep Q-learning networks (DQNs) that output a single discrete action. Nevertheless, these methods exhibit shortcomings in both implementation efficiency and success rate. To address these challenges, this paper proposes an AOD algorithm that allows multi-step prediction and employs a novel training strategy. In more detail, an AOD network using a shared decision-making approach is first constructed, simultaneously outputting the action category and range. Moreover, a novel training method based on Prioritized Experience Replay (PER) is introduced, enhancing the operational success rate of the AOD algorithm. Finally, the reward function is optimized for the designed framework, thereby promoting the convergence of network training. Several comparable methods are tested on a public dataset (the Active Vision Dataset), and the results clearly illustrate the superiority of the proposed approach.
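
To make the shared decision-making idea concrete, the sketch below shows one plausible way to structure a Q-network with a shared visual encoder and two output branches, one scoring action categories and one scoring action ranges (step counts). This is not the authors' implementation: the ResNet-18 backbone, the layer sizes, and the names num_actions, num_ranges, action_head and range_head are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a shared-decision Q-network that
# jointly outputs an action category and an action range (step count), in the
# spirit of the branching architecture described in the abstract. Backbone
# choice, layer sizes, and head names are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class BranchingAODNet(nn.Module):
    def __init__(self, num_actions: int = 6, num_ranges: int = 3):
        super().__init__()
        # Shared visual encoder: ResNet-18 with the classification head removed.
        backbone = models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 512, 1, 1)

        # Shared decision trunk feeding both branches.
        self.trunk = nn.Sequential(nn.Flatten(), nn.Linear(512, 256), nn.ReLU())

        # Branch 1: Q-values over action categories (e.g. move forward, rotate, ...).
        self.action_head = nn.Linear(256, num_actions)
        # Branch 2: Q-values over action ranges (how many unit steps to execute).
        self.range_head = nn.Linear(256, num_ranges)

    def forward(self, obs: torch.Tensor):
        feat = self.trunk(self.encoder(obs))
        return self.action_head(feat), self.range_head(feat)


if __name__ == "__main__":
    net = BranchingAODNet()
    q_action, q_range = net(torch.randn(2, 3, 224, 224))
    # Greedy multi-step decision: pick the best category and the best range.
    action = q_action.argmax(dim=1)
    steps = q_range.argmax(dim=1) + 1
    print(action.shape, steps.shape)  # torch.Size([2]) torch.Size([2])
```

At decision time, this sketch takes the argmax of each branch independently, which is the usual shortcut in branching Q-architectures; the paper's exact selection rule may differ.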

Data Availability

The Active Vision Dataset [16] used in this work is available at https://www.cs.unc.edu/~ammirato/active_vision_dataset_website/index.html.

References

  1. Zou Z, Chen K, Shi Z et al (2023) Object Detection in 20 years: A Survey. Proceedings of the IEEE 111(3):257–276. https://doi.org/10.1109/JPROC.2023.3238524

  2. Pal A, Kumar V (2023) AgriDet: Plant Leaf Disease severity classification using agriculture detection framework. Eng Appl Artif Intell 119:105754. https://doi.org/10.1016/j.engappai.2022.105754

  3. Zhang D, Hao X, Wang D et al (2023) An efficient lightweight convolutional neural network for industrial surface defect detection. Artif Intell Rev 56:10651–10677. https://doi.org/10.1007/s10462-023-10438-y

  4. Jha SB, Babiceanu RF (2023) Deep CNN-based visual defect detection: Survey of current literature. Comput Industry 148:103911. https://doi.org/10.1016/j.compind.2023.103911

  5. Zeng Y, Ma C, Zhu M, et al (2021) Cross-Modal 3D Object Detection and Tracking for Auto-Driving. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, Prague, pp 3850–3857, https://doi.org/10.1109/IROS51168.2021.9636498

  6. Wang L, Zhang X, Song Z et al (2023) Multi-Modal 3D Object Detection in Autonomous Driving: A Survey and Taxonomy. IEEE Trans Intell Vehicles 8(7):3781–3798. https://doi.org/10.1109/TIV.2023.3264658

  7. Zhao ZQ, Zheng P, Xu ST et al (2019) Object Detection With Deep Learning: A Review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865

  8. Lowe DG (2004) Distinctive Image Features from Scale-Invariant Keypoints. Int J Comput Vision 60:91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94

  9. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE Computer society conference on computer vision and pattern recognition (CVPR’05), vol 1. IEEE, San Diego, pp 886–893, https://doi.org/10.1109/CVPR.2005.177

  10. Everingham M, Van Gool L, Williams CKI et al (2010) The PASCAL Visual Object Classes (VOC) Challenge. Int J Comput Vision 88:303–338. https://doi.org/10.1007/s11263-009-0275-4

  11. Lin TY, Maire M, Belongie S, et al (2014) Microsoft COCO: Common objects in context. In: Fleet D, Pajdla T, Schiele B, et al (eds) Computer Vision - ECCV 2014. Springer, Cham, Zurich, pp 740–755, https://doi.org/10.1007/978-3-319-10602-1_48

  12. Deng J, Dong W, Socher R, et al (2009) ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on computer vision and pattern recognition(CVPR). IEEE, Miami, pp 248–255, https://doi.org/10.1109/CVPR.2009.5206848

  13. Yang J, Ren Z, Xu M, et al (2019) Embodied Amodal Recognition: Learning to Move to Perceive Objects. In: 2019 IEEE/CVF international conference on computer vision (ICCV). IEEE, Seoul, pp 2040–2050, https://doi.org/10.1109/ICCV.2019.00213

  14. Ali A, Zhu Y, Zakarya M (2022) Exploiting dynamic spatio-temporal graph convolutional neural networks for citywide traffic flows prediction. Neural Netw 145:233–247. https://doi.org/10.1016/j.neunet.2021.10.021

  15. Kong Y, Fu Y (2022) Human action recognition and prediction: A survey. Int J Comput Vision 130:1366–1401. https://doi.org/10.1007/s11263-022-01594-9

  16. Ammirato P, Poirson P, Park E, et al (2017) A dataset for developing and benchmarking active vision. In: 2017 IEEE International conference on robotics and automation (ICRA). IEEE, Singapore, pp 1378–1385, https://doi.org/10.1109/ICRA.2017.7989164

  17. Mnih V, Kavukcuoglu K, Silver D et al (2015) Human-level control through deep reinforcement learning. Nature 518:529–533. https://doi.org/10.1038/nature14236

  18. Liu S, Tian G, Zhang Y et al (2022) Active Object Detection Based on a Novel Deep Q-Learning Network and Long-Term Learning Strategy for the Service Robot. IEEE Trans Industrial Electron 69(6):5984–5993. https://doi.org/10.1109/TIE.2021.3090707

  19. Ammirato P, Berg AC, Košecká J (2018) Active Vision Dataset Benchmark. In: 2018 IEEE/CVF Conference on computer vision and pattern recognition workshops (CVPRW). IEEE, Anchorage, pp 21270–21273. https://doi.org/10.1109/CVPRW.2018.00277

  20. García-Samartín JF, Ulloa CC, Cerro J, et al (2024) Active robotic search for victims using ensemble deep learning techniques. Machine Learning: Science and Technology 5(2). https://doi.org/10.1088/2632-2153/ad33df

  21. Schaul T, Quan J, Antonoglou I, et al (2016) Prioritized Experience Replay. arXiv:1511.05952

  22. Lv L, Zhang S, Ding D et al (2019) Path Planning via an Improved DQN-Based Learning Policy. IEEE Access 7:67319–67330. https://doi.org/10.1109/ACCESS.2019.2918703

  23. Sharma J, Andersen PA, Granmo OC et al (2021) Deep Q-Learning With Q-Matrix Transfer Learning for Novel Fire Evacuation Environment. IEEE Trans Syst, Man, Cybernetics: Syst 51(12):7363–7381. https://doi.org/10.1109/TSMC.2020.2967936

  24. Lin HY, Liang SC, Chen YK (2021) Robotic Grasping With Multi-View Image Acquisition and Model-Based Pose Estimation. IEEE Sensors J 21(10):11870–11878. https://doi.org/10.1109/JSEN.2020.3030791

  25. Song S, Kim D, Choi S (2022) View Path Planning via Online Multiview Stereo for 3-D Modeling of Large-Scale Structures. IEEE Trans Robotics 38(1):372–390. https://doi.org/10.1109/TRO.2021.3083197

  26. Morrison D, Corke P, Leitner J (2019) Multi-View Picking: Next-best-view Reaching for Improved Grasping in Clutter. In: 2019 International conference on robotics and automation (ICRA). IEEE, Montreal, pp 8762–8768. https://doi.org/10.1109/ICRA.2019.8793805

  27. Lehnert C, Tsai D, Eriksson A, et al (2019) 3D Move to See: Multi-perspective visual servoing towards the next best view within unstructured and occluded environments. In: 2019 IEEE/RSJ International conference on intelligent robots and systems (IROS). IEEE, Macao, pp 3890–3897, https://doi.org/10.1109/IROS40897.2019.8967918

  28. Rapado-Rincón D, van Henten EJ, Kootstra G (2023) Development and evaluation of automated localisation and reconstruction of all fruits on tomato plants in a greenhouse based on multi-view perception and 3D multi-object tracking. Biosyst Eng 231:78–91. https://doi.org/10.1016/j.biosystemseng.2023.06.003

  29. Denzler J, Brown C (2002) Information theoretic sensor data selection for active object recognition and state estimation. IEEE Trans Pattern Anal Mach Intell 24(2):145–157. https://doi.org/10.1109/34.982896

  30. van Hoof H, Kroemer O, Peters J (2014) Probabilistic Segmentation and Targeted Exploration of Objects in Cluttered Environments. IEEE Trans Robot 30(5):1198–1209. https://doi.org/10.1109/TRO.2014.2334912

  31. Yang J, Waslander SL (2022) Next-Best-View Prediction for Active Stereo Cameras and Highly Reflective Objects. In: 2022 International conference on robotics and automation (ICRA). IEEE, Philadelphia, pp 3684–3690. https://doi.org/10.1109/ICRA46639.2022.9811917

  32. Cheng H, Duan F, He M (2023) Spiking Memory Policy with Population-encoding for Partially Observable Markov Decision Process Problems. Cognitive Comput 15:1153–1166. https://doi.org/10.1007/s12559-022-10030-6

  33. Zhang H, Liu H, Guo D, et al (2017) From foot to head: Active face finding using deep q-learning. In: 2017 IEEE International conference on image processing (ICIP). IEEE, Beijing, pp 1862–1866. https://doi.org/10.1109/ICIP.2017.8296604

  34. Han X, Liu H, Sun F, et al (2018) Active Object Detection Using Double DQN and Prioritized Experience Replay. In: 2018 International joint conference on neural networks (IJCNN). IEEE, Rio de Janeiro, pp 1–7. https://doi.org/10.1109/IJCNN.2018.8489296

  35. Van Hasselt H, Guez A, Silver D (2016) Deep Reinforcement Learning with Double Q-Learning. Proceed AAAI Conference Artif Intell 30(1):2094–2100. https://doi.org/10.1609/aaai.v30i1.10295

  36. Han X, Liu H, Sun F et al (2019) Active Object Detection With Multistep Action Prediction Using Deep Q-Network. IEEE Trans Industrial Inf 15(6):3723–3731. https://doi.org/10.1109/TII.2019.2890849

  37. Xu Q, Fang F, Gauthier N, et al (2021) Towards Efficient Multiview Object Detection with Adaptive Action Prediction. In: 2021 IEEE international conference on robotics and automation (ICRA). IEEE, Xi’an, pp 13423–13429. https://doi.org/10.1109/ICRA48506.2021.9561388

  38. Fang F, Xu Q, Gauthier N, et al (2021) Enhancing Multi-Step Action Prediction for Active Object Detection. In: 2021 IEEE International conference on image processing (ICIP). IEEE, Anchorage, pp 2189–2193. https://doi.org/10.1109/ICIP42928.2021.9506078

  39. Schmid JF, Lauri M, Frintrop S (2019) Explore, Approach, and Terminate: Evaluating Subtasks in Active Visual Object Search Based on Deep Reinforcement Learning. In: 2019 IEEE/RSJ International conference on intelligent robots and systems (IROS). IEEE, Macau, pp 5008–5013, https://doi.org/10.1109/IROS40897.2019.8967805

  40. Peng W, Wang W, Wang Y, et al (2024) Key Technologies and Trends of Active Robotic 3-D Measurement in Intelligent Manufacturing. IEEE/ASME Trans Mechatron pp 1–22. https://doi.org/10.1109/TMECH.2024.3396222

  41. Akl J, Alladkani F, Calli B (2024) Feature-Driven Next View Planning for Cutting Path Generation in Robotic Metal Scrap Recycling. IEEE Trans Automation Sci Eng 21(3):3357–3373. https://doi.org/10.1109/TASE.2023.3278994

  42. Wang T, Xi W, Cheng Y et al (2024) RL-NBV: A deep reinforcement learning based next-best-view method for unknown object reconstruction. Pattern Recognition Lett 184:1–6. https://doi.org/10.1016/j.patrec.2024.05.014

  43. Wang A, Chen H, Liu L, et al (2024) YOLOv10: Real-Time End-to-End Object Detection. arXiv:2405.14458

  44. Tavakoli A, Pardo F, Kormushev P (2018) Action Branching Architectures for Deep Reinforcement Learning. Proceed AAAI Conference Artif Intell 32:1–9. https://doi.org/10.1609/aaai.v32i1.11798

  45. He K, Zhang X, Ren S, et al (2016) Deep Residual Learning for Image Recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE, Las Vegas, pp 770–778. https://doi.org/10.1109/CVPR.2016.90

  46. Wang Z, Schaul T, Hessel M, et al (2016) Dueling Network Architectures for Deep Reinforcement Learning. In: Proceedings of The 33rd international conference on machine learning. PMLR, New York, pp 1995–2003

  47. Sun H, Zhu F, Li Y et al (2023) Viewpoint planning with transition management for active object recognition. Front Neurorobot 17:1093132. https://doi.org/10.3389/fnbot.2023.1093132

  48. Sun H, Zhu F, Kong Y, et al (2021) Continuous Viewpoint Planning in Conjunction with Dynamic Exploration for Active Object Recognition. Entropy 23(12). https://doi.org/10.3390/e23121702

  49. Wang N, Gao Y, Zhao H et al (2021) Reinforcement Learning-Based Optimal Tracking Control of an Unknown Unmanned Surface Vehicle. IEEE Trans Neural Netw Learn Syst 32(7):3034–3045. https://doi.org/10.1109/TNNLS.2020.3009214

  50. Liu H, Sun F, Zhang X (2019) Robotic Material Perception Using Active Multimodal Fusion. IEEE Trans Industrial Electron 66(12):9878–9886. https://doi.org/10.1109/TIE.2018.2878157

  51. Singh A, Sha J, Narayan KS, et al (2014) BigBIRD: A large-scale 3D database of object instances. In: 2014 IEEE International conference on robotics and automation (ICRA). IEEE, Miami, pp 509–516. https://doi.org/10.1109/ICRA.2014.6906903

  52. Wang X, Wang S, Liang X et al (2024) Deep Reinforcement Learning: A Survey. IEEE Trans Neural Netw Learn Syst 35(4):5064–5078. https://doi.org/10.1109/TNNLS.2022.3207346

  53. Fährmann D, Jorek N, Damer N et al (2022) Double Deep Q-Learning With Prioritized Experience Replay for Anomaly Detection in Smart Environments. IEEE Access 10:60836–60848. https://doi.org/10.1109/ACCESS.2022.3179720

  54. Chen Y, Liang L (2023) SLP-Improved DDPG Path-Planning Algorithm for Mobile Robot in Large-Scale Dynamic Environment. Sensors 23(7). https://doi.org/10.3390/s23073521

  55. Fang F, Liang W, Wu Y et al (2022) Self-Supervised Reinforcement Learning for Active Object Detection. IEEE Robot Automation Lett 7(4):10224–10231. https://doi.org/10.1109/LRA.2022.3193019

  56. Yang N, Lu F, Yu B, et al (2023) Service Robot Active Object Detection based on Spatial Exploration using Deep Recurrent Q-learning Network. In: 2023 IEEE International conference on robotics and biomimetics (ROBIO), pp 1–6. https://doi.org/10.1109/ROBIO58561.2023.10354931

  57. Xu N, Huo C, Zhang X et al (2021) Dynamic camera configuration learning for high-confidence active object detection. Neurocomputing 466:113–127. https://doi.org/10.1016/j.neucom.2021.09.037

  58. Tian Z, Shen C, Chen H et al (2022) FCOS: A Simple and Strong Anchor-Free Object Detector. IEEE Trans Pattern Anal Mach Intell 44(4):1922–1933. https://doi.org/10.1109/TPAMI.2020.3032166

  59. Abbaszadeh Shahri A, Chunling S, Larsson S (2024) A hybrid ensemble-based automated deep learning approach to generate 3D geo-models and uncertainty analysis. Eng Comput 40:1501–1516. https://doi.org/10.1007/s00366-023-01852-5


Author information

Contributions

Jianyu Wang: Conceptualization, Methodology, Software, Formal analysis, Data curation, and Writing - original draft preparation. Feng Zhu: Methodology, Validation, Resources, and Project administration. Qun Wang and Yunge Cui: Writing - review and editing. Haibo Sun and Pengfei Zhao: Supervision.

Corresponding author

Correspondence to Feng Zhu.

Ethics declarations

Conflict of Interest/Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Mathematical Proof

This appendix proves the convergence and stability of the Q function designed in this paper.

The action-value iteration can be written as the following equation:

$$\begin{aligned} Q_{k+1}(s,a) = \sum _{s^{'}\in S} F(s^{'}\vert s,a)[r(s,a,s^{'})+\gamma \max _{a^{'} \in \mathcal {A}}Q_{k}(s^{'},a^{'})] \end{aligned}$$
(A1)

Then, the Bellman optimality backup operator \(\mathcal {B}_{*}\), which acts greedily and therefore corresponds to the policy \(\pi \), is defined as follows.

$$\begin{aligned} \mathcal {B}_{*}U(s) = \max _{a\in \mathcal {A}}\sum _{s^{'}\in S}F(s^{'}\vert s,a)[r(s,a,s^{'})+\gamma U(s^{'})] \end{aligned}$$
(A2)

It can be easily deduced from (A2),

$$\begin{aligned} \begin{aligned} \Vert \mathcal {B}_{*}U_{1} - \mathcal {B}_{*}U_{2} \Vert _{\infty }&= \max _{s} \bigg \{\bigg \vert \max _{a_1\in \mathcal {A}}\sum _{s^{'}\in S}F(s^{'}\vert s,a_1)\Big [r(s,a_1,s^{'})+\gamma U_1(s^{'})\Big ]\\&-\max _{a_2\in \mathcal {A}}\sum _{s^{'}\in S}F(s^{'}\vert s,a_2)\Big [r(s,a_2,s^{'})+\gamma U_2(s^{'})\Big ]\bigg \vert \bigg \}\\&\le \max _{s} \bigg \{\bigg \vert \max _{a_1\in \mathcal {A}}\sum _{s^{'}\in S}F(s^{'}\vert s,a_1)\Big [r(s,a_1,s^{'})+\gamma U_1(s^{'})\Big ]\\&-\sum _{s^{'}\in S}F(s^{'}\vert s,a_1)\Big [r(s,a_1,s^{'})+\gamma U_2(s^{'})\Big ]\bigg \vert \bigg \}\\&\le \max _{s} \bigg \{ \max _{a_1\in \mathcal {A}}\bigg \vert \sum _{s^{'}\in S}F(s^{'}\vert s,a_1)\Big [r(s,a_1,s^{'})+\gamma U_1(s^{'})\Big ]\\&-\sum _{s^{'}\in S}F(s^{'}\vert s,a_1)\Big [r(s,a_1,s^{'})+\gamma U_2(s^{'})\Big ]\bigg \vert \bigg \}\\&\le \gamma \max _{s} \bigg \{\max _{a\in \mathcal {A}}\Big [\sum _{s^{'}\in S}F(s^{'}\vert s,a)\Big \vert U_{1}(s^{'})- U_{2}(s^{'}) \Big \vert \Big ] \bigg \}\\&\le \gamma \max _{s} \bigg \{\max _{a\in \mathcal {A}, s^{'} \in S}\Big \{\Big \vert U_{1}(s^{'})- U_{2}(s^{'}) \Big \vert \Big \} \bigg \}\\&= \gamma \Vert U_{1} - U_{2} \Vert _{\infty } \end{aligned} \end{aligned}$$
(A3)

So, when \(0<\gamma <1\), \(\mathcal {B}_{*}\) is a strict contraction mapping. Therefore, once the optimal policy \(\pi ^{*}\) is followed, the action value \(Q\) can be regarded as the sequence \(Q_{list}=(U, \mathcal {B}_{\pi ^{*}}U, \mathcal {B}_{\pi ^{*}}^{2}U, \dots )\). As the number of applications of the operator grows, it can be seen from (A4) that the distance between successive elements shrinks geometrically, so the sequence converges to a single value. Therefore, the Q function proposed in this paper is convergent.

$$\begin{aligned} \begin{aligned} \Vert \mathcal {B}_{\pi ^{*}}^{m+1}U - \mathcal {B}_{\pi ^{*}}^{m}U \Vert _{\infty }&\le \gamma \Vert \mathcal {B}_{\pi ^{*}}^{m}U - \mathcal {B}_{\pi ^{*}}^{m-1}U \Vert _{\infty }\\&\le \gamma ^{2} \Vert \mathcal {B}_{\pi ^{*}}^{m-1}U - \mathcal {B}_{\pi ^{*}}^{m-2}U \Vert _{\infty }\\&\dots \\&\le \gamma ^{m} \Vert \mathcal {B}_{\pi ^{*}}U - U \Vert _{\infty } \end{aligned} \end{aligned}$$
(A4)

According to the above proof, \(Q_{list}\) converges. To show that its limit is unique, assume that \(Q_{list}\) has two limit values \(U\) and \(V\) with \(U\ne V\), so that \(\Vert U-V \Vert _{\infty }>0\). Since \(U\) and \(V\) are both limits of the iteration, they are fixed points of \(\mathcal {B}_{\pi ^{*}}\), which gives \(\Vert \mathcal {B}_{\pi ^{*}}U - \mathcal {B}_{\pi ^{*}}V \Vert _{\infty } = \Vert U - V \Vert _{\infty }\). However, this contradicts the contraction mapping condition \(\Vert \mathcal {B}_{\pi ^{*}}U - \mathcal {B}_{\pi ^{*}}V \Vert _{\infty } \le \gamma \Vert U - V \Vert _{\infty } < \Vert U - V \Vert _{\infty }\). Therefore, the fixed point is unique, and the resulting optimal policy remains stable as long as it is not affected by external factors.
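
The contraction argument in (A2)-(A4) can also be illustrated numerically: applying the Bellman optimality backup to two different value functions shrinks their sup-norm distance by at least a factor of \(\gamma \) per iteration, so both iterates approach the same unique fixed point. The small MDP in the sketch below (4 states, 2 actions, random \(F\) and \(r\), \(\gamma =0.9\)) is purely illustrative and is not taken from the paper.

```python
# Numerical illustration of the contraction property (A3)/(A4): applying the
# Bellman optimality backup to two value functions U1, U2 shrinks their
# sup-norm distance by at least gamma each step. The 4-state, 2-action MDP
# below is randomly generated and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# F[s, a, s'] = transition probability, r[s, a, s'] = reward.
F = rng.random((n_states, n_actions, n_states))
F /= F.sum(axis=2, keepdims=True)
r = rng.random((n_states, n_actions, n_states))


def bellman_optimality_backup(U: np.ndarray) -> np.ndarray:
    # (B_* U)(s) = max_a sum_{s'} F(s'|s,a) [ r(s,a,s') + gamma * U(s') ]
    return np.max(np.einsum("sap,sap->sa", F, r + gamma * U[None, None, :]), axis=1)


U1, U2 = rng.normal(size=n_states), rng.normal(size=n_states)
for k in range(10):
    gap = np.max(np.abs(U1 - U2))  # sup-norm distance between the two iterates
    print(f"iter {k}: ||U1 - U2||_inf = {gap:.6f}")
    U1, U2 = bellman_optimality_backup(U1), bellman_optimality_backup(U2)
# The printed gap decays at least geometrically with ratio gamma = 0.9, so both
# sequences converge to the same unique fixed point (the optimal value function).
```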

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Zhu, F., Wang, Q. et al. An active object detection model with multi-step prediction based on deep q-learning network and innovative training algorithm. Appl Intell 55, 185 (2025). https://doi.org/10.1007/s10489-024-05993-y
