Abstract
Active Object Detection (AOD) gathers additional information by deliberately adjusting the agent’s viewpoint, enabling precise detection results in complex environments. Viewpoint planning (VP) is one of the central problems in AOD. To date, the predominant approach to implementing AOD algorithms has been deep Q-learning networks (DQNs) that output a single discrete action. However, these methods fall short in both implementation efficiency and success rate. To address these challenges, this paper proposes an AOD algorithm that allows multistep prediction and employs a novel training strategy. In more detail, an AOD network with a shared decision-making architecture is first constructed, which simultaneously outputs the action category and the action range. Moreover, a novel training method based on Prioritized Experience Replay (PER) is introduced, raising the operational success rate of the AOD algorithm. Finally, the reward function is tailored to the designed framework, thereby promoting the convergence of network training. Several comparable methods are tested on a public dataset (the Active Vision Dataset), and the results clearly demonstrate the superiority of the proposed approach.
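To make the shared decision-making design concrete, the following is a minimal PyTorch sketch of such a branching Q-network. It is not the authors’ implementation; the class name, layer sizes, and the numbers of action categories and step ranges are illustrative assumptions.

# Minimal sketch of a branching Q-network for AOD: a shared trunk feeds
# two heads that score the action category and the action range (number
# of steps) in one forward pass. Sizes and names are illustrative.
import torch
import torch.nn as nn

class BranchingAODNet(nn.Module):
    def __init__(self, feat_dim=512, n_actions=6, n_steps=4):
        super().__init__()
        # Shared trunk encoding the current observation's features.
        self.shared = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # Head 1: Q-values over action categories (e.g. move, rotate).
        self.action_head = nn.Linear(128, n_actions)
        # Head 2: Q-values over action ranges (how many steps to take).
        self.range_head = nn.Linear(128, n_steps)

    def forward(self, x):
        h = self.shared(x)
        return self.action_head(h), self.range_head(h)

# Greedy decision: both outputs come from a single forward pass, so a
# multistep move needs only one network evaluation per decision.
net = BranchingAODNet()
q_action, q_range = net(torch.randn(1, 512))
action = q_action.argmax(dim=1)      # which action category to execute
steps = q_range.argmax(dim=1) + 1    # how many steps to execute at once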
Data Availability
The Active Vision Dataset [16] is available at https://www.cs.unc.edu/~ammirato/active_vision_dataset_website/index.html.
References
Zou Z, Chen K, Shi Z et al (2023) Object Detection in 20 years: A Survey. Proceedings of the IEEE 111(3):257–276. https://doi.org/10.1109/JPROC.2023.3238524
Pal A, Kumar V (2023) AgriDet: Plant Leaf Disease severity classification using agriculture detection framework. Eng Appl Artif Intell 119:105754. https://doi.org/10.1016/j.engappai.2022.105754
Zhang D, Hao X, Wang D et al (2023) An efficient lightweight convolutional neural network for industrial surface defect detection. Artif Intell Rev 56:10651–10677. https://doi.org/10.1007/s10462-023-10438-y
Jha SB, Babiceanu RF (2023) Deep CNN-based visual defect detection: Survey of current literature. Comput Industry 148:103911. https://doi.org/10.1016/j.compind.2023.103911
Zeng Y, Ma C, Zhu M, et al (2021) Cross-Modal 3D Object Detection and Tracking for Auto-Driving. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, Prague, pp 3850–3857. https://doi.org/10.1109/IROS51168.2021.9636498
Wang L, Zhang X, Song Z et al (2023) Multi-Modal 3D Object Detection in Autonomous Driving: A Survey and Taxonomy. IEEE Trans Intell Vehicles 8(7):3781–3798. https://doi.org/10.1109/TIV.2023.3264658
Zhao ZQ, Zheng P, Xu ST et al (2019) Object Detection With Deep Learning: A Review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865
Lowe DG (2004) Distinctive Image Features from Scale-Invariant Keypoints. Int J Comput Vision 60:91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE Computer society conference on computer vision and pattern recognition (CVPR’05), vol 1. IEEE, San Diego, pp 886–893. https://doi.org/10.1109/CVPR.2005.177
Everingham M, Van Gool L, Williams CKI et al (2010) The PASCAL Visual Object Classes (VOC) Challenge. Int J Comput Vision 88:303–338. https://doi.org/10.1007/s11263-009-0275-4
Lin TY, Maire M, Belongie S, et al (2014) Microsoft COCO: Common objects in context. In: Fleet D, Pajdla T, Schiele B, et al (eds) Computer Vision - ECCV 2014. Springer, Cham, pp 740–755. https://doi.org/10.1007/978-3-319-10602-1_48
Deng J, Dong W, Socher R, et al (2009) ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on computer vision and pattern recognition (CVPR). IEEE, Miami, pp 248–255. https://doi.org/10.1109/CVPR.2009.5206848
Yang J, Ren Z, Xu M, et al (2019) Embodied Amodal Recognition: Learning to Move to Perceive Objects. In: 2019 IEEE/CVF international conference on computer vision (ICCV). IEEE, Seoul, pp 2040–2050. https://doi.org/10.1109/ICCV.2019.00213
Ali A, Zhu Y, Zakarya M (2022) Exploiting dynamic spatio-temporal graph convolutional neural networks for citywide traffic flows prediction. Neural Netw 145:233–247. https://doi.org/10.1016/j.neunet.2021.10.021
Kong Y, Fu Y (2022) Human action recognition and prediction: A survey. Int J Comput Vision 130:1366–1401. https://doi.org/10.1007/s11263-022-01594-9
Ammirato P, Poirson P, Park E, et al (2017) A dataset for developing and benchmarking active vision. In: 2017 IEEE International conference on robotics and automation (ICRA). IEEE, Singapore, pp 1378–1385. https://doi.org/10.1109/ICRA.2017.7989164
Mnih V, Kavukcuoglu K, Silver D et al (2015) Human-level control through deep reinforcement learning. Nature 518:529–533. https://doi.org/10.1038/nature14236
Liu S, Tian G, Zhang Y et al (2022) Active Object Detection Based on a Novel Deep Q-Learning Network and Long-Term Learning Strategy for the Service Robot. IEEE Trans Industrial Electron 69(6):5984–5993. https://doi.org/10.1109/TIE.2021.3090707
Ammirato P, Berg AC, Košecká J (2018) Active Vision Dataset Benchmark. In: 2018 IEEE/CVF Conference on computer vision and pattern recognition workshops (CVPRW). IEEE, Anchorage, pp 21270–21273. https://doi.org/10.1109/CVPRW.2018.00277
García-Samartín JF, Ulloa CC, Cerro J, et al (2024) Active robotic search for victims using ensemble deep learning techniques. Machine Learning: Science and Technology 5(2). https://doi.org/10.1088/2632-2153/ad33df
Schaul T, Quan J, Antonoglou I, et al (2016) Prioritized Experience Replay. arXiv:1511.05952
Lv L, Zhang S, Ding D et al (2019) Path Planning via an Improved DQN-Based Learning Policy. IEEE Access 7:67319–67330. https://doi.org/10.1109/ACCESS.2019.2918703
Sharma J, Andersen PA, Granmo OC et al (2021) Deep Q-Learning With Q-Matrix Transfer Learning for Novel Fire Evacuation Environment. IEEE Trans Syst, Man, Cybernetics: Syst 51(12):7363–7381. https://doi.org/10.1109/TSMC.2020.2967936
Lin HY, Liang SC, Chen YK (2021) Robotic Grasping With Multi-View Image Acquisition and Model-Based Pose Estimation. IEEE Sensors J 21(10):11870–11878. https://doi.org/10.1109/JSEN.2020.3030791
Song S, Kim D, Choi S (2022) View Path Planning via Online Multiview Stereo for 3-D Modeling of Large-Scale Structures. IEEE Trans Robotics 38(1):372–390. https://doi.org/10.1109/TRO.2021.3083197
Morrison D, Corke P, Leitner J (2019) Multi-View Picking: Next-best-view Reaching for Improved Grasping in Clutter. In: 2019 International conference on robotics and automation (ICRA). IEEE, Montreal, pp 8762–8768. https://doi.org/10.1109/ICRA.2019.8793805
Lehnert C, Tsai D, Eriksson A, et al (2019) 3D Move to See: Multi-perspective visual servoing towards the next best view within unstructured and occluded environments. In: 2019 IEEE/RSJ International conference on intelligent robots and systems (IROS). IEEE, Macao, pp 3890–3897. https://doi.org/10.1109/IROS40897.2019.8967918
Rapado-Rincón D, van Henten EJ, Kootstra G (2023) Development and evaluation of automated localisation and reconstruction of all fruits on tomato plants in a greenhouse based on multi-view perception and 3D multi-object tracking. Biosyst Eng 231:78–91. https://doi.org/10.1016/j.biosystemseng.2023.06.003
Denzler J, Brown C (2002) Information theoretic sensor data selection for active object recognition and state estimation. IEEE Trans Pattern Anal Mach Intell 24(2):145–157. https://doi.org/10.1109/34.982896
van Hoof H, Kroemer O, Peters J (2014) Probabilistic Segmentation and Targeted Exploration of Objects in Cluttered Environments. IEEE Trans Robot 30(5):1198–1209. https://doi.org/10.1109/TRO.2014.2334912
Yang J, Waslander SL (2022) Next-Best-View Prediction for Active Stereo Cameras and Highly Reflective Objects. In: 2022 International conference on robotics and automation (ICRA). IEEE, Philadelphia, pp 3684–3690. https://doi.org/10.1109/ICRA46639.2022.9811917
Cheng H, Duan F, He M (2023) Spiking Memory Policy with Population-encoding for Partially Observable Markov Decision Process Problems. Cognitive Comput 15:1153–1166. https://doi.org/10.1007/s12559-022-10030-6
Zhang H, Liu H, Guo D, et al (2017) From foot to head: Active face finding using deep q-learning. In: 2017 IEEE International conference on image processing (ICIP). IEEE, Beijing, pp 1862–1866. https://doi.org/10.1109/ICIP.2017.8296604
Han X, Liu H, Sun F, et al (2018) Active Object Detection Using Double DQN and Prioritized Experience Replay. In: 2018 International joint conference on neural networks (IJCNN). IEEE, Rio de Janeiro, pp 1–7. https://doi.org/10.1109/IJCNN.2018.8489296
Van Hasselt H, Guez A, Silver D (2016) Deep Reinforcement Learning with Double Q-Learning. Proc AAAI Conf Artif Intell 30(1):2094–2100. https://doi.org/10.1609/aaai.v30i1.10295
Han X, Liu H, Sun F et al (2019) Active Object Detection With Multistep Action Prediction Using Deep Q-Network. IEEE Trans Industrial Inf 15(6):3723–3731. https://doi.org/10.1109/TII.2019.2890849
Xu Q, Fang F, Gauthier N, et al (2021) Towards Efficient Multiview Object Detection with Adaptive Action Prediction. In: 2021 IEEE international conference on robotics and automation (ICRA). IEEE, Xi’an, pp 13423–13429. https://doi.org/10.1109/ICRA48506.2021.9561388
Fang F, Xu Q, Gauthier N, et al (2021) Enhancing Multi-Step Action Prediction for Active Object Detection. In: 2021 IEEE International conference on image processing (ICIP). IEEE, Anchorage, pp 2189–2193. https://doi.org/10.1109/ICIP42928.2021.9506078
Schmid JF, Lauri M, Frintrop S (2019) Explore, Approach, and Terminate: Evaluating Subtasks in Active Visual Object Search Based on Deep Reinforcement Learning. In: 2019 IEEE/RSJ International conference on intelligent robots and systems (IROS). IEEE, Macau, pp 5008–5013. https://doi.org/10.1109/IROS40897.2019.8967805
Peng W, Wang W, Wang Y, et al (2024) Key Technologies and Trends of Active Robotic 3-D Measurement in Intelligent Manufacturing. IEEE/ASME Trans Mechatron pp 1–22. https://doi.org/10.1109/TMECH.2024.3396222
Akl J, Alladkani F, Calli B (2024) Feature-Driven Next View Planning for Cutting Path Generation in Robotic Metal Scrap Recycling. IEEE Trans Automation Sci Eng 21(3):3357–3373. https://doi.org/10.1109/TASE.2023.3278994
Wang T, Xi W, Cheng Y et al (2024) RL-NBV: A deep reinforcement learning based next-best-view method for unknown object reconstruction. Pattern Recognition Lett 184:1–6. https://doi.org/10.1016/j.patrec.2024.05.014
Wang A, Chen H, Liu L, et al (2024) YOLOv10: Real-Time End-to-End Object Detection. arXiv:2405.14458
Tavakoli A, Pardo F, Kormushev P (2018) Action Branching Architectures for Deep Reinforcement Learning. Proc AAAI Conf Artif Intell 32:1–9. https://doi.org/10.1609/aaai.v32i1.11798
He K, Zhang X, Ren S, et al (2016) Deep Residual Learning for Image Recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE, Las Vegas, pp 770–778. https://doi.org/10.1109/CVPR.2016.90
Wang Z, Schaul T, Hessel M, et al (2016) Dueling Network Architectures for Deep Reinforcement Learning. In: Proceedings of The 33rd international conference on machine learning. PMLR, New York, pp 1995–2003
Sun H, Zhu F, Li Y et al (2023) Viewpoint planning with transition management for active object recognition. Front Neurorobot 17:1093132. https://doi.org/10.3389/fnbot.2023.1093132
Sun H, Zhu F, Kong Y, et al (2021) Continuous Viewpoint Planning in Conjunction with Dynamic Exploration for Active Object Recognition. Entropy 23(12). https://doi.org/10.3390/e23121702
Wang N, Gao Y, Zhao H et al (2021) Reinforcement Learning-Based Optimal Tracking Control of an Unknown Unmanned Surface Vehicle. IEEE Trans Neural Netw Learn Syst 32(7):3034–3045. https://doi.org/10.1109/TNNLS.2020.3009214
Liu H, Sun F, Zhang X (2019) Robotic Material Perception Using Active Multimodal Fusion. IEEE Trans Industrial Electron 66(12):9878–9886. https://doi.org/10.1109/TIE.2018.2878157
Singh A, Sha J, Narayan KS, et al (2014) BigBIRD: A large-scale 3D database of object instances. In: 2014 IEEE International conference on robotics and automation (ICRA). IEEE, Miami, pp 509–516. https://doi.org/10.1109/ICRA.2014.6906903
Wang X, Wang S, Liang X et al (2024) Deep Reinforcement Learning: A Survey. IEEE Trans Neural Netw Learn Syst 35(4):5064–5078. https://doi.org/10.1109/TNNLS.2022.3207346
Fährmann D, Jorek N, Damer N et al (2022) Double Deep Q-Learning With Prioritized Experience Replay for Anomaly Detection in Smart Environments. IEEE Access 10:60836–60848. https://doi.org/10.1109/ACCESS.2022.3179720
Chen Y, Liang L (2023) SLP-Improved DDPG Path-Planning Algorithm for Mobile Robot in Large-Scale Dynamic Environment. Sensors 23(7). https://doi.org/10.3390/s23073521
Fang F, Liang W, Wu Y et al (2022) Self-Supervised Reinforcement Learning for Active Object Detection. IEEE Robot Automation Lett 7(4):10224–10231. https://doi.org/10.1109/LRA.2022.3193019
Yang N, Lu F, Yu B, et al (2023) Service Robot Active Object Detection based on Spatial Exploration using Deep Recurrent Q-learning Network. In: 2023 IEEE International conference on robotics and biomimetics (ROBIO), pp 1–6. https://doi.org/10.1109/ROBIO58561.2023.10354931
Xu N, Huo C, Zhang X et al (2021) Dynamic camera configuration learning for high-confidence active object detection. Neurocomputing 466:113–127. https://doi.org/10.1016/j.neucom.2021.09.037
Tian Z, Shen C, Chen H et al (2022) FCOS: A Simple and Strong Anchor-Free Object Detector. IEEE Trans Pattern Anal Mach Intell 44(4):1922–1933. https://doi.org/10.1109/TPAMI.2020.3032166
Abbaszadeh Shahri A, Chunling S, Larsson S (2024) A hybrid ensemble-based automated deep learning approach to generate 3D geo-models and uncertainty analysis. Eng Comput 40:1501–1516. https://doi.org/10.1007/s00366-023-01852-5
Author information
Contributions
Jianyu Wang: Conceptualization, Methodology, Software, Formal analysis, Data curation, and Writing - original draft preparation. Feng Zhu: Methodology, Validation, Resources, and Project administration. Qun Wang and Yunge Cui: Writing - review and editing. Haibo Sun and Pengfei Zhao: Supervision.
Ethics declarations
Conflict of Interest/Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Mathematical Proof
This appendix proves the convergence and stability of the Q function designed in this paper.
The value iteration of the action values can be written as the following equation.
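A standard form of this update, reconstructed in the notation used in the rest of this appendix (the discount factor \(\gamma \) and the next state \(s'\) are assumptions of this reconstruction), is:

\[
Q_{k+1}(s,a) = \mathbb {E}_{s'}\left[ r(s,a) + \gamma \max _{a'} Q_{k}(s',a') \right] \tag{A1}
\]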
Then, the Bellman optimality backup operator \(\mathcal {B}_{\pi ^{*}}\), associated with the optimal policy \(\pi ^{*}\), is applied.
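A standard definition of this operator, reconstructed to be consistent with its use in (A3) and (A4) below (with \(r(s,a)\) the immediate reward), is:

\[
(\mathcal {B}_{\pi ^{*}}Q)(s,a) = r(s,a) + \gamma \, \mathbb {E}_{s'}\left[ \max _{a'} Q(s',a') \right] \tag{A2}
\]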
It can be easily deduced from (A2) that \(\mathcal {B}_{\pi ^{*}}\) contracts the \(\ell _{\infty }\) distance between any two action-value functions \(U\) and \(V\).
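The deduction is the standard contraction argument, reconstructed here under the definitions above (using that the max over actions is nonexpansive, so taking the expectation preserves the bound):

\begin{align*}
\Vert \mathcal {B}_{\pi ^{*}}U - \mathcal {B}_{\pi ^{*}}V \Vert _{\infty }
  &= \gamma \left\Vert \mathbb {E}_{s'}\left[ \max _{a'} U(s',a') - \max _{a'} V(s',a') \right] \right\Vert _{\infty } \tag{A3} \\
  &\le \gamma \Vert U - V \Vert _{\infty } \tag{A4}
\end{align*}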
So, when \(0<\gamma <1\), \(\mathcal {B}_{\pi ^{*}}\) is a strict contraction mapping. Therefore, once the optimal policy appears, the action value \(Q\) can be regarded as the following sequence: \(Q_{list}=(U, \mathcal {B}_{\pi ^{*}}U, \mathcal {B}_{\pi ^{*}}^{2}U, \dots )\). As the number of operator applications tends to infinity, it follows from (A4) that the sequence converges strictly to a single value. Therefore, the Q function proposed in this paper is convergent.
According to the above proof, \(Q_{list}\) converges. To show stability, first assume that \(Q_{list}\) has two distinct limits \(U\) and \(V\) with \(U\ne V\), so that \(\Vert U-V \Vert _{\infty }>0\). Since \(U\) and \(V\) are both limits of the iteration, they are fixed points of the operator, i.e., \(\mathcal {B}_{\pi ^{*}}U=U\) and \(\mathcal {B}_{\pi ^{*}}V=V\), which gives \(\Vert \mathcal {B}_{\pi ^{*}}U - \mathcal {B}_{\pi ^{*}}V \Vert _{\infty } = \Vert U - V \Vert _{\infty }\). However, this contradicts the contraction mapping condition: \(\Vert \mathcal {B}_{\pi ^{*}}U - \mathcal {B}_{\pi ^{*}}V \Vert _{\infty } \le \gamma \Vert U - V \Vert _{\infty } < \Vert U - V \Vert _{\infty }\). Hence the fixed point is unique, and the optimal policy remains stable at all times when it is not affected by external factors.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, J., Zhu, F., Wang, Q. et al. An active object detection model with multi-step prediction based on deep q-learning network and innovative training algorithm. Appl Intell 55, 185 (2025). https://doi.org/10.1007/s10489-024-05993-y