Abstract
Active vision is the ability of intelligent agents to dynamically gather more information about their surroundings through physical motion of the camera. In object recognition, active vision improves performance by incorporating classification decisions from new viewpoints whenever the current recognition result carries some degree of uncertainty. A natural question in an autonomous active vision system, however, is how to determine the new viewpoint, i.e., to what pose should the camera be moved? This is the traditional next best view question in active perception systems. Current approaches to the next best view problem either require the construction of occupancy grids or depend on training datasets of 3D objects or multiple captures of the same object in specified poses. Occupancy grid methods usually depend on multiple camera movements to perform well, which makes them better suited to 3D reconstruction applications than to object recognition. In this paper, a next best view method for active object recognition based on object appearance and surface direction is proposed that decides on the next camera pose without requiring any specifically structured training datasets of 3D objects. It is also designed for single-shot determination of the next viewpoint and can select next best views without substantial knowledge of the 3D voxels in the environment around the camera. The experimental results demonstrate the effectiveness of the proposed method, showing large improvements in accuracy and F1 score.
Notes
Dataset available at https://github.com/pouryahoseini/Next-Best-View-Dataset.
Funding
This work has been supported in part by the Office of Naval Research award N00014-16-1-2312 and US Army Research Laboratory (ARO) award W911NF-20-2-0084.
Author information
Contributions
Conceptualization: Pourya Hoseini, Mircea Nicolescu, Monica Nicolescu; Methodology: Pourya Hoseini, Mircea Nicolescu, Shuvo Kumar Paul; Formal Analysis and Investigation: Pourya Hoseini, Mircea Nicolescu, Monica Nicolescu; Writing - original draft preparation: Pourya Hoseini; Writing - review and editing: Pourya Hoseini, Shuvo Kumar Paul, Mircea Nicolescu; Funding acquisition: Monica Nicolescu, Mircea Nicolescu; Resources: Monica Nicolescu, Mircea Nicolescu, Pourya Hoseini, Shuvo Kumar Paul; Supervision: Mircea Nicolescu, Monica Nicolescu.
Additional information
Availability of data and material
Yes. Dataset available at https://github.com/pouryahoseini/Next-Best-View-Dataset.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hoseini, P., Paul, S.K., Nicolescu, M. et al. A one-shot next best view system for active object recognition. Appl Intell 52, 5290–5309 (2022). https://doi.org/10.1007/s10489-021-02657-z