
A one-shot next best view system for active object recognition

Abstract

Active vision is the ability of intelligent agents to dynamically gather more information about their surroundings by physically moving the camera. In object recognition, active vision improves performance by incorporating classification decisions from new viewpoints whenever the current recognition result carries some degree of uncertainty. A natural question in an autonomous active vision system is therefore how to determine the new viewpoint, i.e., to what pose should the camera be moved? This is the traditional next best view question in active perception systems. Current approaches to the next best view problem either need to construct occupancy grids or require training datasets of 3D objects or multiple captures of the same object in specified poses. Occupancy grid methods usually depend on multiple camera movements to perform well, which makes them better suited to 3D reconstruction than to object recognition. In this paper, a next best view method for active object recognition based on object appearance and surface direction is proposed that decides on the next camera pose without requiring any specifically structured training datasets of 3D objects. It is designed for single-shot inference of the next viewpoint and can determine next best views without substantial knowledge of the 3D voxel structure of the environment around the camera. The experimental results demonstrate the efficiency of the proposed method, showing large improvements in accuracy and F1 score.
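To make the general idea concrete, the following is a minimal, self-contained sketch of the kind of active-recognition loop the abstract describes: take a second view only when the first classification is uncertain, choose that view in a single shot by scoring candidate camera directions against the visible surface normals, and fuse the class posteriors from the two viewpoints. All function names, the scoring rule, the thresholds, and the fusion scheme below are illustrative assumptions and do not reproduce the paper's actual algorithm.

# Illustrative sketch only: a schematic one-shot next-best-view loop.
# The scoring rule and thresholds are hypothetical stand-ins, not the paper's method.
import numpy as np

def classification_uncertain(probs, margin=0.2):
    """Flag an uncertain result when the top two class probabilities are too close."""
    top2 = np.sort(probs)[-2:]
    return (top2[1] - top2[0]) < margin

def score_candidate_view(view_dir, surface_normals):
    """Hypothetical appearance/surface-direction score: prefer candidate viewpoints
    that face the visible surface regions more directly."""
    view_dir = view_dir / np.linalg.norm(view_dir)
    # Cosine between each visible surface normal and the candidate viewing direction.
    alignment = surface_normals @ (-view_dir)
    return alignment.clip(min=0).mean()

def choose_next_view(candidate_dirs, surface_normals):
    """One-shot choice: evaluate every candidate once and return the best."""
    scores = [score_candidate_view(d, surface_normals) for d in candidate_dirs]
    return candidate_dirs[int(np.argmax(scores))]

def fuse_predictions(probs_first, probs_second):
    """Simple product fusion of the two viewpoints' class posteriors."""
    fused = probs_first * probs_second
    return fused / fused.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs_v1 = np.array([0.38, 0.35, 0.27])       # uncertain first-view result
    normals = rng.normal(size=(200, 3))
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    candidates = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    if classification_uncertain(probs_v1):
        next_dir = choose_next_view(candidates, normals)
        probs_v2 = np.array([0.70, 0.20, 0.10])   # classifier output from the new view
        print("next view direction:", next_dir)
        print("fused posterior:", fuse_predictions(probs_v1, probs_v2))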


Notes

  1. Dataset available at https://github.com/pouryahoseini/Next-Best-View-Dataset.


Funding

This work has been supported in part by the Office of Naval Research award N00014-16-1-2312 and the US Army Research Laboratory (ARO) award W911NF-20-2-0084.

Author information


Contributions

Conceptualization: Pourya Hoseini, Mircea Nicolescu, Monica Nicolescu; Methodology: Pourya Hoseini, Mircea Nicolescu, Shuvo Kumar Paul; Formal Analysis and Investigation: Pourya Hoseini, Mircea Nicolescu, Monica Nicolescu; Writing - original draft preparation: Pourya Hoseini; Writing - review and editing: Pourya Hoseini, Shuvo Kumar Paul, Mircea Nicolescu; Funding acquisition: Monica Nicolescu, Mircea Nicolescu; Resources: Monica Nicolescu, Mircea Nicolescu, Pourya Hoseini, Shuvo Kumar Paul; Supervision: Mircea Nicolescu, Monica Nicolescu.

Corresponding author

Correspondence to Pourya Hoseini.

Additional information

Availability of data and material

The dataset is available at https://github.com/pouryahoseini/Next-Best-View-Dataset.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hoseini, P., Paul, S.K., Nicolescu, M. et al. A one-shot next best view system for active object recognition. Appl Intell 52, 5290–5309 (2022). https://doi.org/10.1007/s10489-021-02657-z

