Research article · DOI: 10.1145/3638884.3638912

Embodied Visual Navigation for Grasping

Published: 23 April 2024

Abstract

This paper presents a novel approach to robotic grasping that integrates embodied visual navigation with reinforcement learning. The primary objective is to determine the optimal location at which a robot should stand to grasp an object successfully. The work is motivated by a gap in the literature: navigation and grasping are often treated as separate problems, which leads to suboptimal performance. Our approach leverages multimodal sensory data, including RGB images, depth images, and semantic information, to guide the robot's navigation, and uses deep reinforcement learning so the robot can learn optimal navigation strategies from visual input. The effectiveness of this approach is demonstrated through experiments in simple and complex scenes with varying numbers of obstacles. The results show that our method achieves a high success rate and fast grasping speed across these scenarios, outperforming other methods. This work contributes to the field of robotic grasping by unifying embodied visual navigation with deep reinforcement learning and validating the combination through rigorous experiments.
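The core idea described above — using reinforcement learning to find a good standing position for grasping — can be illustrated with a minimal sketch. The example below uses tabular Q-learning on a toy grid world in which the agent must reach a cell adjacent to a target object while avoiding obstacles. The grid layout, reward values, and goal cells are invented for illustration; they are not the paper's actual environment, which uses deep RL on multimodal image input.

```python
import random

# Hypothetical 5x5 grid world: the robot must navigate to a cell adjacent to
# the target object (a good "standing position" for grasping), avoiding
# obstacles. All coordinates and rewards here are illustrative assumptions.
GRID = 5
OBSTACLES = {(1, 1), (2, 3), (3, 1)}
GOALS = {(3, 4), (4, 3)}  # cells from which the grasp is assumed reachable
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Apply an action; bumping into walls or obstacles leaves the state unchanged."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < GRID and 0 <= nxt[1] < GRID) or nxt in OBSTACLES:
        nxt = state
    if nxt in GOALS:
        return nxt, 10.0, True   # reached a graspable standing position
    return nxt, -0.1, False      # small step penalty encourages short paths

def train(episodes=2000, alpha=0.5, gamma=0.95, eps=0.1):
    """Tabular Q-learning with epsilon-greedy exploration."""
    q = {}  # Q-table: (state, action index) -> value
    for _ in range(episodes):
        state, done = (0, 0), False
        for _ in range(100):
            if done:
                break
            a = (random.randrange(4) if random.random() < eps
                 else max(range(4), key=lambda i: q.get((state, i), 0.0)))
            nxt, r, done = step(state, ACTIONS[a])
            best_next = max(q.get((nxt, i), 0.0) for i in range(4))
            q[(state, a)] = (1 - alpha) * q.get((state, a), 0.0) \
                + alpha * (r + gamma * best_next)
            state = nxt
    return q

def greedy_path(q, start=(0, 0), limit=50):
    """Roll out the learned greedy policy and return the visited states."""
    state, path = start, [start]
    for _ in range(limit):
        if state in GOALS:
            break
        a = max(range(4), key=lambda i: q.get((state, i), 0.0))
        state, _, _ = step(state, ACTIONS[a])
        path.append(state)
    return path

random.seed(0)
path = greedy_path(train())
print(path[-1] in GOALS)  # the learned policy should end on a graspable cell
```

In the paper's setting the Q-table would be replaced by a deep network consuming RGB, depth, and semantic observations, but the learning objective — reward the agent for reaching a pose from which the grasp succeeds — is the same in spirit.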



Published In

ICCIP '23: Proceedings of the 2023 9th International Conference on Communication and Information Processing
December 2023 · 648 pages
ISBN: 9798400708909
DOI: 10.1145/3638884

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. artificial intelligence
  2. deep reinforcement learning
  3. robotics


Conference

ICCIP 2023 · Overall acceptance rate: 61 of 301 submissions, 20%
