Abstract:
Perspective ambiguity is widespread in human-robot interaction, especially in language-driven robot manipulation: an object can easily be placed at the wrong position simply because the human and the robot adopt different perspectives. Determining the placement location is therefore a necessary step in language-driven robot manipulation. In this letter, we introduce a Language-Driven Perspective-based Pick-and-Place Algorithm (LD3PA) to solve this problem. Taking as input a natural-language description together with an image and a point cloud from Kinect sensors, a perspective disambiguation network based on multi-head self-attention infers the user's perspective. A command analysis network is also designed to decode the objects' roles and relations from the user's commands. A convolution-based placement optimization algorithm then finds the final placement location and orientation of the picked object, accounting for the edge of the table and the objects already on the table. Experiments show that LD3PA not only correctly disambiguates the user's perspective but also finds a suitable placement location; its perspective disambiguation outperforms humans in a comparative experiment. Finally, we deploy the algorithm on a physical Panda robot. The results show that LD3PA fuses multi-modal information to solve the perspective ambiguity and placement optimization problems in human-robot interaction.
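To make the fusion step concrete, here is a minimal sketch of how a perspective disambiguation network built on multi-head self-attention could combine the three modalities. It assumes the language, image, and point-cloud inputs have already been encoded into token features of a shared dimension; the class name `PerspectiveDisambiguator`, the feature dimension, and the choice of four candidate perspectives are illustrative assumptions, not details taken from the paper.

```python
# Sketch only: multi-modal fusion via multi-head self-attention.
# All names and hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn

class PerspectiveDisambiguator(nn.Module):
    def __init__(self, dim=256, heads=8, n_perspectives=4):
        super().__init__()
        # Self-attention lets tokens from different modalities attend
        # to each other in a single fused sequence.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Assumed output space: e.g. front / back / left / right views.
        self.head = nn.Linear(dim, n_perspectives)

    def forward(self, lang, img, pcd):
        # Concatenate per-modality token sequences: (B, L_total, dim).
        tokens = torch.cat([lang, img, pcd], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)  # self-attention
        # Mean-pool the fused tokens and classify the perspective.
        return self.head(fused.mean(dim=1))

# Usage with dummy pre-encoded features (batch of 1):
model = PerspectiveDisambiguator()
logits = model(torch.randn(1, 8, 256),   # language tokens
               torch.randn(1, 16, 256),  # image patch tokens
               torch.randn(1, 16, 256))  # point-cloud tokens
print(logits.shape)  # torch.Size([1, 4])
```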
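The convolution-based placement optimization can likewise be sketched as scoring candidate poses on a 2-D occupancy grid of the tabletop: convolving the occupancy map with the picked object's footprint counts, for every candidate center, how many occupied cells (existing objects or off-table area) the object would overlap. The function and variable names below are hypothetical, and treating cells outside the grid as occupied is one plausible way to respect the table edge.

```python
# Sketch only: convolution-based placement scoring on an occupancy grid.
import numpy as np
from scipy.signal import convolve2d

def score_placements(occupancy: np.ndarray, footprint: np.ndarray) -> np.ndarray:
    """Return a cost map; low values mark collision-free centers.

    occupancy -- H x W grid, 1.0 where occupied or off the table, 0.0 free.
    footprint -- h x w binary mask of the picked object at one orientation.
    """
    # fillvalue=1.0 treats everything beyond the grid (the table edge)
    # as occupied, penalizing placements that hang off the table.
    return convolve2d(occupancy, footprint, mode="same",
                      boundary="fill", fillvalue=1.0)

# Usage: score a couple of orientations and pick the minimum-cost pose.
occ = np.zeros((64, 64))
occ[:, :4] = 1.0        # strip near the table edge marked occupied
occ[20:30, 20:35] = 1.0  # an object already on the table
foot = np.ones((8, 12))
costs = [score_placements(occ, np.rot90(foot, k)) for k in range(2)]
k, idx = min(((k, c.argmin()) for k, c in enumerate(costs)),
             key=lambda t: costs[t[0]].flat[t[1]])
row, col = np.unravel_index(idx, occ.shape)
print(f"best pose: center=({row},{col}), rotation={90 * k} deg")
```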
Published in: IEEE Robotics and Automation Letters (Volume 7, Issue 2, April 2022)