Abstract:
Although robot navigation in urban environments has achieved strong performance, robustness remains insufficient in cross-scene (ground, water surface) navigation applications. An intuitive remedy is to introduce complementary multi-modal data to improve the robustness of the algorithms. Therefore, this paper presents an MMDF (multi-modal deep feature) based cross-scene place recognition framework, which consists of four modules: a LiDAR module, an image module, a fusion module, and a NetVLAD module. 3D point clouds and images are first fed into the network. The LiDAR module uses PointNet to extract point cloud features, and the image module uses a lightweight network to extract image features. The fusion module uses image semantic features to enhance the point cloud features, and the enhanced point cloud features are then aggregated by NetVLAD to obtain the final enhanced descriptor. Extensive experiments on the KITTI, Oxford RobotCar, and USVInland datasets demonstrate that MMDF outperforms PointNetVLAD, NetVLAD, and a camera-LiDAR fused descriptor.
Published in: IEEE Robotics and Automation Letters (Volume: 7, Issue: 3, July 2022)
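The abstract describes a four-module pipeline: per-point features from a PointNet-style encoder, a global semantic feature from a lightweight image network, fusion of the two, and NetVLAD aggregation into a place descriptor. Below is a minimal PyTorch sketch of that pipeline; the layer sizes, module names, and the specific fusion rule (broadcasting a global image feature onto each point and mixing with a shared MLP) are illustrative assumptions, not the paper's exact design.

```python
# Illustrative MMDF-style pipeline sketch (PyTorch). Dimensions and the
# fusion mechanism are assumptions for demonstration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PointEncoder(nn.Module):
    """PointNet-style per-point features via shared MLPs (1x1 convolutions)."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, out_dim, 1),
        )

    def forward(self, pts):                    # pts: (B, N, 3)
        return self.mlp(pts.transpose(1, 2))   # (B, C, N)


class ImageEncoder(nn.Module):
    """Lightweight CNN producing one global semantic feature per image."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, img):                    # img: (B, 3, H, W)
        return self.net(img).flatten(1)        # (B, out_dim)


class NetVLAD(nn.Module):
    """NetVLAD aggregation of per-point features into a global descriptor."""
    def __init__(self, dim, num_clusters=64):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))
        self.assign = nn.Conv1d(dim, num_clusters, 1)

    def forward(self, x):                      # x: (B, C, N)
        a = F.softmax(self.assign(x), dim=1)   # soft assignment: (B, K, N)
        # Residuals of each point feature to each centroid, weighted by assignment.
        r = x.unsqueeze(1) - self.centroids[None, :, :, None]  # (B, K, C, N)
        v = (a.unsqueeze(2) * r).sum(-1)       # (B, K, C)
        v = F.normalize(v, dim=2).flatten(1)   # intra-normalize, then flatten
        return F.normalize(v, dim=1)           # global descriptor: (B, K*C)


class MMDF(nn.Module):
    """Fuses image semantics into point features, then aggregates with NetVLAD."""
    def __init__(self):
        super().__init__()
        self.point_enc = PointEncoder(out_dim=256)
        self.img_enc = ImageEncoder(out_dim=128)
        self.fuse = nn.Conv1d(256 + 128, 256, 1)  # assumed form of the fusion MLP
        self.vlad = NetVLAD(dim=256)

    def forward(self, pts, img):
        f_pts = self.point_enc(pts)            # (B, 256, N)
        f_img = self.img_enc(img)              # (B, 128)
        f_img = f_img[:, :, None].expand(-1, -1, f_pts.size(-1))
        fused = F.relu(self.fuse(torch.cat([f_pts, f_img], dim=1)))
        return self.vlad(fused)                # enhanced place descriptor


desc = MMDF()(torch.randn(2, 4096, 3), torch.randn(2, 3, 128, 256))
print(desc.shape)  # torch.Size([2, 16384]) = 64 clusters x 256 dims
```

For place recognition, such descriptors would typically be compared by Euclidean or cosine distance against a database of descriptors from previously visited places, with the nearest neighbor taken as the retrieved match.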