Abstract
High-precision localization and map consistency are crucial objectives for mobile robots. However, loop closure detection remains challenging because of viewpoint and appearance changes. To address this issue, this paper proposes WP-VLAD, a novel hierarchical loop closure detection method that tightly couples global features with weighted local patch-level features (WPs). WP-VLAD employs MobileNetV3 as the backbone network for feature extraction and integrates a trainable vector of locally aggregated descriptors (VLAD) for compact global and local feature representation. A hierarchical navigable small world (HNSW) method retrieves loop candidate frames based on the global features, whereas a multiscale feature fusion weighted map prediction module assigns weights to the local patches during mutual nearest neighbour matching. The proposed weight allocation strategy emphasizes salient regions, reducing interference from dynamic objects. Experimental results on benchmark datasets demonstrate that WP-VLAD significantly improves matching performance while maintaining efficient computation, exhibiting strong generalizability and robustness across various complex environments.
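To make the weighted mutual nearest neighbour matching step concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the cosine similarity measure, the function names (`cosine`, `mutual_nn_pairs`, `weighted_match_score`), and the scoring formula (a weight-scaled sum of similarities normalised by the total query weight) are assumptions made for illustration. The per-patch weights are taken as given inputs, standing in for the output of the weighted map prediction module.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mutual_nn_pairs(query, cand):
    """Return (i, j, similarity) for patches that are each other's nearest neighbour."""
    sims = [[cosine(q, c) for c in cand] for q in query]
    # Best candidate patch for every query patch.
    best_for_q = [max(range(len(cand)), key=lambda j: row[j]) for row in sims]
    # Best query patch for every candidate patch.
    best_for_c = [
        max(range(len(query)), key=lambda i: sims[i][j]) for j in range(len(cand))
    ]
    return [
        (i, j, sims[i][j]) for i, j in enumerate(best_for_q) if best_for_c[j] == i
    ]


def weighted_match_score(query, cand, q_weights, c_weights):
    """Hypothetical score: weight-scaled similarity over mutual matches.

    Patches with low weight (e.g. on dynamic objects) contribute little,
    so salient regions dominate the score.
    """
    total = sum(q_weights)
    if total == 0:
        return 0.0
    pairs = mutual_nn_pairs(query, cand)
    return sum(q_weights[i] * c_weights[j] * s for i, j, s in pairs) / total
```

In this sketch, a loop candidate retrieved by the global descriptor search would be re-scored with `weighted_match_score`, and a patch whose weight is zero has no influence on the result regardless of how well it matches.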
Data availability
This study has associated data in data repositories.
Acknowledgements
We thank the reviewers for their thorough review. Their comments and suggestions have significantly improved the quality of the publication.
Funding
This work is partly supported by the National Natural Science Foundation of China (62373017).
Author information
Contributions
Mingrong Ren: study conception and design, methodology development, manuscript revision. Xiurui Zhang: manuscript preparation. Bin Liu: data analysis. Yuehui Zhu: experiments.
Ethics declarations
Conflicts of interest
The authors declare no conflicts of interest.
About this article
Cite this article
Ren, M., Zhang, X., Liu, B. et al. Hierarchical loop closure detection with weighted local patch features and global descriptors. Appl Intell 55, 266 (2025). https://doi.org/10.1007/s10489-024-06135-0
DOI: https://doi.org/10.1007/s10489-024-06135-0