DOI: 10.1145/3309772.3309800
research-article

Experimental study of the suitability of CNN-based holistic descriptors for accurate visual localization

Published: 07 January 2019

Abstract

Holistic Image Descriptors (HIDs) are compact representations of a whole image that, although suitable for Place Recognition, are not appropriate for accurate Visual Localization. The most successful HIDs are those extracted from Convolutional Neural Networks (CNNs) such as VGG, ResNet, InceptionV4 or NetVLAD. Very recently, the equivariance property has been proposed to reflect how 2D image transformations (e.g. rotations, flips, scale changes) influence the descriptor [17]. Our work experimentally analyzes whether this property is a good indicator of the suitability of existing CNN-based HIDs for estimating changes in camera pose, which produce more complex image transformations than the pure transformations analyzed in [17]. The results reported here are preliminary work within an ongoing project on appearance-based localization of autonomous mobile robots.
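
As an illustration of the equivariance idea mentioned in the abstract, the following is a minimal sketch and not the method or the descriptors evaluated in the paper. It assumes a toy holistic descriptor (mean intensity over a 4x4 grid of cells) and a horizontal flip as the 2D transformation g. The function names (`grid_descriptor`, `flip_descriptor`) and the notation M_g for the map acting on the descriptor are assumptions introduced here for illustration only. The descriptor is called equivariant to g when the descriptor of the transformed image can be predicted by applying M_g to the descriptor of the original image.

```python
import random


def grid_descriptor(image, cells=4):
    """Toy holistic descriptor: mean intensity of each cell in a cells x cells grid."""
    height, width = len(image), len(image[0])
    cell_h, cell_w = height // cells, width // cells
    descriptor = []
    for gy in range(cells):
        for gx in range(cells):
            block = [image[y][x]
                     for y in range(gy * cell_h, (gy + 1) * cell_h)
                     for x in range(gx * cell_w, (gx + 1) * cell_w)]
            descriptor.append(sum(block) / len(block))
    return descriptor


def flip_image(image):
    """Transformation g: horizontal flip of the image."""
    return [row[::-1] for row in image]


def flip_descriptor(descriptor, cells=4):
    """Hypothetical map M_g: mirror the columns of the descriptor grid."""
    return [descriptor[gy * cells + (cells - 1 - gx)]
            for gy in range(cells) for gx in range(cells)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two descriptors."""
    return max(abs(x - y) for x, y in zip(a, b))


# Random 32x32 grayscale image standing in for a real camera frame.
rng = random.Random(0)
image = [[rng.random() for _ in range(32)] for _ in range(32)]

d_original = grid_descriptor(image)
d_flipped = grid_descriptor(flip_image(image))

# Equivariance: descriptor(g(image)) matches M_g(descriptor(image)).
equivariance_error = max_abs_diff(d_flipped, flip_descriptor(d_original))
# Invariance would instead require descriptor(g(image)) == descriptor(image).
invariance_error = max_abs_diff(d_flipped, d_original)

print("equivariance error:", equivariance_error)
print("invariance error:", invariance_error)
```

For this toy descriptor the equivariance error is at floating-point noise level while the invariance error is not, so the flip changes the descriptor in a predictable way. Whether a comparable predictability holds for CNN-based HIDs under real camera pose changes is the question the paper studies experimentally.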

References

[1]
Pulkit Agrawal, Joao Carreira, and Jitendra Malik. 2015. Learning to see by moving. In Proceedings of the IEEE International Conference on Computer Vision. 37--45.
[2]
Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. 2016. NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5297--5307.
[3]
Vassileios Balntas, Shuda Li, and Victor Prisacariu. 2018. RelocNet: Continuous Metric Learning Relocalisation using Neural Nets. In Proceedings of the European Conference on Computer Vision (ECCV). 751--767.
[4]
Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. 2006. SURF: Speeded up robust features. In European Conference on Computer Vision. Springer, 404--417.
[5]
Titus Cieslewski, Siddharth Choudhary, and Davide Scaramuzza. 2018. Data-efficient decentralized visual SLAM. In 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2466--2473.
[6]
Mark Cummins and Paul Newman. 2008. FAB-MAP: Probabilistic localization and mapping in the space of appearance. The International Journal of Robotics Research 27, 6 (2008), 647--665.
[7]
Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Conference on Computer Vision and Pattern Recognition (CVPR).
[8]
Ruben Gomez-Ojeda, Manuel Lopez-Antequera, Nicolai Petkov, and Javier González Jiménez. 2015. Training a Convolutional Neural Network for Appearance-Invariant Place Recognition. CoRR abs/1505.07428 (2015). arXiv:1505.07428 http://arxiv.org/abs/1505.07428
[9]
Ruben Gomez-Ojeda, David Zuñiga-Noël, Francisco-Angel Moreno, Davide Scaramuzza, and Javier Gonzalez-Jimenez. 2017. PL-SLAM: a Stereo SLAM System through the Combination of Points and Line Segments. arXiv preprint arXiv:1705.09479 (2017).
[10]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. CoRR abs/1512.03385 (2015). arXiv:1512.03385 http://arxiv.org/abs/1512.03385
[11]
Minyoung Huh, Pulkit Agrawal, and Alexei A Efros. 2016. What makes ImageNet good for transfer learning? arXiv preprint arXiv:1608.08614 (2016).
[12]
Robert Huitl, Georg Schroth, Sebastian Hilsenbeck, Florian Schweiger, and Eckehard Steinbach. 2012. TUMindoor: An extensive image and point cloud dataset for visual indoor localization and mapping. In Image Processing (ICIP), 2012 19th IEEE International Conference on. IEEE, 1773--1776.
[13]
Dinesh Jayaraman and Kristen Grauman. 2015. Learning image representations tied to ego-motion. In Proceedings of the IEEE International Conference on Computer Vision. 1413--1421.
[14]
Herve Jegou, Florent Perronnin, Matthijs Douze, Jorge Sánchez, Patrick Perez, and Cordelia Schmid. 2012. Aggregating local image descriptors into compact codes. IEEE transactions on pattern analysis and machine intelligence 34, 9 (2012), 1704--1716.
[15]
Alex Kendall, Matthew Grimes, and Roberto Cipolla. 2015. PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In Proceedings of the IEEE International Conference on Computer Vision. 2938--2946.
[16]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105.
[17]
Karel Lenc and Andrea Vedaldi. 2015. Understanding image representations by measuring their equivariance and equivalence. In Proceedings of the IEEE conference on computer vision and pattern recognition. 991--999.
[18]
Manuel Lopez-Antequera, Ruben Gomez-Ojeda, Nicolai Petkov, and Javier Gonzalez-Jimenez. 2017. Appearance-invariant place recognition by discriminatively training a convolutional neural network. Pattern Recognition Letters 92 (2017), 89--95.
[19]
Manuel Lopez-Antequera, Nicolai Petkov, and Javier Gonzalez-Jimenez. 2016. Image-based localization using Gaussian processes. In Indoor Positioning and Indoor Navigation (IPIN), 2016 International Conference on. Winner, Best Paper award.
[20]
Manuel Lopez-Antequera, Nicolai Petkov, and Javier Gonzalez-Jimenez. 2017. City-scale continuous visual localization. In Mobile Robots (ECMR), 2017 European Conference on. IEEE, 1--6.
[21]
David G Lowe. 1999. Object recognition from local scale-invariant features. In Computer vision, 1999. The proceedings of the seventh IEEE international conference on, Vol. 2. IEEE, 1150--1157.
[22]
Will Maddern, Michael Milford, and Gordon Wyeth. 2012. CAT-SLAM: probabilistic localisation and mapping using a continuous appearance-based trajectory. The International Journal of Robotics Research 31, 4 (2012), 429--451.
[23]
Francisco-Angel Moreno, Jose-Luis Blanco, and Javier Gonzalez-Jimenez. 2016. A constant-time SLAM back-end in the continuum between global mapping and submapping: application to visual stereo SLAM. The International Journal of Robotics Research 35, 9 (2016), 1036--1056.
[24]
Raúl Mur-Artal and Juan D. Tardós. 2017. ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras. IEEE Transactions on Robotics 33, 5 (2017), 1255--1262.
[25]
Filip Radenovic, Giorgos Tolias, and Ondrej Chum. 2017. Fine-tuning CNN Image Retrieval with No Human Annotation. CoRR abs/1711.02512 (2017). arXiv:1711.02512 http://arxiv.org/abs/1711.02512
[26]
Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An efficient alternative to SIFT or SURF. In Computer Vision (ICCV), 2011 IEEE international conference on. IEEE, 2564--2571.
[27]
Torsten Sattler, Bastian Leibe, and Leif Kobbelt. 2011. Fast image-based localization using direct 2d-to-3d matching. In Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 667--674.
[28]
Torsten Sattler, Will Maddern, Carl Toft, Akihiko Torii, Lars Hammarstrand, Erik Stenborg, Daniel Safari, Masatoshi Okutomi, Marc Pollefeys, Josef Sivic, et al. 2018. Benchmarking 6dof outdoor visual localization in changing conditions. In Proc. CVPR, Vol. 1.
[29]
Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR abs/1409.1556 (2014). arXiv:1409.1556 http://arxiv.org/abs/1409.1556
[30]
Shuran Song, Fisher Yu, Andy Zeng, Angel X Chang, Manolis Savva, and Thomas Funkhouser. 2017. Semantic Scene Completion from a Single Depth Image. Proceedings of 30th IEEE Conference on Computer Vision and Pattern Recognition (2017).
[31]
Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. 2016. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. CoRR abs/1602.07261 (2016). arXiv:1602.07261 http://arxiv.org/abs/1602.07261
[32]
Yi Wu, Yuxin Wu, Georgia Gkioxari, and Yuandong Tian. 2018. Building Generalizable Agents with a Realistic and Rich 3D Environment. arXiv preprint arXiv:1801.02209 (2018).
[33]
Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 million Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).
[34]
Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. 2014. Learning deep features for scene recognition using places database. In Advances in neural information processing systems. 487--495.
[35]
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. 2017. Learning Transferable Architectures for Scalable Image Recognition. CoRR abs/1707.07012 (2017). arXiv:1707.07012 http://arxiv.org/abs/1707.07012

Cited By

  • (2021) Appearance-Based Sequential Robot Localization Using a Patchwise Approximation of a Descriptor Manifold. Sensors 21:7 (2483). DOI: 10.3390/s21072483. Online publication date: 2-Apr-2021


      Published In

      APPIS '19: Proceedings of the 2nd International Conference on Applications of Intelligent Systems
      January 2019
      208 pages
      ISBN:9781450360852
      DOI:10.1145/3309772

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. convolutional neural networks
      2. deep learning
      3. equivariance
      4. Gaussian process particle filters
      5. holistic descriptors
      6. visual localization

      Qualifiers

      • Research-article

      Conference

      APPIS 2019
