Experimental study of the suitability of CNN-based holistic descriptors for accurate visual localization

Authors:

Alberto Jaenal,

Francisco-Angel Moreno,

Javier Gonzalez-Jimenez

APPIS '19: Proceedings of the 2nd International Conference on Applications of Intelligent Systems

Article No.: 28, Pages 1 - 6

https://doi.org/10.1145/3309772.3309800

Published: 07 January 2019

Abstract

Holistic Image Descriptors (HIDs) are compact representations of a whole image that, while suitable for Place Recognition, are not appropriate for accurate Visual Localization. The most successful HIDs are those extracted from Convolutional Neural Networks (CNNs) such as VGG, ResNet, InceptionV4 or NetVLAD. Very recently, the equivariance property has been proposed to reflect how 2D image transformations (e.g. rotation, flip, scale changes) influence the descriptor [17]. Our work experimentally analyzes whether this property is a good indicator of the suitability of existing CNN-based HIDs for estimating changes in the camera pose, which produce more complex transformations of the image than the pure transformations analyzed in [17]. The results reported here are preliminary work in the context of an ongoing project towards appearance-based localization of autonomous mobile robots.
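
As a rough illustration of the equivariance idea mentioned in the abstract, the sketch below fits a linear map M so that the descriptor of a transformed image is approximated by M applied to the descriptor of the original image, and then reports the relative residual on held-out pairs. This is a minimal sketch under stated assumptions, not the procedure of this paper or of [17]: the function names, the ridge value, and the synthetic descriptors (random vectors standing in for CNN outputs) are all hypothetical.

```python
import numpy as np

def fit_equivariant_map(desc_orig, desc_trans, ridge=1e-3):
    """Fit M so that desc_trans ~= desc_orig @ M (ridge-regularized least squares).

    desc_orig, desc_trans: arrays of shape (n_images, dim), one descriptor per row.
    The ridge value is an assumed hyperparameter, chosen only for this demo.
    """
    dim = desc_orig.shape[1]
    gram = desc_orig.T @ desc_orig + ridge * np.eye(dim)
    return np.linalg.solve(gram, desc_orig.T @ desc_trans)

def equivariance_error(desc_orig, desc_trans, M):
    """Mean relative residual between predicted and actual transformed descriptors."""
    pred = desc_orig @ M
    num = np.linalg.norm(pred - desc_trans, axis=1)
    den = np.linalg.norm(desc_trans, axis=1) + 1e-12
    return float(np.mean(num / den))

# Synthetic stand-in for real descriptors: X plays the role of descriptors of the
# original images, Y of descriptors of the same images after a fixed transformation.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
M_true = rng.normal(size=(16, 16))
Y = X @ M_true + 0.01 * rng.normal(size=(200, 16))

# Fit on the first 150 pairs, evaluate on the remaining 50.
M = fit_equivariant_map(X[:150], Y[:150])
err = equivariance_error(X[150:], Y[150:], M)
print(f"held-out relative residual: {err:.4f}")
```

With real data, X and Y would hold descriptors extracted from image pairs related by a known transformation or camera motion; a low held-out residual would suggest that the descriptor changes in a predictable way under that transformation.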

References

[1]

Pulkit Agrawal, Joao Carreira, and Jitendra Malik. 2015. Learning to see by moving. In Proceedings of the IEEE International Conference on Computer Vision. 37--45.

[2]

Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. 2016. NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5297--5307.

[3]

Vassileios Balntas, Shuda Li, and Victor Prisacariu. 2018. RelocNet: Continuous Metric Learning Relocalisation using Neural Nets. In Proceedings of the European Conference on Computer Vision (ECCV). 751--767.

[4]

Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. 2006. Surf: Speeded up robust features. In European conference on computer vision. Springer, 404--417.

[5]

Titus Cieslewski, Siddharth Choudhary, and Davide Scaramuzza. 2018. Data-efficient decentralized visual SLAM. In 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2466--2473.

[6]

Mark Cummins and Paul Newman. 2008. FAB-MAP: Probabilistic localization and mapping in the space of appearance. The International Journal of Robotics Research 27, 6 (2008), 647--665.

[7]

Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Conference on Computer Vision and Pattern Recognition (CVPR).

[8]

Ruben Gomez-Ojeda, Manuel Lopez-Antequera, Nicolai Petkov, and Javier González Jiménez. 2015. Training a Convolutional Neural Network for Appearance-Invariant Place Recognition. CoRR abs/1505.07428 (2015). arXiv:1505.07428 http://arxiv.org/abs/1505.07428

[9]

Ruben Gomez-Ojeda, David Zuñiga-Noël, Francisco-Angel Moreno, Davide Scaramuzza, and Javier Gonzalez-Jimenez. 2017. PL-SLAM: a Stereo SLAM System through the Combination of Points and Line Segments. arXiv preprint arXiv:1705.09479 (2017).

[10]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. CoRR abs/1512.03385 (2015). arXiv:1512.03385 http://arxiv.org/abs/1512.03385

[11]

Minyoung Huh, Pulkit Agrawal, and Alexei A Efros. 2016. What makes ImageNet good for transfer learning? arXiv preprint arXiv:1608.08614 (2016).

[12]

Robert Huitl, Georg Schroth, Sebastian Hilsenbeck, Florian Schweiger, and Eckehard Steinbach. 2012. TUMindoor: An extensive image and point cloud dataset for visual indoor localization and mapping. In Image Processing (ICIP), 2012 19th IEEE International Conference on. IEEE, 1773--1776.

[13]

Dinesh Jayaraman and Kristen Grauman. 2015. Learning image representations tied to ego-motion. In Proceedings of the IEEE International Conference on Computer Vision. 1413--1421.

[14]

Herve Jegou, Florent Perronnin, Matthijs Douze, Jorge Sánchez, Patrick Perez, and Cordelia Schmid. 2012. Aggregating local image descriptors into compact codes. IEEE transactions on pattern analysis and machine intelligence 34, 9 (2012), 1704--1716.

[15]

Alex Kendall, Matthew Grimes, and Roberto Cipolla. 2015. Posenet: A convolutional network for real-time 6-dof camera relocalization. In Proceedings of the IEEE international conference on computer vision. 2938--2946.

[16]

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097--1105.

[17]

Karel Lenc and Andrea Vedaldi. 2015. Understanding image representations by measuring their equivariance and equivalence. In Proceedings of the IEEE conference on computer vision and pattern recognition. 991--999.

[18]

Manuel Lopez-Antequera, Ruben Gomez-Ojeda, Nicolai Petkov, and Javier Gonzalez-Jimenez. 2017. Appearance-invariant place recognition by discriminatively training a convolutional neural network. Pattern Recognition Letters 92 (2017), 89--95.

[19]

Manuel Lopez-Antequera, Nicolai Petkov, and Javier Gonzalez-Jimenez. 2016. Image-based localization using Gaussian processes. In Indoor Positioning and Indoor Navigation (IPIN), 2016 International Conference on. Winner, Best Paper award.

[20]

Manuel Lopez-Antequera, Nicolai Petkov, and Javier Gonzalez-Jimenez. 2017. City-scale continuous visual localization. In Mobile Robots (ECMR), 2017 European Conference on. IEEE, 1--6.

[21]

David G Lowe. 1999. Object recognition from local scale-invariant features. In Computer vision, 1999. The proceedings of the seventh IEEE international conference on, Vol. 2. IEEE, 1150--1157.

[22]

Will Maddern, Michael Milford, and Gordon Wyeth. 2012. CAT-SLAM: probabilistic localisation and mapping using a continuous appearance-based trajectory. The International Journal of Robotics Research 31, 4 (2012), 429--451.

[23]

Francisco-Angel Moreno, Jose-Luis Blanco, and Javier Gonzalez-Jimenez. 2016. A constant-time SLAM back-end in the continuum between global mapping and submapping: application to visual stereo SLAM. The International Journal of Robotics Research 35, 9 (2016), 1036--1056.

[24]

Raúl Mur-Artal and Juan D. Tardós. 2017. ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras. IEEE Transactions on Robotics 33, 5 (2017), 1255--1262.

[25]

Filip Radenovic, Giorgos Tolias, and Ondrej Chum. 2017. Fine-tuning CNN Image Retrieval with No Human Annotation. CoRR abs/1711.02512 (2017). arXiv:1711.02512 http://arxiv.org/abs/1711.02512

[26]

Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An efficient alternative to SIFT or SURF. In Computer Vision (ICCV), 2011 IEEE international conference on. IEEE, 2564--2571.

[27]

Torsten Sattler, Bastian Leibe, and Leif Kobbelt. 2011. Fast image-based localization using direct 2d-to-3d matching. In Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 667--674.

[28]

Torsten Sattler, Will Maddern, Carl Toft, Akihiko Torii, Lars Hammarstrand, Erik Stenborg, Daniel Safari, Masatoshi Okutomi, Marc Pollefeys, Josef Sivic, et al. 2018. Benchmarking 6dof outdoor visual localization in changing conditions. In Proc. CVPR, Vol. 1.

[29]

Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR abs/1409.1556 (2014). arXiv:1409.1556 http://arxiv.org/abs/1409.1556

[30]

Shuran Song, Fisher Yu, Andy Zeng, Angel X Chang, Manolis Savva, and Thomas Funkhouser. 2017. Semantic Scene Completion from a Single Depth Image. Proceedings of 30th IEEE Conference on Computer Vision and Pattern Recognition (2017).

[31]

Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. 2016. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. CoRR abs/1602.07261 (2016). arXiv:1602.07261 http://arxiv.org/abs/1602.07261

[32]

Yi Wu, Yuxin Wu, Georgia Gkioxari, and Yuandong Tian. 2018. Building Generalizable Agents with a Realistic and Rich 3D Environment. arXiv preprint arXiv:1801.02209 (2018).

[33]

Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 million Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).

[34]

Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. 2014. Learning deep features for scene recognition using places database. In Advances in neural information processing systems. 487--495.

[35]

Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. 2017. Learning Transferable Architectures for Scalable Image Recognition. CoRR abs/1707.07012 (2017). arXiv:1707.07012 http://arxiv.org/abs/1707.07012

Cited By

Jaenal A, Moreno F, Gonzalez-Jimenez J. (2021). Appearance-Based Sequential Robot Localization Using a Patchwise Approximation of a Descriptor Manifold. Sensors 21:7 (2483). Online publication date: 2-Apr-2021.
https://doi.org/10.3390/s21072483

Index Terms

Experimental study of the suitability of CNN-based holistic descriptors for accurate visual localization
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Computer vision tasks
        Vision for robotics

Published In

APPIS '19: Proceedings of the 2nd International Conference on Applications of Intelligent Systems

January 2019

208 pages

ISBN:9781450360852

DOI:10.1145/3309772

Conference Chairs:
Nicolai Petkov (University of Groningen, Netherlands),
Nicola Strisciuglio (University of Groningen, Netherlands),
Carlos M. Travieso (University of Las Palmas de Gran Canaria, Spain)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

APPIS 2019

APPIS 2019: 2nd International Conference on Applications of Intelligent Systems

January 7 - 9, 2019

Las Palmas de Gran Canaria, Spain
