Abstract
Visual re-localization aims to recover camera poses in a known environment, which is vital for applications such as robotics and augmented reality. Feed-forward absolute camera pose regression methods output poses directly from a network but suffer from low accuracy. Scene coordinate based methods, in contrast, are accurate but require iterative RANSAC post-processing, which hinders efficient end-to-end training and inference. To get the best of both worlds, we propose a feed-forward method termed SC-wLS that exploits all scene coordinate estimates for weighted least squares pose regression. This differentiable formulation applies a weight network to 2D-3D correspondences and requires only pose supervision. Qualitative results demonstrate the interpretability of the learned weights. Evaluations on the 7Scenes and Cambridge datasets show significantly improved performance compared with former feed-forward counterparts. Moreover, SC-wLS enables a new capability: self-supervised test-time adaptation of the weight network. Code and models are publicly available.
X. Wu and H. Zhao—Equal contribution.
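To make the weighted least squares idea in the abstract concrete, below is a minimal NumPy sketch of pose estimation from weighted 2D-3D correspondences via a weighted DLT solve. This is not the authors' implementation: the function name weighted_dlt_pose, the normalization steps, and the plain SVD-based solve are our own illustrative assumptions; SC-wLS additionally predicts the weights with a learned network and trains the solver end-to-end.

import numpy as np

def weighted_dlt_pose(pts2d, pts3d, weights, K):
    """Sketch: estimate [R|t] from weighted 2D-3D correspondences.

    pts2d:   (N, 2) pixel coordinates
    pts3d:   (N, 3) scene coordinates (e.g. from a scene coordinate network)
    weights: (N,)   per-correspondence weights in [0, 1]
    K:       (3, 3) camera intrinsics
    """
    pts2d, pts3d, weights = map(np.asarray, (pts2d, pts3d, weights))

    # Work in normalized camera coordinates.
    ones = np.ones((pts2d.shape[0], 1))
    xy = (np.linalg.inv(K) @ np.hstack([pts2d, ones]).T).T  # (N, 3)

    # Build the standard 2N x 12 DLT system A p = 0 for the 3x4 pose matrix.
    rows = []
    for (x, y, _), X in zip(xy, pts3d):
        Xh = np.append(X, 1.0)  # homogeneous 3D point
        rows.append(np.concatenate([Xh, np.zeros(4), -x * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -y * Xh]))
    A = np.asarray(rows)

    # Weight each correspondence (each one contributes two rows).
    A = A * np.repeat(weights, 2)[:, None]

    # Weighted least squares: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)

    # Recover rotation and translation; project the 3x3 block onto SO(3).
    R_raw, t = P[:, :3], P[:, 3]
    U, S, Vt2 = np.linalg.svd(R_raw)
    R, scale = U @ Vt2, S.mean()
    if np.linalg.det(R) < 0:  # resolve the overall sign ambiguity of P
        R, scale = -R, -scale
    return R, t / scale

In SC-wLS the per-correspondence weights come from a network supervised only by pose, so unreliable scene coordinate predictions are down-weighted without RANSAC; the sketch above only illustrates the underlying weighted least squares geometry.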
Acknowledgement
This work was supported by the National Natural Science Foundation of China under Grant 62176010.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wu, X., Zhao, H., Li, S., Cao, Y., Zha, H. (2022). SC-wLS: Towards Interpretable Feed-forward Camera Re-localization. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19768-0
Online ISBN: 978-3-031-19769-7