Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions

Rocco, Ignacio; Arandjelović, Relja; Sivic, Josef

doi:10.1007/978-3-030-58545-7_35

Ignacio Rocco, Relja Arandjelović & Josef Sivic

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12354)

Included in the following conference series:

European Conference on Computer Vision

Abstract

In this work we target the problem of estimating accurately localized correspondences between a pair of images. We adopt the recent Neighbourhood Consensus Networks, which have demonstrated promising performance on difficult correspondence problems, and propose modifications to overcome their main limitations: large memory consumption, long inference time and poorly localized correspondences. Our proposed modifications can reduce the memory footprint and execution time by more than 10×, with equivalent results. This is achieved by sparsifying the correlation tensor containing tentative matches and then processing it with a 4D CNN using submanifold sparse convolutions. Localization accuracy is significantly improved in two ways. First, the input images are processed at higher resolution, which the reduced memory footprint makes possible. Second, a novel two-stage correspondence relocalization module refines the matches. The proposed Sparse-NCNet method obtains state-of-the-art results on the HPatches Sequences and InLoc visual localization benchmarks, and competitive results on the Aachen Day-Night benchmark.
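
The abstract names two efficiency ideas: a sparse correlation tensor, and a 4D CNN built from submanifold sparse convolutions. The minimal pure-Python sketch below illustrates both. It rests on our own assumptions, not on the chapter's code:

- The tensor is sparsified by keeping, for every feature in each image, its k highest-scoring matches in the other image.
- A single-channel 4D kernel is evaluated only at the retained (active) matches, and missing neighbours are treated as zero.

The function names `sparse_correlation` and `submanifold_conv4d` are hypothetical. A practical implementation would use a dedicated sparse-convolution library with multi-channel kernels.

```python
import heapq
import itertools
import random
from typing import Dict, List, Tuple

Loc2 = Tuple[int, int]            # (row, col) in one feature map
Loc4 = Tuple[int, int, int, int]  # (row_a, col_a, row_b, col_b) in the 4D tensor


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def sparse_correlation(feat_a: Dict[Loc2, List[float]],
                       feat_b: Dict[Loc2, List[float]],
                       k: int) -> Dict[Loc4, float]:
    """Hypothetical sparsification: keep each feature's top-k matches in both directions."""
    active: Dict[Loc4, float] = {}
    # A -> B: the k best-scoring locations in B for every location in A.
    for pa, fa in feat_a.items():
        for pb in heapq.nlargest(k, feat_b, key=lambda q: dot(fa, feat_b[q])):
            active[pa + pb] = dot(fa, feat_b[pb])
    # B -> A: the same in the other direction; duplicate pairs merge in the dict.
    for pb, fb in feat_b.items():
        for pa in heapq.nlargest(k, feat_a, key=lambda q: dot(feat_a[q], fb)):
            active[pa + pb] = dot(feat_a[pa], fb)
    return active


def submanifold_conv4d(active: Dict[Loc4, float],
                       weights: Dict[Loc4, float]) -> Dict[Loc4, float]:
    """Single-channel 4D convolution evaluated only at active sites.

    The kernel is applied as a cross-correlation (as in most CNN libraries).
    Inactive neighbours contribute zero and no new sites are created, so the
    output has exactly the same sparsity pattern as the input.
    """
    out: Dict[Loc4, float] = {}
    for p in active:
        acc = 0.0
        for off, w in weights.items():
            q = tuple(pi + oi for pi, oi in zip(p, off))
            if q in active:
                acc += w * active[q]
        out[p] = acc
    return out


if __name__ == "__main__":
    random.seed(0)
    h, w, d, k = 4, 4, 8, 2
    grid = list(itertools.product(range(h), range(w)))
    feat_a = {p: [random.gauss(0.0, 1.0) for _ in range(d)] for p in grid}
    feat_b = {p: [random.gauss(0.0, 1.0) for _ in range(d)] for p in grid}

    corr = sparse_correlation(feat_a, feat_b, k)
    # 3x3x3x3 box kernel: 81 offsets, all weights equal to 1.
    box = {off: 1.0 for off in itertools.product((-1, 0, 1), repeat=4)}
    filtered = submanifold_conv4d(corr, box)

    print(f"dense entries: {(h * w) ** 2}, active entries: {len(corr)}")
    # Submanifold property: the output lives on exactly the input's active sites.
    assert filtered.keys() == corr.keys()
```

A submanifold convolution never creates new active sites, so stacking such layers keeps storage bounded. For two h×w feature maps there are at most 2k·h·w stored entries (the top-k matches collected in each direction), instead of the (h·w)² entries of the dense tensor. In this sketch, that bound is where the memory savings come from.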

Acknowledgements

This work was partially supported by the European Regional Development Fund under project IMPACT (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000468), the Louis Vuitton ENS Chair on Artificial Intelligence, and the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute).

Author information

Authors and Affiliations

WILLOW, Inria, DI-ENS, CNRS, PSL Research University, Paris, France
Ignacio Rocco & Josef Sivic
DeepMind, London, UK
Relja Arandjelović
Czech Institute of Informatics, Robotics and Cybernetics, CTU, Prague, Czechia
Josef Sivic

Corresponding author

Correspondence to Ignacio Rocco.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Electronic supplementary material

Supplementary material 1 (PDF, 14,381 KB)

About this paper

Cite this paper

Rocco, I., Arandjelović, R., Sivic, J. (2020). Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12354. Springer, Cham. https://doi.org/10.1007/978-3-030-58545-7_35

DOI: https://doi.org/10.1007/978-3-030-58545-7_35
Published: 05 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58544-0
Online ISBN: 978-3-030-58545-7
eBook Packages: Computer Science, Computer Science (R0)