Abstract
In this work we target the problem of estimating accurately localized correspondences between a pair of images. We adopt the recent Neighbourhood Consensus Networks, which have demonstrated promising performance on difficult correspondence problems, and propose modifications to overcome their main limitations: large memory consumption, long inference time and poorly localized correspondences. Our proposed modifications can reduce the memory footprint and execution time by more than \(10\times \), with equivalent results. This is achieved by sparsifying the correlation tensor containing tentative matches and subsequently processing it with a 4D CNN using submanifold sparse convolutions. Localization accuracy is significantly improved by processing the input images at higher resolution, which is possible due to the reduced memory footprint, and by a novel two-stage correspondence relocalization module. The proposed Sparse-NCNet method obtains state-of-the-art results on the HPatches Sequences and InLoc visual localization benchmarks, and competitive results on the Aachen Day-Night benchmark.
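To make the sparsification step concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes dense, L2-normalized feature maps for the two images and keeps, for each location in the first image, only the `k` highest-scoring tentative matches in the second image. The function name `sparsify_correlation`, the parameter `k`, and the one-directional top-k selection rule are illustrative assumptions; the abstract does not specify how the tensor is sparsified.

```python
import numpy as np


def sparsify_correlation(feat_a, feat_b, k=10):
    """Keep the k strongest matches in image B for every location in image A.

    feat_a: (Ha, Wa, C) L2-normalized dense features of image A.
    feat_b: (Hb, Wb, C) L2-normalized dense features of image B.
    Returns (coords, values), where coords is an (Ha*Wa*k, 4) integer array
    of (i, j, p, q) indices into the 4D correlation tensor and values holds
    the corresponding correlation scores.
    Hypothetical helper: names and selection rule are assumptions.
    """
    ha, wa, c = feat_a.shape
    hb, wb, _ = feat_b.shape

    # Flatten the spatial dimensions and compute all pairwise dot products.
    fa = feat_a.reshape(ha * wa, c)
    fb = feat_b.reshape(hb * wb, c)
    corr = fa @ fb.T  # shape (Ha*Wa, Hb*Wb)

    # Indices of the k largest scores per row (order within the k is arbitrary).
    k = min(k, hb * wb)
    top = np.argpartition(-corr, k - 1, axis=1)[:, :k]

    rows = np.repeat(np.arange(ha * wa), k)
    cols = top.ravel()
    values = corr[rows, cols]

    # Convert flat indices back to 4D coordinates (i, j) in A and (p, q) in B.
    i, j = np.divmod(rows, wa)
    p, q = np.divmod(cols, wb)
    coords = np.stack([i, j, p, q], axis=1)
    return coords, values
```

Under these assumptions, the sparse representation stores `Ha*Wa*k` entries instead of `Ha*Wa*Hb*Wb`, which is why higher input resolutions become affordable. The `(coords, values)` pair is the kind of coordinate-list input that sparse convolution libraries typically expect, although the exact format depends on the library used.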
Acknowledgements
This work was partially supported by the European Regional Development Fund under project IMPACT (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000468), the Louis Vuitton ENS Chair on Artificial Intelligence, and the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute).
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Rocco, I., Arandjelović, R., Sivic, J. (2020). Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12354. Springer, Cham. https://doi.org/10.1007/978-3-030-58545-7_35
DOI: https://doi.org/10.1007/978-3-030-58545-7_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58544-0
Online ISBN: 978-3-030-58545-7
eBook Packages: Computer Science (R0)