Abstract
We propose an image alignment algorithm based on weak supervision, which aims to identify the correspondence between a pair of reference and target images with no supervision of individual pixels. Since most existing methods have relied on a predefined geometric model such as homography, they often suffer from a lack of model flexibility and generalizability. To tackle the challenge, we propose a novel nonparametric transformation model based on graph convolutional networks without an explicit geometric constraint. The proposed method is generic and flexible in the sense that it is applicable to the image pairs undergoing diverse local and/or global transformations. To make the algorithm more suitable for real-world scenarios having potential noises from moving objects, we disregard those objects with an off-the-shelf semantic segmentation model. The proposed algorithm is evaluated on the Cityscapes dataset with annotated pixel-level correspondences and outperforms baseline methods relying on global parametric transformations.
Similar content being viewed by others
Notes
Refer to Appendix A in the supplementary document for the list of images used for the construction of test sets.
We set the threshold to 60 pixels, which is a reasonable choice for \(512 \times \) 384 images used in our experiment.
References
Alcantarilla, P.F., Solutions, T.: Fast explicit diffusion for accelerated features in nonlinear scale spaces. In: British Machine Vision Conference (2013)
Balntas, V., Johns, E., Tang, L., Mikolajczyk, K.: PN-Net: conjoined triple deep network for learning local image descriptors. arXiv preprint arXiv:1601.05030 (2016)
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: European Conference on Computer Vision (2006)
Brown, M., Lowe, D.G.: Automatic panoramic image stitching using invariant features. Int. J. Comput. Vis. 74, 59–73 (2007)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition (2005)
DeTone, D., Malisiewicz, T., Rabinovich, A.: Superpoint: Self-supervised interest point detection and description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2018)
Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 381–395 (1981)
Han, X., Leung, T., Jia, Y., Sukthankar, R., Berg, A.C.: Matchnet: Unifying feature and metric learning for patch-based matching. In: IEEE Conference on Computer Vision and Pattern Recognition (2015)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
Horn, B.K., Schunck, B.G.: Determining optical flow. Artif. Intell. 17, 85–203 (1981)
Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. Adv. Neural Inform. Process. Syst. (2015)
Kanade, T., Okutomi, M.: A stereo matching algorithm with an adaptive window: theory and experiment. IEEE Trans. Pattern Anal. Mach. Intell. 16, 920–932 (1994)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations (2017)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
Noh, H., Araujo, A., Sim, J., Weyand, T., Han, B.: Large-scale image retrieval with attentive deep local features. In: International Conference on Computer Vision (2017)
Raguram, R., Frahm, J.M., Pollefeys, M.: A comparative analysis of ransac techniques leading to adaptive real-time random sample consensus. In: European Conference on Computer Vision (2008)
Rocco, I., Arandjelović, R., Sivic, J.: End-to-end weakly-supervised semantic alignment. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)
Rocco, I., Cimpoi, M., Arandjelovi’c, R., Torii, A., Pajdla, T., Sivic, J.: Neighbourhood consensus networks. In: Advances in Neural Information Processing Systems (2018)
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: Orb: an efficient alternative to sift or surf. In: International Conference on Computer Vision (2011)
Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: Superglue: learning feature matching with graph neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (2020)
Yi, K.M., Trulls, E., Lepetit, V., Fua, P.: Lift: Learned invariant feature transform. In: European Conference on Computer Vision (2016)
Zhang, J., Wang, C., Liu, S., Jia, L., Ye, N., Wang, J., Zhou, J., Sun, J.: Content-aware unsupervised deep homography estimation. In: European Conference on Computer Vision (2020)
Acknowledgements
This work was conducted by Center for Applied Research in Artificial Intelligence (CARAI) grant funded by Defense Acquisition Program Administration (DAPA) and Agency for Defense Development (ADD) (UD190031RD).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kim, M., Chu, S. & Han, B. Beyond homography: nonparametric image alignment via graph convolutional networks. Machine Vision and Applications 33, 89 (2022). https://doi.org/10.1007/s00138-022-01331-9
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00138-022-01331-9