Abstract
This article studies the correspondence problem for semantically similar images, which is challenging because visual and geometric deformations occur jointly. We introduce the Flip-aware Distance Ratio (FDR) method, which approaches the problem from the perspective of geometric structure analysis. First, a distance ratio constraint enforces geometric consistency between images with large visual variations, while a smoothness term tolerates local geometric jitter. For challenging cases with symmetric structures, the method exploits curl to suppress mismatches. Image correspondence is then formulated as a permutation problem, for which we propose a Gradient Guided Simulated Annealing (GGSA) algorithm that performs robust discrete optimization. Experiments on simulated and real-world datasets, in which both visual and geometric deformations are present, indicate that our method significantly outperforms the baselines for both visually and semantically similar images.
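To make the permutation formulation concrete, the following is a minimal sketch of plain simulated annealing over permutations with a simple distance-ratio cost. It is not the authors' GGSA or FDR objective: there is no gradient guidance, smoothness term, or curl term, and the function names (`pairwise_distances`, `ratio_cost`, `anneal`), the log-ratio variance cost, the swap move, and the geometric cooling schedule are all assumptions chosen for illustration.

```python
import math
import random


def pairwise_distances(points):
    """Return the matrix of Euclidean distances between 2-D points."""
    return [[math.dist(p, q) for q in points] for p in points]


def ratio_cost(perm, dist_a, dist_b):
    """Variance of log distance ratios under the matching i -> perm[i].

    Hypothetical cost: if set B is a scaled, rotated copy of set A, all
    matched distance pairs share one ratio, so the variance is zero.
    Assumes all points are distinct (no zero distances).
    """
    n = len(perm)
    logs = [
        math.log(dist_b[perm[i]][perm[j]] / dist_a[i][j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    mean = sum(logs) / len(logs)
    return sum((v - mean) ** 2 for v in logs) / len(logs)


def anneal(dist_a, dist_b, steps=20000, t_start=1.0, t_end=1e-4, seed=0):
    """Plain simulated annealing over permutations using swap moves."""
    rng = random.Random(seed)
    n = len(dist_a)
    perm = list(range(n))
    rng.shuffle(perm)
    cost = ratio_cost(perm, dist_a, dist_b)
    best_perm, best_cost = perm[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]  # propose a swap
        new_cost = ratio_cost(perm, dist_a, dist_b)
        # Always accept improvements; accept worse moves with
        # probability exp(-(new_cost - cost) / temp).
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best_perm, best_cost = perm[:], cost
        else:
            perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
    return best_perm, best_cost


# Toy example: set B is set A scaled by 2 and shuffled.
points_a = [(0, 0), (1, 0), (0, 2), (3, 1), (2, 3)]
true_perm = [2, 0, 4, 1, 3]
points_b = [None] * len(points_a)
for i, t in enumerate(true_perm):
    points_b[t] = (2 * points_a[i][0], 2 * points_a[i][1])

dist_a = pairwise_distances(points_a)
dist_b = pairwise_distances(points_b)
found_perm, found_cost = anneal(dist_a, dist_b)
print(found_perm, found_cost)
```

A cost built only from distance ratios cannot distinguish a point set from its mirror image, which is one way to see why a flip-aware term is needed for symmetric structures.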