Semantic Correspondence with Geometric Structure Analysis

Authors:
Rui Wang
Chinese Academy of Sciences and University of Chinese Academy of Sciences, Beijing, China

Dong Liang
Chinese Academy of Sciences and University of Chinese Academy of Sciences, Beijing, China

Xiaochun Cao
Chinese Academy of Sciences and University of Chinese Academy of Sciences, Beijing, China

Yuanfang Guo
Beihang University, Beijing, China

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3, Article No. 83, pp. 1–21. https://doi.org/10.1145/3441576

Published: 22 July 2021

Abstract

This article studies the correspondence problem for semantically similar images, which is challenging because visual and geometric deformations occur jointly. We introduce the Flip-aware Distance Ratio (FDR) method to solve this problem from the perspective of geometric structure analysis. First, a distance ratio constraint enforces geometric consistency between images with large visual variations, while a smoothness term tolerates local geometric jitter. For challenging cases with symmetric structures, the proposed method exploits Curl to suppress mismatches. Image correspondence is then formulated as a permutation problem, for which we propose a Gradient Guided Simulated Annealing (GGSA) algorithm to perform robust discrete optimization. Experiments on simulated and real-world datasets, in which both visual and geometric deformations are present, indicate that our method significantly improves on the baselines for both visually and semantically similar images.
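
To make the permutation formulation in the abstract more concrete, here is a minimal Python sketch. It is not the authors' FDR or GGSA implementation. It assumes a simplified cost that compares mean-normalized pairwise distances between two point sets (a stand-in for a distance ratio constraint) and minimizes it with plain simulated annealing over random swaps. It has no gradient guidance, no smoothness term, and no Curl-based flip handling. The names `pairwise_dist`, `ratio_cost`, and `anneal_matching`, and all parameter defaults, are hypothetical.

```python
import math
import random


def pairwise_dist(points):
    """Return the full matrix of Euclidean distances between 2-D points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def ratio_cost(dist_a, dist_b, perm):
    """Assumed cost: squared mismatch of mean-normalized pairwise distances.

    perm[i] is the index in set B matched to point i of set A. Dividing by
    the mean distance makes the cost invariant to uniform scaling, so only
    ratios between distances matter. Requires at least two distinct points.
    """
    n = len(perm)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    mean_a = sum(dist_a[i][j] for i, j in pairs) / len(pairs)
    mean_b = sum(dist_b[perm[i]][perm[j]] for i, j in pairs) / len(pairs)
    return sum(
        (dist_a[i][j] / mean_a - dist_b[perm[i]][perm[j]] / mean_b) ** 2
        for i, j in pairs
    )


def anneal_matching(points_a, points_b, steps=20000, t_start=1.0, t_end=1e-4, seed=0):
    """Plain simulated annealing over permutations using random swaps."""
    rng = random.Random(seed)
    dist_a, dist_b = pairwise_dist(points_a), pairwise_dist(points_b)
    n = len(points_a)
    perm = list(range(n))
    rng.shuffle(perm)
    cost = ratio_cost(dist_a, dist_b, perm)
    best_perm, best_cost = perm[:], cost
    for step in range(steps):
        # Geometric cooling schedule from t_start towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]  # propose a swap
        new_cost = ratio_cost(dist_a, dist_b, perm)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best_perm, best_cost = perm[:], cost
        else:
            perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
    return best_perm, best_cost
```

Calling `anneal_matching(points_a, points_b)` on two equally sized lists of `(x, y)` tuples returns a permutation and its cost. Pairwise distances are unchanged by reflection, so this simplified cost cannot distinguish a configuration from its mirror image. That ambiguity is a reason a flip-aware component is needed for symmetric structures.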

Index Terms

Computing methodologies
  Artificial intelligence
    Computer vision
      Computer vision problems
        Matching
Theory of computation
  Design and analysis of algorithms
    Mathematical optimization
      Discrete optimization
        Optimization with randomized search heuristics
        Simulated annealing

Published in
ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 3
August 2021
443 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3476118
Editor:
Alberto Del Bimbo
University of Firenze, Italy
Copyright © 2021 Association for Computing Machinery.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 July 2021
- Accepted: 1 December 2020
- Revised: 1 October 2020
- Received: 1 March 2020
Author Tags
Curl
image correspondence
bilateral symmetry
Qualifiers
- research-article
- Refereed