research-article

Semantic Correspondence with Geometric Structure Analysis

Published: 22 July 2021

Abstract

This article studies the correspondence problem for semantically similar images, which is challenging because visual and geometric deformations occur jointly. We introduce the Flip-aware Distance Ratio (FDR) method, which approaches the problem through geometric structure analysis. First, a distance ratio constraint enforces geometric consistency between images with large visual variations, while a smoothness term tolerates local geometric jitter. For challenging cases with symmetric structures, the method exploits Curl to suppress mismatches. Image correspondence is then formulated as a permutation problem, for which we propose a Gradient Guided Simulated Annealing (GGSA) algorithm that performs robust discrete optimization. Experiments on simulated and real-world datasets, where both visual and geometric deformations are present, indicate that our method significantly improves on the baselines for both visually and semantically similar images.
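
The abstract gives no formulas, so the following is only a rough sketch of the general idea and not the authors' FDR or GGSA formulation. It assumes matched 2-D keypoints, scores a permutation by the variance of log distance ratios (zero under any similarity transform, including a mirror image), adds a hypothetical orientation penalty as a stand-in for the flip-aware term, and searches with plain simulated annealing rather than a gradient-guided variant. All function names, the cost weighting, and the cooling schedule are assumptions made for illustration.

```python
import math

import numpy as np

EPS = 1e-12  # keeps log() finite if two points coincide


def distance_matrix(points):
    """Euclidean distances between all pairs of 2-D points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def ratio_cost(dist_p, dist_q, perm):
    """Variance of log distance ratios under the matching i -> perm[i].

    Zero when the matched sets differ only by a similarity transform.
    A mirror image also scores zero, which motivates a flip-aware term.
    """
    iu = np.triu_indices(len(perm), k=1)
    matched = dist_q[np.ix_(perm, perm)]
    log_ratio = np.log(matched[iu] + EPS) - np.log(dist_p[iu] + EPS)
    return float(np.var(log_ratio))


def flip_fraction(p, q, perm):
    """Fraction of consecutive point triples whose orientation changes."""
    def signed_area(pts):
        u = pts[1:-1] - pts[:-2]
        v = pts[2:] - pts[:-2]
        return u[:, 0] * v[:, 1] - u[:, 1] * v[:, 0]

    same = np.sign(signed_area(p)) == np.sign(signed_area(q[perm]))
    return float(np.mean(~same))


def anneal(p, q, steps=5000, t_start=1.0, t_end=1e-3, flip_weight=1.0, seed=0):
    """Plain simulated annealing over permutations (swap moves, geometric cooling)."""
    rng = np.random.default_rng(seed)
    n = len(p)
    dist_p, dist_q = distance_matrix(p), distance_matrix(q)

    def cost(perm):
        return ratio_cost(dist_p, dist_q, perm) + flip_weight * flip_fraction(p, q, perm)

    perm = rng.permutation(n)
    current = cost(perm)
    best_perm, best = perm.copy(), current
    decay = (t_end / t_start) ** (1.0 / steps)
    temp = t_start
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)
        cand = perm.copy()
        cand[[i, j]] = cand[[j, i]]  # propose swapping two assignments
        cand_cost = cost(cand)
        delta = cand_cost - current
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            perm, current = cand, cand_cost
            if current < best:
                best_perm, best = perm.copy(), current
        temp *= decay
    return best_perm, best
```

Calling `anneal(p, q)` on two `(n, 2)` arrays returns the lowest-cost permutation visited and its cost. The orientation penalty is what separates a true match from its mirror image here, since the distance ratio term alone cannot tell them apart.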


    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3
      August 2021
      443 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3476118

      Copyright © 2021 Association for Computing Machinery.

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 July 2021
      • Accepted: 1 December 2020
      • Revised: 1 October 2020
      • Received: 1 March 2020
      Qualifiers

      • research-article
      • Refereed
