Abstract
Establishing dense semantic correspondences between object instances remains a challenging problem due to background clutter, significant scale and pose differences, and large intra-class variations. In this paper, we present an end-to-end trainable network for learning semantic correspondences using only matching image pairs, without manual keypoint correspondence annotations. To facilitate network training with this weaker form of supervision, we (1) explicitly estimate the foreground regions to suppress the effect of background clutter and (2) develop cycle-consistent losses that enforce the predicted transformations across multiple images to be geometrically plausible and consistent. We train the proposed model on the PF-PASCAL dataset and evaluate its performance on the PF-PASCAL, PF-WILLOW, and TSS datasets. Extensive experimental results show that the proposed approach performs favorably against the state-of-the-art. The code and model will be available at https://yunchunchen.github.io/WeakMatchNet/.
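The cycle-consistency idea described in the abstract can be illustrated with a minimal sketch: if the network predicts a transformation from image A to image B and another from B back to A, composing the two should map every point back to where it started, and the deviation from identity serves as a training loss. The sketch below uses plain affine transformations and NumPy; the function names and the affine parameterization are illustrative assumptions, not the paper's actual architecture or loss implementation.

```python
import numpy as np

def warp_points(theta, pts):
    """Apply a 2x3 affine transformation theta to an (N, 2) array of points."""
    A, t = theta[:, :2], theta[:, 2]
    return pts @ A.T + t

def cycle_consistency_loss(theta_ab, theta_ba, pts):
    """Forward-backward cycle loss: warping points from A to B and then
    from B back to A should return each point to its original location."""
    cycled = warp_points(theta_ba, warp_points(theta_ab, pts))
    return np.mean(np.sum((cycled - pts) ** 2, axis=1))

# A transformation and its exact inverse incur (numerically) zero loss.
theta_ab = np.array([[2.0, 0.0,  1.0],
                     [0.0, 2.0, -1.0]])   # p' = 2p + (1, -1)
theta_ba = np.array([[0.5, 0.0, -0.5],
                     [0.0, 0.5,  0.5]])   # inverse: p = 0.5 p' + (-0.5, 0.5)
pts = np.random.rand(100, 2)
print(cycle_consistency_loss(theta_ab, theta_ba, pts))  # ~0.0
```

In the weakly supervised setting of the paper, a loss of this form is useful precisely because it requires no keypoint annotations: it constrains only the mutual consistency of the predicted transformations.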
Acknowledgement
This work is supported in part by the Ministry of Science and Technology under grants MOST 105-2221-E-001-030-MY2 and MOST 107-2628-E-001-005-MY3.
© 2019 Springer Nature Switzerland AG
Cite this paper
Chen, YC., Huang, PH., Yu, LY., Huang, JB., Yang, MH., Lin, YY. (2019). Deep Semantic Matching with Foreground Detection and Cycle-Consistency. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science(), vol 11363. Springer, Cham. https://doi.org/10.1007/978-3-030-20893-6_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20892-9
Online ISBN: 978-3-030-20893-6