Unsupervised Domain Adaptation for Referring Semantic Segmentation

Authors:

Fei Wu

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 5807 - 5818

https://doi.org/10.1145/3581783.3611879

Published: 27 October 2023

Abstract

In this paper, we study referring semantic segmentation in a highly practical setting: labeled visual data with corresponding text descriptions are available in the source domain, but only unlabeled visual data (without text descriptions) are available in the target domain. The task poses three main difficulties: (1) how to obtain proper queries for the target domain; (2) how to adapt to shifts in the visual-text joint distribution; and (3) how to maintain the original segmentation performance. To address them, we propose a cycle-consistent vision-language matching network that narrows the domain gap and eases adaptation. The model is of practical value because it can generalize to new data sources without requiring corresponding text annotations. First, a pseudo-text selector handles the missing modality, using the pre-trained CLIP model to measure the gap between query features of the source and visual features of the target. Next, a cross-domain segmentation predictor encourages the joint representations to be domain invariant and minimizes the discrepancy between the two domains. Then, a cycle-consistent query matcher learns discriminative features by reconstructing visual features from masks. Instead of performing a textual comparison, we match the visual features to the pseudo queries. Extensive experiments show the effectiveness of our method.
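The pseudo-text selection idea can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that source query features and target visual features already live in a shared embedding space (for example, one produced by CLIP's text and image encoders) and that "measuring the gap" can be approximated by cosine similarity. The function names `cosine` and `select_pseudo_queries` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_pseudo_queries(source_query_feats, target_visual_feats):
    """For each target visual feature, return the index of the closest source query.

    Assumption: both feature sets are embeddings in one shared space,
    so the most similar source query can serve as a pseudo query.
    """
    selected = []
    for target in target_visual_feats:
        sims = [cosine(query, target) for query in source_query_feats]
        selected.append(max(range(len(sims)), key=sims.__getitem__))
    return selected


# Toy stand-ins for encoder outputs (hypothetical values).
source_query_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target_visual_feats = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]

pseudo = select_pseudo_queries(source_query_feats, target_visual_feats)
print(pseudo)  # [0, 2]
```

In this sketch each unlabeled target image borrows the source-domain query whose embedding lies closest to its own, which is one simple way to supply the missing text modality. The actual selector described in the paper may use a different criterion.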

Supplemental Material

MP4 File

We present our paper "Unsupervised Domain Adaptation for Referring Semantic Segmentation" in this video. In this paper, we study referring semantic segmentation in a highly practical setting: labeled visual data with corresponding text descriptions are available in the source domain, but only unlabeled visual data (without text descriptions) are available in the target domain. We propose a cycle-consistent vision-language matching network (CVMN) that narrows the domain gap and eases adaptation. First, a pseudo-text selector handles the missing modality. Next, a cross-domain segmentation predictor encourages the joint representations to be domain invariant and minimizes the discrepancy between the two domains. Then, a cycle-consistent query matcher learns discriminative features by reconstructing visual features from masks. Extensive experiments show the effectiveness of our method.
