DOI: 10.1145/3477495.3532061

Structure-Aware Semantic-Aligned Network for Universal Cross-Domain Retrieval

Published: 07 July 2022

Abstract

The goal of cross-domain retrieval (CDR) is to search for instances of the same category in one domain using a query from another domain. Existing CDR approaches mainly consider the standard scenario in which the cross-domain data for both training and testing come from the same categories and underlying distributions. However, these methods do not extend well to the newly emerging task of universal cross-domain retrieval (UCDR), where the test data belong to domains and categories not present during training. Compared to CDR, the UCDR task is more challenging due to (1) visually diverse data from multi-source domains, (2) the domain shift between seen and unseen domains, and (3) the semantic shift between seen and unseen categories. To tackle these problems, we propose a novel model termed Structure-Aware Semantic-Aligned Network (SASA) that aligns the heterogeneous representations of multi-source domains without loss of generalizability for the UCDR task. Specifically, we adopt the Vision Transformer (ViT) as the backbone and devise a distillation-alignment ViT (DAViT) with a novel token-based strategy, which incorporates two complementary distillation and alignment tokens into the ViT architecture. The distillation token is designed to improve the generalizability of our model by preserving structural information, while the alignment token improves discriminativeness through trainable categorical prototypes. Extensive experiments on three large-scale benchmarks, i.e., Sketchy, TU-Berlin, and DomainNet, demonstrate the superiority of our SASA method over state-of-the-art UCDR and ZS-SBIR methods.
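To make the token-based design more concrete, below is a minimal PyTorch-style sketch (not the authors' implementation) of a ViT encoder augmented with distillation and alignment tokens, paired with a structure-preserving distillation loss against a frozen teacher and a prototype-based alignment loss. All class names, dimensions, and loss forms here (e.g., TokenAugmentedViT, num_prototypes) are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of a ViT encoder with extra distillation/alignment tokens,
# loosely following the DAViT idea described in the abstract. Names, dimensions,
# and loss forms are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenAugmentedViT(nn.Module):
    def __init__(self, embed_dim=768, depth=12, num_heads=12,
                 num_patches=196, num_prototypes=100):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, embed_dim))   # structure distillation
        self.align_token = nn.Parameter(torch.zeros(1, 1, embed_dim))  # prototype alignment
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 3, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Trainable categorical prototypes, one per seen category (assumed count).
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, embed_dim))

    def forward(self, patch_embeddings):
        # patch_embeddings: (batch, num_patches, embed_dim), already projected.
        b = patch_embeddings.size(0)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1),
                            self.dist_token.expand(b, -1, -1),
                            self.align_token.expand(b, -1, -1),
                            patch_embeddings], dim=1) + self.pos_embed
        out = self.encoder(tokens)
        return out[:, 0], out[:, 1], out[:, 2]  # cls, distillation, alignment features


def structure_distillation_loss(student_feat, teacher_feat):
    """Match the pairwise cosine-similarity structure of a frozen teacher batch."""
    s = F.normalize(student_feat, dim=-1) @ F.normalize(student_feat, dim=-1).T
    t = F.normalize(teacher_feat, dim=-1) @ F.normalize(teacher_feat, dim=-1).T
    return F.mse_loss(s, t)


def prototype_alignment_loss(align_feat, prototypes, labels, temperature=0.1):
    """Classify the alignment feature against the trainable categorical prototypes."""
    logits = F.normalize(align_feat, dim=-1) @ F.normalize(prototypes, dim=-1).T
    return F.cross_entropy(logits / temperature, labels)
```

In a training loop, these two losses would typically be combined with a standard classification loss on the class token, and at retrieval time the class or alignment feature would serve as the shared cross-domain embedding; consult the paper itself for the exact objective and inference procedure.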

Supplementary Material

MP4 File (SIGIR2022-fp0257.mp4)
Presentation video


Cited By

  • (2024) Retrieval across any domains via large-scale pre-trained model. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3694374, 55901-55912. Online publication date: 21-Jul-2024
  • (2024) ProS: Prompting-to-Simulate Generalized Knowledge for Universal Cross-Domain Retrieval. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10.1109/CVPR52733.2024.01637, 17292-17301. Online publication date: 16-Jun-2024
  • (2024) A Systematic Literature Review of Deep Learning Approaches for Sketch-Based Image Retrieval: Datasets, Metrics, and Future Directions. IEEE Access, vol. 12, 10.1109/ACCESS.2024.3357939, 14847-14869. Online publication date: 2024


Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN:9781450387323
DOI:10.1145/3477495


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 July 2022


Author Tags

  1. alignment
  2. knowledge distillation
  3. universal cross-domain retrieval
  4. zero-shot learning

Qualifiers

  • Research-article


Conference

SIGIR '22

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)



Bibliometrics & Citations


Article Metrics

  • Downloads (last 12 months): 86
  • Downloads (last 6 weeks): 17
Reflects downloads up to 28 Feb 2025

