Category-Stitch Learning for Union Domain Generalization

Authors:

Zheng-Jun Zha

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 19, Issue 1

Article No.: 25, Pages 1 - 19

https://doi.org/10.1145/3524136

Published: 05 January 2023

Abstract

Domain generalization aims to generalize a network trained on multiple domains to unknown but related domains. Under the assumption that different domains share the same classes, previous works can build relationships across domains. However, in realistic scenarios, a change of domain is always accompanied by a change of categories, which makes it difficult to collect sufficient aligned categories across domains. Bearing this in mind, this article introduces union domain generalization (UDG) as a new domain generalization scenario, in which the label space varies across domains, and the categories in unknown domains belong to the union of all given domain categories. The absence of categories in given domains is the main obstacle to aligning different domain distributions and obtaining domain-invariant information. To address this problem, we propose category-stitch learning (CSL), which jointly learns domain-invariant information and completes the missing categories in all domains through an improved variational autoencoder and generators. Domain-invariant information extraction and sample generation promote each other, leading to better generalizability. Additionally, we decouple category and domain information and propose explicitly regularizing the semantic information through a classification loss on transferred samples. Thus, our method can break through the category limit and generate samples of missing categories in each domain. Extensive experiments and visualizations on the MNIST, VLCS, PACS, Office-Home, and DomainNet datasets demonstrate the effectiveness of the proposed method.
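
The UDG label-space setting described in the abstract can be illustrated with a minimal sketch. It only computes the union label space and the categories each given domain lacks, that is, the gaps a method such as CSL would have to complete. It is not the authors' implementation. The domain names, category names, and the helper functions `union_label_space` and `missing_categories` are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Set


def union_label_space(domain_labels: Dict[str, Set[str]]) -> Set[str]:
    """Return the union of the label sets of all given (source) domains."""
    union: Set[str] = set()
    for labels in domain_labels.values():
        union |= labels
    return union


def missing_categories(domain_labels: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each domain, return the categories of the union that it lacks."""
    union = union_label_space(domain_labels)
    return {name: union - labels for name, labels in domain_labels.items()}


# Hypothetical example: three given domains with partially overlapping classes.
source_domains = {
    "photo": {"dog", "horse", "house"},
    "sketch": {"dog", "guitar"},
    "cartoon": {"horse", "guitar", "person"},
}

# Under UDG, categories in an unknown domain are drawn from this union.
union = union_label_space(source_domains)

# Categories absent from each given domain.
gaps = missing_categories(source_domains)
```

In this toy setup no single category appears in all three domains, which is the situation the abstract identifies as the main obstacle to aligning domain distributions.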

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Multi-task learning
        1. Transfer learning

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 19, Issue 1

January 2023

505 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3572858

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 January 2023

Online AM: 17 March 2022

Accepted: 06 March 2022

Revised: 03 March 2022

Received: 18 June 2021

Published in TOMM Volume 19, Issue 1

Qualifiers

Research-article
Refereed

Funding Sources

National Key R&D Program of China
National Natural Science Foundation of China
