Domain Generalization by Joint-Product Distribution Alignment
Introduction
A common assumption underlying most supervised learning algorithms is that the training (source) and test (target) data are drawn from the same domain $P(X, Y)$, where $X$ is the feature variable and $Y$ is the class label. Under this assumption, a classification model appropriately trained on the source domain is guaranteed, in a probabilistic sense, to generalize well to the target domain [1]. Unfortunately, in real-world applications, control over the data generation process is rarely perfect: the source data available for training the classification model can be distributionally different from the target data on which the model will be tested, a problem known as dataset shift [2], [3], dataset bias [4], or domain shift [5], [6]. Under such circumstances, the source-trained model may perform poorly on the target data [7], [8], [9], [10].
Domain generalization is concerned with the above non-identically-distributed supervised learning problem, where the training datasets are drawn from $S$ source domains $P^1(X, Y), \ldots, P^S(X, Y)$, respectively, while the test data are sampled from a target domain $P^t(X, Y)$. The source and target domains are different from but related to one another [11], [12], [13], and the goal of domain generalization for classification is to train a classification model on the source domains that generalizes well to the target domain.
Domain generalization methods aim to exploit the domain relationship (i.e., the relationship among the distributions) to reduce the domain difference and train a classification model on the source data [11], [13], [14], [15]. Essentially, these methods learn a feature transformation $T$ (e.g., a projection matrix or a neural transformation) to align the source domains $P^s(X, Y)$, $s = 1, \ldots, S$, whose samples are available during training, and expect the learned transformation to generalize to the target domain $P^t(X, Y)$ such that its difference from the source domains is also reduced. As a result, the source-trained model can generalize better to the target domain [11], [14], [16].
Since a domain $P(X, Y)$ can be factorized into $P(X)P(Y \mid X)$ or $P(Y)P(X \mid Y)$, there are generally two solutions to align the domains $P^1(X, Y), \ldots, P^S(X, Y)$. The first one learns a feature transformation $T$ to align the set of marginal distributions (marginals) $P^1(T(X)), \ldots, P^S(T(X))$, and assumes that the posterior distribution $P(Y \mid T(X))$ is stable across domains [7], [11], [12]. However, as discussed in Zhao et al. [14], Nguyen et al. [15], Li et al. [16], the stability of the posterior is often violated in practice (e.g., in speaker recognition and object recognition), which can result in the under-alignment of domains. Therefore, the second solution aligns domains by seeking a feature transformation (e.g., a neural transformation) to align the set of marginals and multiple sets of class-conditional distributions (class-conditionals) [13], [14], [16]. These sets of class-conditionals could either be the sets $P^1(T(X) \mid Y = c), \ldots, P^S(T(X) \mid Y = c)$ for $c = 1, \ldots, C$, or the corresponding sets of class-wise joint components $P^1(T(X), Y = c), \ldots, P^S(T(X), Y = c)$ for $c = 1, \ldots, C$, where $C$ is the number of classes. However, since this solution has to align multiple sets of class-conditionals, one set per class with each set containing $S$ distributions, it may not scale well with the number of classes [15]. Besides, to align distributions in the neural network context, it usually needs to introduce additional discriminator subnetworks and solve the challenging minimax problem between the neural transformation and the added subnetworks. A concrete count of the objects each strategy aligns is given below.
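To make the bookkeeping concrete (written in our notation with feature transformation $T$; different prior works instantiate $T$ differently), the two factorization-based strategies align the following objects:

$P(X, Y) = P(X)P(Y \mid X)$: align $P^1(T(X)), \ldots, P^S(T(X))$ and assume $P(Y \mid T(X))$ is stable;

$P(X, Y) = P(Y)P(X \mid Y)$: align $P^1(T(X)), \ldots, P^S(T(X))$ together with $P^1(T(X) \mid Y = c), \ldots, P^S(T(X) \mid Y = c)$ for each $c = 1, \ldots, C$.

The second strategy therefore aligns $1 + C$ sets of $S$ distributions each, which is why its cost grows with the number of classes, whereas the joint-product alignment introduced next always involves exactly two distributions.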
In this work, we address the above issues and propose a Joint-Product Distribution Alignment (JPDA) solution to align the domains $P^s(X, Y)$, $s = 1, \ldots, S$. To be specific, we first introduce a domain label $D \in \{1, \ldots, S\}$ for the domains and rewrite them as $P(X, Y \mid D = 1), \ldots, P(X, Y \mid D = S)$, respectively. We then learn a neural transformation (feature extractor) $T$ to align the joint distribution $P(T(X), Y, D)$ and the product distribution $P(T(X), Y)P(D)$ such that $P(T(X), Y, D) = P(T(X), Y)P(D)$, which implies that the distribution of the feature-label pair $(T(X), Y)$ is independent of the domain label $D$. This independence conveniently leads to the alignment of the domains, i.e., $P(T(X), Y \mid D = 1) = \cdots = P(T(X), Y \mid D = S)$. Compared to the aforementioned two solutions from prior works [7], [11], [13], [14], our JPDA solution (1) avoids the factorization of domains and the alignment of the many factorized components, i.e., the marginals and the class-conditionals, and (2) only needs to align two distributions, i.e., the joint distribution and the product distribution. Such alignment is algorithmically straightforward and scales well with the number of classes: unlike previous works [13], [14], the number of distributions aligned in our solution is fixed and does not grow with the number of classes. Apart from aligning distributions in the neural transformation space, we learn a downstream classifier for the target classification task.
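The implication deserves one explicit step (our addition; it uses only the definition of conditional probability and assumes $P(D = s) > 0$ for every source domain):

$P(T(X), Y \mid D = s) = \dfrac{P(T(X), Y, D = s)}{P(D = s)} = \dfrac{P(T(X), Y)\,P(D = s)}{P(D = s)} = P(T(X), Y), \quad s = 1, \ldots, S,$

so every rewritten domain coincides with the common distribution $P(T(X), Y)$, and the domains are aligned.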
To be more specific, we align the joint distribution $P(T(X), Y, D)$ and the product distribution $P(T(X), Y)P(D)$ under the distribution-ratio-based Relative Chi-Square (RCS) divergence [18]. Importantly, we show that the RCS divergence between these two distributions can be analytically estimated as the maximal value of a quadratic function, and we consequently obtain an explicit estimate of the RCS divergence. This allows us to directly minimize the divergence estimate with respect to the neural transformation to achieve joint-product distribution alignment (see the sketch after the contribution list below). Compared to the existing adversarial methods [7], [13], [14] that make use of another distribution-ratio-based divergence, the Jensen–Shannon (JS) divergence, our JPDA solution (1) does not need to introduce additional discriminator subnetworks, and thereby avoids learning extra network parameters, and (2) avoids solving the challenging minimax problem between the neural transformation and the discriminator subnetworks. Our cost function is a combination of the joint-product distribution divergence and the classification loss. We minimize it via the minibatch Stochastic Gradient Descent (SGD) algorithm, and obtain a network model (containing the neural transformation and the classifier) with better generalization capability. Fig. 1 illustrates our solution to domain generalization for image classification. To summarize, our major contributions in this work are as follows:
- We propose to align domains via the alignment of the joint distribution $P(T(X), Y, D)$ and the product distribution $P(T(X), Y)P(D)$, where the domain label $D \in \{1, \ldots, S\}$.
- We analytically derive an explicit estimate of the RCS divergence between $P(T(X), Y, D)$ and $P(T(X), Y)P(D)$ to serve as the alignment loss.
- We demonstrate the effectiveness of our solution by conducting comprehensive experiments on several multi-domain image classification datasets.
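Since the analytic estimator is only summarized in this excerpt, the following PyTorch sketch illustrates the idea under stated assumptions: it computes a RuLSIF-style plug-in estimate of the relative chi-square divergence [18] between minibatch samples of the joint $P(T(X), Y, D)$ and of the product $P(T(X), Y)P(D)$ (the product samples obtained by shuffling the domain labels within the batch), and adds it to the cross-entropy loss. The function names, the Gaussian kernel basis, the mixture parameter `alpha`, and the regularizer `lam` are our own illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def gaussian_features(z, centers, sigma):
    # Gaussian basis functions phi_k(z) = exp(-||z - c_k||^2 / (2 sigma^2)).
    return torch.exp(-torch.cdist(z, centers).pow(2) / (2 * sigma ** 2))

def rcs_divergence(joint, product, alpha=0.1, lam=1e-3):
    # RuLSIF-style plug-in estimate of the relative chi-square divergence
    # between samples of the joint (numerator) and the product (denominator).
    centers = joint.detach()  # kernel centers: a common heuristic choice
    sigma = torch.cdist(centers, centers).median().clamp_min(1e-6)
    phi_p = gaussian_features(joint, centers, sigma)    # joint samples
    phi_q = gaussian_features(product, centers, sigma)  # product samples
    n_p, n_q, b = phi_p.shape[0], phi_q.shape[0], phi_p.shape[1]
    # Solve max_theta h^T theta - 0.5 theta^T (H + lam I) theta, an
    # unconstrained quadratic problem with closed-form maximizer
    # theta* = (H + lam I)^{-1} h; the divergence estimate is 0.5 h^T theta* - 0.5.
    H = alpha * phi_p.T @ phi_p / n_p + (1 - alpha) * phi_q.T @ phi_q / n_q
    h = phi_p.mean(dim=0)
    theta = torch.linalg.solve(H + lam * torch.eye(b, device=joint.device), h)
    return 0.5 * h @ theta - 0.5  # differentiable w.r.t. joint/product features

def training_step(net, clf, x, y, d, num_classes, num_domains, trade_off=1.0):
    # One minibatch step: classification loss + joint-product alignment loss.
    z = net(x)                                     # neural transformation T(X)
    ce = F.cross_entropy(clf(z), y)                # downstream classifier loss
    y1 = F.one_hot(y, num_classes).float()
    d1 = F.one_hot(d, num_domains).float()
    joint = torch.cat([z, y1, d1], dim=1)          # samples of P(T(X), Y, D)
    perm = torch.randperm(d1.shape[0], device=x.device)
    product = torch.cat([z, y1, d1[perm]], dim=1)  # samples of P(T(X), Y)P(D)
    loss = ce + trade_off * rcs_divergence(joint, product)
    loss.backward()                                # single minimization, no minimax
    return loss
```

An SGD update on the parameters of `net` and `clf` would follow `loss.backward()`. The design point mirrors the text: because the divergence estimate is an explicit, differentiable function of the minibatch features, a single minimization suffices and no discriminator subnetwork or minimax game is introduced.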
Section snippets
Related work
We first discuss the domain alignment works in domain generalization, which align domains by factorizing them and aligning the factorized components, i.e., the marginal distributions (marginals) and the class-conditional distributions (class-conditionals). Subsequently, we briefly review works that tackle the problem via other strategies.
The study of learning and generalizing a classification model from multiple source domains to a target domain can be traced back to the early works of
Methodology
We define the domain generalization problem, give an overview of our solution, and elaborate on the technical details.
Experiments
For the domain generalization experiments, we note that two different experimental settings exist in the field: one commonly practiced in Nguyen et al. [15], Xu et al. [27], Yang et al. [28], Carlucci et al. [37], and the other proposed by Gulrajani and Lopez-Paz [38]. We conduct our experiments under the former, which involves following the settings in prior works and citing the available results reported by the authors themselves.
Conclusion
In this work, we study the domain generalization problem and propose the JPDA solution to better generalize a source-trained network classification model to a different but related target domain. Our solution aligns the joint distribution $P(T(X), Y, D)$ and the product distribution $P(T(X), Y)P(D)$ in the neural transformation space, and minimizes the classification loss. In particular, the two distributions are aligned under the RCS divergence, which is estimated from empirical data by analytically solving an unconstrained quadratic maximization problem.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the Technology and Innovation Major Project of the Ministry of Science and Technology of China under Grant 2020AAA0108404, in part by the National Natural Science Foundation of China under Grant 62106137, and in part by Shantou University under Grant NTF21035.
References (55)
- A unifying view on dataset shift in classification, Pattern Recognit. (2012)
- Generalizable model-agnostic semantic segmentation via target-specific normalization, Pattern Recognit. (2022)
- Exploring uncertainty in pseudo-label guided unsupervised domain adaptation, Pattern Recognit. (2019)
- Correlation-aware adversarial domain adaptation and generalization, Pattern Recognit. (2020)
- Domain generalization and adaptation based on second-order style information, Pattern Recognit. (2022)
- A two-way alignment approach for unsupervised multi-source domain adaptation, Pattern Recognit. (2022)
- Statistical Learning Theory (1998)
- Dataset Shift in Machine Learning (2008)
- Unbiased look at dataset bias, IEEE Conference on Computer Vision and Pattern Recognition (2011)
- Deeper, broader and artier domain generalization, IEEE International Conference on Computer Vision (2017)
- Domain generalization with adversarial feature learning, IEEE Conference on Computer Vision and Pattern Recognition
- Domain adaptation by joint distribution invariant projections, IEEE Trans. Image Process.
- Domain generalization with MixStyle, International Conference on Learning Representations
- Domain generalization via invariant feature representation, International Conference on Machine Learning
- Scatter component analysis: a unified framework for domain adaptation and domain generalization, IEEE Trans. Pattern Anal. Mach. Intell.
- Deep domain generalization via conditional invariant adversarial networks, European Conference on Computer Vision
- Domain generalization via entropy regularization, Advances in Neural Information Processing Systems
- Domain invariant representation learning with domain density transformations, Advances in Neural Information Processing Systems
- Domain generalization via conditional invariant representations, AAAI Conference on Artificial Intelligence
- Deep hashing network for unsupervised domain adaptation, IEEE Conference on Computer Vision and Pattern Recognition
- Relative density-ratio estimation for robust distribution comparison, Neural Comput.
- Generalizing from several related classification tasks to a new unlabeled sample, Advances in Neural Information Processing Systems
- Undoing the damage of dataset bias, European Conference on Computer Vision
- Adversarial invariant feature learning with accuracy constraint for domain generalization, Joint European Conference on Machine Learning and Knowledge Discovery in Databases
- Domain-adversarial training of neural networks, J. Mach. Learn. Res.
- Domain adaptation with conditional transferable components, International Conference on Machine Learning
- Learning to generalize: meta-learning for domain generalization, AAAI Conference on Artificial Intelligence
Sentao Chen received the Ph.D. degree in software engineering from South China University of Technology, Guangzhou, China, in 2020. He is currently a Lecturer with the Department of Computer Science, Shantou University, Shantou, China. His research interests include statistical machine learning, domain adaptation, and domain generalization.
Lei Wang received the Ph.D. degree from Nanyang Technological University, Singapore, in 2004. He is currently an Associate Professor with the School of Computing and Information Technology, University of Wollongong, Australia. His research interests include pattern recognition, machine/deep learning, computer vision, and image retrieval.
Zijie Hong received the B.S. degree in software engineering from South China University of Technology, Guangzhou, China, in 2019. He is currently pursuing the M.Sc. degree in software engineering in the School of Software Engineering, South China University of Technology. His research interests include domain adaptation and computer vision.
Xiaowei Yang received the B.S. degree in theoretical and applied mechanics, the M.Sc. degree in computational mechanics, and the Ph.D. degree in solid mechanics from Jilin University, Changchun, China, in 1991, 1996, and 2000, respectively. He is currently a full-time professor in the School of Software Engineering, South China University of Technology. His current research interests include designs and analyses of algorithms for large-scale pattern recognition, imbalanced learning, semisupervised learning, support vector machines, tensor learning, and evolutionary computation.