Pattern Recognition

Volume 134, February 2023, 109086

Domain Generalization by Joint-Product Distribution Alignment

https://doi.org/10.1016/j.patcog.2022.109086

Highlights

  • To address the domain difference in domain generalization, we propose to align multiple domains P1(x,y), …, Pn(x,y) via the alignment of two distributions: the joint distribution P(x,y,l) and the product distribution P(x,y)P(l), where the domain label l ∈ {1, …, n}.

  • We analytically derive an explicit estimate of the Relative Chi-Square (RCS) divergence between P(x,y,l) and P(x,y)P(l), and minimize this estimate to align distributions in the neural transformation space.

  • We demonstrate the effectiveness of our solution by conducting comprehensive experiments on several multi-domain image classification datasets.

Abstract

In this work, we address the problem of domain generalization for classification, where the goal is to learn a classification model on a set of source domains and generalize it to a target domain. The source and target domains are different, which weakens the generalization ability of the learned model. To tackle the domain difference, we propose to align a joint distribution and a product distribution using a neural transformation, and minimize the Relative Chi-Square (RCS) divergence between the two distributions to learn that transformation. In this manner, we conveniently achieve the alignment of multiple domains in the neural transformation space. Specifically, we show that the RCS divergence can be explicitly estimated as the maximal value of a quadratic function, which allows us to perform joint-product distribution alignment by minimizing the divergence estimate. We demonstrate the effectiveness of our solution through comparison with the state-of-the-art methods on several image classification datasets.

Introduction

A common assumption underlying most supervised learning algorithms is that the training (source) and test (target) data are drawn from the same domain P(x,y), where x is the feature variable and y is the class label. Under this assumption, a classification model appropriately trained in the source domain is guaranteed, in a probabilistic sense, to generalize well to the target domain [1]. Unfortunately, in real-world applications, control over the data generation process is often imperfect: the source data available for training the classification model can be distributionally different from the target data on which the model will be tested, a problem known as dataset shift [2], [3], dataset bias [4], or domain shift [5], [6]. Under such circumstances, the source-trained model may perform poorly on the target data [7], [8], [9], [10].

Domain generalization is concerned with the above non-identically-distributed supervised learning problem, where the training data are respectively drawn from n (n ≥ 2) source domains P1(x,y), …, Pn(x,y), while the test data are sampled from a target domain Pt(x,y). The source and target domains are different but related to one another [11], [12], [13], and the goal of domain generalization for classification is to learn/train a classification model on the source domains and generalize it to the target domain.

Domain generalization methods aim to exploit the domain relationship (i.e., the relationship among distributions) to reduce the domain difference and train a classification model on the source data [11], [13], [14], [15]. Essentially, these methods aim to learn a feature transformation (i.e., a projection matrix or a neural transformation) to align the n source domains P1(x,y), …, Pn(x,y) whose samples are available during training, and expect the learned transformation to generalize to the target domain Pt(x,y) such that its difference from the source ones is reduced. As a result, the source-trained model can generalize better to the target domain [11], [14], [16].

Since a domain P(x,y) can be factorized into P(x,y) = P(y|x)P(x) or P(x,y) = P(x|y)P(y), there are generally two solutions for aligning the n domains P1(x,y), …, Pn(x,y). The first one learns a feature transformation to align the set of n marginal distributions (marginals) P1(x), …, Pn(x), and assumes that the posterior distribution P(y|x) is stable [7], [11], [12]. However, as discussed in Zhao et al. [14], Nguyen et al. [15], Li et al. [16], the stability of P(y|x) is often violated in practice (e.g., in speaker recognition or object recognition), which could result in the under-alignment of domains. Therefore, the second solution proposes to align the domains P1(x,y), …, Pn(x,y) by seeking a feature transformation (e.g., a neural transformation) to align a set of n marginals and multiple sets of class-conditional distributions (class-conditionals) [13], [14], [16]. These sets of class-conditionals could either be c sets of n distributions P1(x|y=i), …, Pn(x|y=i) for i ∈ {1, …, c}, or n sets of c distributions Pl(x|y=1), …, Pl(x|y=c) for l ∈ {1, …, n}, where c is the number of classes. However, since this solution has to align multiple sets of class-conditionals, with each set containing multiple distributions, it may not scale well with the number of classes [15]. Besides, to align distributions in the neural network context, it usually needs to introduce additional discriminator subnetworks and to solve the challenging minimax problem between the neural transformation and the added subnetworks.
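To make the scaling concern concrete, the following tally (our own summary of the counts just described) spells out how many distributions the factorization-based solutions must align:

\[
P(x,y) = P(y \mid x)\,P(x) = P(x \mid y)\,P(y),
\qquad
\underbrace{P_1(x),\ldots,P_n(x)}_{n \text{ marginals}},
\qquad
\underbrace{\{P_1(x \mid y=i),\ldots,P_n(x \mid y=i)\}_{i=1}^{c}}_{c \times n \text{ class-conditionals}},
\]

so the second solution aligns on the order of n + cn distributions, a number that grows linearly with the number of classes c, whereas the solution introduced next always aligns exactly two.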

In this work, we address the above issues and propose a Joint-Product Distribution Alignment (JPDA) solution to align the n domains P1(x,y), …, Pn(x,y). To be specific, we first introduce a domain label l ∈ {1, …, n} for the n domains and rewrite them as P(x,y|l=1), …, P(x,y|l=n), respectively. We then learn a neural transformation (feature extractor) T to align the joint distribution P(x,y,l) and the product distribution P(x,y)P(l) such that P(T(x),y,l) = P(T(x),y)P(l), which implies that the distribution of (T(x),y) is independent of the domain label l. This independence conveniently leads to the alignment of the n domains, i.e., P(T(x),y|l=1) = … = P(T(x),y|l=n). Compared to the aforementioned two solutions from prior works [7], [11], [13], [14], our JPDA solution (1) avoids the factorization of domains and the alignment of the many factorized components, i.e., the marginal distributions and the class-conditional distributions, and (2) only needs to align two distributions, i.e., the joint distribution and the product distribution. Such alignment is algorithmically straightforward and scales well with the number of classes. Namely, unlike previous works [13], [14], in our solution the number of distributions aligned is fixed and does not grow with the number of classes. Apart from aligning distributions in the neural transformation space, we learn a downstream classifier for the target classification task.
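The step from joint-product equality to multi-domain alignment is elementary probability; written out:

\[
P(T(x),y,l) = P(T(x),y)\,P(l)
\;\Longrightarrow\;
P(T(x),y \mid l) = \frac{P(T(x),y,l)}{P(l)} = P(T(x),y)
\quad \text{for every } l \in \{1,\ldots,n\},
\]

and therefore \(P(T(x),y \mid l=1) = \cdots = P(T(x),y \mid l=n)\), i.e., the n transformed domains coincide.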

To be more specific, we align the joint distribution P(x,y,l) and the product distribution P(x,y)P(l) under the distribution-ratio-based Relative Chi-Square (RCS) divergence [18]. Importantly, we show that the RCS divergence between these two distributions can be analytically estimated as the maximal value of a quadratic function, and consequently obtain an explicit estimate of the RCS divergence. This allows us to directly minimize the divergence estimate with respect to the neural transformation to achieve joint-product distribution alignment. Compared to the existing adversarial methods [7], [13], [14] that make use of another distribution-ratio-based divergence, the Jensen–Shannon (JS) divergence, our JPDA solution (1) does not need to introduce additional discriminator subnetworks, which would require learning more network parameters, and (2) avoids solving the challenging minimax problem between the neural transformation and the discriminator subnetworks. Our cost function is a combination of the joint-product distribution divergence and the classification loss. We minimize it via the minibatch Stochastic Gradient Descent (SGD) algorithm, and obtain a network model (containing the neural transformation and the classifier) with better generalization capability. See Fig. 1, which illustrates our solution to domain generalization for image classification. To summarize, our major contributions in this work are as follows:
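For orientation, the RCS divergence of [18] is the relative Pearson (chi-square) divergence: with a mixing parameter \(\alpha \in [0,1)\) and mixture \(Q_\alpha = \alpha P + (1-\alpha)Q\), it compares P (here the joint distribution P(x,y,l)) to Q (here the product P(x,y)P(l)) through the \(\alpha\)-relative density ratio. A standard variational identity from the relative density-ratio estimation literature, which we take to underlie the quadratic-function estimate mentioned above (the exact derivation appears in the Methodology section), is

\[
D_{\mathrm{RCS}}(P \,\|\, Q)
= \tfrac{1}{2}\,\mathbb{E}_{Q_\alpha}\!\left[\left(\tfrac{\mathrm{d}P}{\mathrm{d}Q_\alpha} - 1\right)^{\!2}\right]
= \max_{f}\left\{\mathbb{E}_{P}[f] - \tfrac{1}{2}\,\mathbb{E}_{Q_\alpha}\!\left[f^{2}\right]\right\} - \tfrac{1}{2},
\]

with the maximum attained at \(f = \mathrm{d}P/\mathrm{d}Q_\alpha\). Restricting f to a linear-in-parameters model \(f_\theta = \theta^{\top}\phi\) turns the bracketed term into a concave quadratic in \(\theta\), whose maximal value is available in closed form and can then be minimized with respect to the neural transformation.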

  • We propose to align domains P1(x,y), …, Pn(x,y) via the alignment of joint distribution P(x,y,l) and product distribution P(x,y)P(l), where the domain label l ∈ {1, …, n}.

  • We analytically derive an explicit estimate of the RCS divergence between P(x,y,l) and P(x,y)P(l) to serve as the alignment loss.

  • We demonstrate the effectiveness of our solution by conducting comprehensive experiments on several multi-domain image classification datasets.
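As an illustration of the training procedure described above (classification loss plus the joint-product alignment term, minimized by minibatch SGD), below is a minimal PyTorch-style sketch. The function and variable names are ours, the linear basis in rcs_estimate is one generic relative density-ratio construction rather than necessarily the paper's exact estimator, and the product-distribution samples are approximated by randomly permuting the domain labels within a minibatch.

    import torch
    import torch.nn.functional as F

    def rcs_estimate(joint_z, prod_z, alpha=0.5, lam=1e-3):
        # Quadratic-form divergence estimate between samples of the joint
        # distribution (joint_z) and of the product distribution (prod_z),
        # using the samples themselves plus a bias as a linear basis.
        phi_j = torch.cat([joint_z, torch.ones_like(joint_z[:, :1])], dim=1)
        phi_p = torch.cat([prod_z, torch.ones_like(prod_z[:, :1])], dim=1)
        h = phi_j.mean(dim=0)                                   # E_P[phi]
        H = (alpha * phi_j.t() @ phi_j / phi_j.size(0)
             + (1.0 - alpha) * phi_p.t() @ phi_p / phi_p.size(0))
        H = H + lam * torch.eye(H.size(0), device=H.device)     # E_{Q_alpha}[phi phi^T] + ridge
        theta = torch.linalg.solve(H, h)                        # maximizer of the quadratic
        return 0.5 * h @ theta - 0.5                            # maximal value = divergence estimate

    def train_step(T, C, optimizer, x, y, l, num_classes, num_domains, trade_off=1.0):
        # One minibatch SGD step on: classification loss + alignment loss.
        feats = T(x)                                            # neural transformation T(x)
        cls_loss = F.cross_entropy(C(feats), y)

        y_onehot = F.one_hot(y, num_classes).float()
        l_onehot = F.one_hot(l, num_domains).float()
        joint_z = torch.cat([feats, y_onehot, l_onehot], dim=1)        # samples of P(T(x), y, l)
        perm = torch.randperm(l.size(0), device=l.device)
        prod_z = torch.cat([feats, y_onehot, l_onehot[perm]], dim=1)   # samples of P(T(x), y) P(l)

        loss = cls_loss + trade_off * rcs_estimate(joint_z, prod_z)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Minimizing this combined cost with respect to T (while jointly training the classifier C) yields the source-trained model that is then applied unchanged to the target domain.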


Related work

We first discuss the domain alignment works in domain generalization, which align domains by factorizing them and aligning the factorized components, i.e., the marginal distributions (marginals) and the class-conditional distributions (class-conditionals). Subsequently, we briefly review works that tackle the problem via other strategies.

The study of learning and generalizing a classification model from multiple source domains to a target domain can be traced back to the early works of

Methodology

We define the domain generalization problem, give an overview of our solution, and elaborate on the technical details.

Experiments

For conducting the domain generalization experiments, we note that there exist two different experimental settings in the field: one commonly practiced in Nguyen et al. [15], Xu et al. [27], Yang et al. [28], Carlucci et al. [37], and the other one proposed by Gulrajani and Lopez-Paz [38]. We conduct our experiments under the former, which involves following the settings in prior works and citing the available results reported by the authors themselves.

Conclusion

In this work, we study the domain generalization problem and propose the JPDA solution to better generalize a source-trained network classification model to a different but related target domain. Our solution aligns the joint distribution and the product distribution in the neural transformation space, and minimizes the classification loss. Particularly, the two distributions are aligned under the RCS divergence, which is estimated from empirical data via analytically solving an unconstrained

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the Technology and Innovation Major Project of the Ministry of Science and Technology of China under Grant 2020AAA0108404, in part by the National Natural Science Foundation of China under Grant 62106137, and in part by the Shantou University under Grant NTF21035.


References (55)

  • H. Li et al., Domain generalization with adversarial feature learning, IEEE Conference on Computer Vision and Pattern Recognition, 2018.
  • S. Chen et al., Domain adaptation by joint distribution invariant projections, IEEE Trans. Image Process., 2020.
  • K. Zhou et al., Domain generalization with MixStyle, International Conference on Learning Representations, 2021.
  • K. Muandet et al., Domain generalization via invariant feature representation, International Conference on Machine Learning, 2013.
  • M. Ghifary et al., Scatter component analysis: a unified framework for domain adaptation and domain generalization, IEEE Trans. Pattern Anal. Mach. Intell., 2017.
  • Y. Li et al., Deep domain generalization via conditional invariant adversarial networks, European Conference on Computer Vision, 2018.
  • S. Zhao et al., Domain generalization via entropy regularization, Advances in Neural Information Processing Systems, 2020.
  • A.T. Nguyen et al., Domain invariant representation learning with domain density transformations, Advances in Neural Information Processing Systems, 2021.
  • Y. Li et al., Domain generalization via conditional invariant representations, AAAI Conference on Artificial Intelligence, 2018.
  • H. Venkateswara et al., Deep hashing network for unsupervised domain adaptation, IEEE Conference on Computer Vision and Pattern Recognition, 2017.
  • M. Yamada et al., Relative density-ratio estimation for robust distribution comparison, Neural Comput., 2013.
  • G. Blanchard et al., Generalizing from several related classification tasks to a new unlabeled sample, Advances in Neural Information Processing Systems, 2011.
  • A. Khosla et al., Undoing the damage of dataset bias, European Conference on Computer Vision, 2012.
  • K. Akuzawa et al., Adversarial invariant feature learning with accuracy constraint for domain generalization, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2019.
  • Y. Ganin et al., Domain-adversarial training of neural networks, J. Mach. Learn. Res., 2016.
  • M. Gong et al., Domain adaptation with conditional transferable components, International Conference on Machine Learning, 2016.
  • D. Li et al., Learning to generalize: meta-learning for domain generalization, AAAI Conference on Artificial Intelligence, 2018.

    Sentao Chen received the Ph.D. degree in software engineering from South China University of Technology, Guangzhou, China, in 2020. He is currently a Lecturer with the Department of Computer Science, Shantou University, Shantou, China. His research interests include statistical machine learning, domain adaptation, and domain generalization.

    Lei Wang received his Ph.D. degree from Nanyang Technological University, Singapore in 2004. Now he works as associate professor at School of Computing and Information Technology of University of Wollongong, Australia. His research interests include pattern recognition, machine/deep learning, computer vision and image retrieval.

    Zijie Hong received the B.S. degree in software engineering from South China University of Technology, Guangzhou, China, in 2019. He is currently pursuing the M.Sc. degree in software engineering in the School of Software Engineering, South China University of Technology. His research interests include domain adaptation and computer vision.

    Xiaowei Yang received the B.S. degree in theoretical and applied mechanics, the M.Sc. degree in computational mechanics, and the Ph.D. degree in solid mechanics from Jilin University, Changchun, China, in 1991, 1996, and 2000, respectively. He is currently a full-time professor in the School of Software Engineering, South China University of Technology. His current research interests include designs and analyses of algorithms for large-scale pattern recognition, imbalanced learning, semisupervised learning, support vector machines, tensor learning, and evolutionary computation.
