
Maximum Mean and Covariance Discrepancy for Unsupervised Domain Adaptation


Abstract

A fundamental research topic in domain adaptation is how best to measure the distribution discrepancy between domains. The maximum mean discrepancy (MMD) is one of the most widely used statistical distances in this field. However, when MMD is computed with a non-characteristic kernel, information about the distributions can be lost. To address this issue, we devise a new distribution metric, the maximum mean and covariance discrepancy (MMCD), by combining MMD with the proposed maximum covariance discrepancy (MCD). MCD probes second-order statistics in a reproducing kernel Hilbert space, which equips MMCD to capture more distributional information than MMD alone. To verify the efficacy of MMCD, we propose an unsupervised domain adaptation model based on MMCD, abbreviated McDA, and optimize it efficiently. Image classification experiments on two benchmark datasets show that McDA outperforms other representative domain adaptation methods, demonstrating the effectiveness of MMCD for domain adaptation.
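For concreteness, the following is a minimal NumPy sketch of the empirical quantities involved; it is an illustration, not the authors' implementation. MMD squared is the standard biased two-sample estimate, and MCD squared is taken here to be the squared Hilbert-Schmidt norm of the difference between empirically centered covariance operators in the RKHS, expressed through centered Gram matrices. The polynomial kernel, the weight `beta`, and all function names are assumptions made for illustration.

```python
import numpy as np

def poly_kernel(A, B, c=1.0, d=2):
    """Polynomial kernel k(a, b) = (a^T b + c)^d; rows are samples."""
    return (A @ B.T + c) ** d

def mmd2(X, Y, kernel=poly_kernel):
    """Biased empirical estimate of squared MMD."""
    return (kernel(X, X).mean() + kernel(Y, Y).mean()
            - 2.0 * kernel(X, Y).mean())

def mcd2(X, Y, kernel=poly_kernel):
    """Empirical squared MCD: squared Hilbert-Schmidt norm of the
    difference of centered covariance operators, via Gram matrices."""
    m, n = len(X), len(Y)
    Hm = np.eye(m) - np.ones((m, m)) / m  # centering matrices
    Hn = np.eye(n) - np.ones((n, n)) / n
    Kxx = Hm @ kernel(X, X) @ Hm
    Kyy = Hn @ kernel(Y, Y) @ Hn
    Kxy = Hm @ kernel(X, Y) @ Hn
    return (np.trace(Kxx @ Kxx) / m**2
            + np.trace(Kyy @ Kyy) / n**2
            - 2.0 * np.trace(Kxy @ Kxy.T) / (m * n))

def mmcd2(X, Y, kernel=poly_kernel, beta=1.0):
    """MMCD^2 = MMD^2 + beta * MCD^2 (first- plus second-order match)."""
    return mmd2(X, Y, kernel) + beta * mcd2(X, Y, kernel)
```

A small MMCD value indicates that both the kernel mean embeddings and the kernel covariance operators of the two samples are close, which is the intuition behind combining the two terms.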



Acknowledgements

This work was supported by the National Natural Science Foundation of China (61806213, 61702134, U1435222).

Author information

Corresponding author

Correspondence to Xiang Zhang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Gradient Computation

According to (16), when a polynomial kernel of degree \(d\) is adopted, the gradient of the empirical estimator of the squared MMD with respect to the data matrix \(A=[X, Y]\) is given by

$$\begin{aligned} \frac{{\partial \widehat{\mathrm{MMD}}^2}}{{\partial A}} = 2dA( {M \circ {K_{d - 1}}} ), \end{aligned}$$
(39)

where \((K_{d - 1})_{ij} = (A_i^T A_j + c)^{d - 1}\), \(A_i\) denotes the \(i\)-th column of \(A\), and \(\circ\) denotes the element-wise (Hadamard) product. Likewise,

$$\begin{aligned} \frac{{\partial \widehat{\mathrm{MCD}}^2}}{{\partial A}} = 4dA( {Z{K_d}Z \circ {K_{d - 1}}} ), \end{aligned}$$
(40)

and

$$\begin{aligned} \frac{{\partial \widehat{\mathrm{MMCD}}^2}}{{\partial A}} = \frac{{\partial \widehat{\mathrm{MMD}}^2}}{{\partial A}} + \beta \frac{{\partial \widehat{\mathrm{MCD}}^2}}{{\partial A}}. \end{aligned}$$
(41)

The gradients of MMD, MCD, and MMCD for the linear kernel are obtained by setting \(d=1\) and \(c=0\) in (39)–(41), respectively.
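As a sanity check, gradients (39)–(41) can be implemented directly. The coefficient matrices \(M\) and \(Z\) are defined earlier in the paper and not reproduced in this appendix, so the sketch below assumes the standard MMD coefficient matrix \(M\) and a block-diagonal centering matrix \(Z\) with blocks \(H_m/m\) and \(-H_n/n\), under which \(\widehat{\mathrm{MMD}}^2 = \mathrm{tr}(KM)\) and \(\widehat{\mathrm{MCD}}^2 = \mathrm{tr}(KZKZ)\); both choices are assumptions made for illustration. Here samples are stored as columns of \(A\), matching the matrix form of the gradients.

```python
import numpy as np

def mmcd_grad_poly(A, m, n, c=1.0, d=2, beta=1.0):
    """Gradients (39)-(41) for a polynomial kernel; A = [X, Y]
    stores the m source and n target samples as columns."""
    G = A.T @ A
    K = (G + c) ** d           # degree-d kernel matrix K_d
    Kdm1 = (G + c) ** (d - 1)  # K_{d-1}

    # Assumed MMD coefficient matrix M, so that MMD^2 = tr(K M).
    M = np.zeros((m + n, m + n))
    M[:m, :m] = 1.0 / m**2
    M[m:, m:] = 1.0 / n**2
    M[:m, m:] = M[m:, :m] = -1.0 / (m * n)

    # Assumed Z = blockdiag(H_m/m, -H_n/n), H_k = I - (1/k) 11^T,
    # so that MCD^2 = tr(K Z K Z).
    Z = np.zeros((m + n, m + n))
    Z[:m, :m] = (np.eye(m) - np.ones((m, m)) / m) / m
    Z[m:, m:] = -(np.eye(n) - np.ones((n, n)) / n) / n

    g_mmd = 2 * d * A @ (M * Kdm1)             # Eq. (39)
    g_mcd = 4 * d * A @ ((Z @ K @ Z) * Kdm1)   # Eq. (40)
    return g_mmd + beta * g_mcd                # Eq. (41)
```

Under these assumed \(M\) and \(Z\), the returned matrix should agree with a finite-difference approximation of \(\mathrm{tr}(KM) + \beta\,\mathrm{tr}(KZKZ)\), which gives a convenient numerical check of (39)–(41).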

Cite this article

Zhang, W., Zhang, X., Lan, L. et al. Maximum Mean and Covariance Discrepancy for Unsupervised Domain Adaptation. Neural Process Lett 51, 347–366 (2020). https://doi.org/10.1007/s11063-019-10090-0
