Abstract
This paper addresses Unsupervised Domain Adaptation (UDA) for text classification, which aims to transfer knowledge from a source domain to a different but related target domain. Previous methods learn discriminative features of the target domain from noisy pseudo labels, which inevitably harms the training of a robust model. In this paper, we propose a novel criterion, Conditional Mean Discrepancy (CMD), to learn discriminative features by matching the conditional distributions across domains. CMD embeds the conditional distributions of both the source and target domains into a tensor-product Hilbert space and computes the Hilbert-Schmidt norm of their difference. We shed new light on discriminative feature adaptation: the collective knowledge of the discriminative features of different domains is naturally discovered by minimizing CMD. We propose Aligned Adaptation Networks (AAN) to learn domain-invariant and discriminative features simultaneously, based on Maximum Mean Discrepancy (MMD) and CMD. Meanwhile, to trade off the marginal and conditional distributions, we further maximize both the MMD and CMD criteria with an adversarial strategy, making the features of AAN more discrepancy-invariant. To the best of our knowledge, this is the first work to explicitly evaluate the shifts in conditional distributions across domains. Experiments on cross-domain text classification demonstrate that AAN achieves higher classification accuracy with shorter convergence time than state-of-the-art deep methods.
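To make the marginal-alignment criterion concrete, the following is a minimal sketch of the standard biased empirical estimator of squared MMD with a Gaussian kernel, as used in kernel two-sample testing; it is an illustration of the general criterion, not the paper's AAN implementation, and the function names and the bandwidth `sigma` are our own choices.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of X and rows of Y.
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # Biased empirical estimate of squared MMD between samples X and Y:
    # mean of within-sample kernel values minus twice the cross-sample mean.
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()
```

Minimizing such an estimate over learned features pulls the source and target marginal distributions together; the paper's CMD criterion applies the analogous idea to conditional mean embeddings in a tensor-product Hilbert space.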
Acknowledgements
This work was supported in part by the Fund of the State Key Laboratory of Software Development Environment and in part by the Open Research Fund from the Shenzhen Research Institute of Big Data (No. 2019ORF01012).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Zhang, B., Zhang, X., Liu, Y., Chen, L. (2021). Discriminative Feature Adaptation via Conditional Mean Discrepancy for Cross-Domain Text Classification. In: Jensen, C.S., et al. (eds.) Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol 12682. Springer, Cham. https://doi.org/10.1007/978-3-030-73197-7_7
Print ISBN: 978-3-030-73196-0
Online ISBN: 978-3-030-73197-7
eBook Packages: Computer Science (R0)