Abstract
We propose a new unsupervised domain adaptation method for document classification. We address the setting in which the source and target domains do not differ significantly and no labeled data is available in the target domain. In this setting, conventional semi-supervised learning is applicable, so we use the naive Bayes-based expectation-maximization method (NBEM), which is very effective for document classification. However, NBEM does not exploit the difference between the source domain and the target domain. We therefore combine NBEM with a feature-weighting method for domain adaptation, referred to as “self-training feature weight” (STFW). The proposed method alternates between NBEM and STFW to gradually improve document classification precision on the target domain, and it significantly outperforms conventional unsupervised methods for domain adaptation.
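As a rough illustration of the NBEM component only, the following minimal sketch runs expectation-maximization with a multinomial naive Bayes model over labeled source documents and unlabeled target documents. It is a generic textbook-style formulation, not the authors' implementation: the function names (`m_step`, `e_step`, `nbem`, `predict`), the add-one smoothing, the fixed iteration count, and the toy data are all assumptions made for illustration. STFW is not sketched, because the abstract does not specify how the feature weights are computed.

```python
import math


def m_step(docs, resp, n_classes, vocab):
    """Estimate log priors and log word likelihoods from soft class labels."""
    prior = [1.0] * n_classes  # add-one smoothing on class mass (assumption)
    counts = [{w: 1.0 for w in vocab} for _ in range(n_classes)]
    for doc, r in zip(docs, resp):
        for c in range(n_classes):
            prior[c] += r[c]
            for w in doc:
                counts[c][w] += r[c]  # fractional count weighted by P(c | doc)
    total = sum(prior)
    log_prior = [math.log(p / total) for p in prior]
    log_like = []
    for c in range(n_classes):
        z = sum(counts[c].values())
        log_like.append({w: math.log(v / z) for w, v in counts[c].items()})
    return log_prior, log_like


def e_step(doc, log_prior, log_like):
    """Return the posterior P(c | doc) for one document (unknown words skipped)."""
    scores = [lp + sum(ll[w] for w in doc if w in ll)
              for lp, ll in zip(log_prior, log_like)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def nbem(src_docs, src_labels, tgt_docs, n_classes, n_iter=10):
    """Train on labeled source docs, then refine with unlabeled target docs."""
    vocab = {w for d in src_docs + tgt_docs for w in d}
    # Source labels stay fixed as one-hot responsibilities.
    src_resp = [[1.0 if c == y else 0.0 for c in range(n_classes)]
                for y in src_labels]
    model = m_step(src_docs, src_resp, n_classes, vocab)
    for _ in range(n_iter):
        tgt_resp = [e_step(d, *model) for d in tgt_docs]       # E-step
        model = m_step(src_docs + tgt_docs,                     # M-step
                       src_resp + tgt_resp, n_classes, vocab)
    return model


def predict(doc, model):
    """Return the index of the most probable class for a tokenized document."""
    post = e_step(doc, *model)
    return max(range(len(post)), key=post.__getitem__)


# Toy data (hypothetical): class 0 = sports, class 1 = politics.
src_docs = [["goal", "match", "team"], ["election", "vote", "party"]]
src_labels = [0, 1]
tgt_docs = [["match", "team", "striker"], ["vote", "party", "senate"],
            ["striker", "goal"], ["senate", "election"]]
model = nbem(src_docs, src_labels, tgt_docs, n_classes=2)
```

In this toy example the words "striker" and "senate" never occur in the labeled source documents; the model can only associate them with a class through their co-occurrence with source vocabulary in the unlabeled target documents, which is the effect semi-supervised learning is meant to provide here.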
Notes
- 1. This equation is smoothed by considering the frequency 0.
- 2. We set these values according to the results of a preliminary experiment.
Acknowledgment
The work reported in this article was supported by the NINJAL collaborative research project ‘Development of all-words WSD systems and construction of a correspondence table between WLSP and IJD by these systems.’
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shinnou, H., Komiya, K., Sasaki, M. (2018). Domain Adaptation for Document Classification by Alternately Using Semi-supervised Learning and Feature Weighted Learning. In: Hasida, K., Pa, W. (eds) Computational Linguistics. PACLING 2017. Communications in Computer and Information Science, vol 781. Springer, Singapore. https://doi.org/10.1007/978-981-10-8438-6_17
DOI: https://doi.org/10.1007/978-981-10-8438-6_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8437-9
Online ISBN: 978-981-10-8438-6
eBook Packages: Computer Science (R0)