Domain Adaptation for Document Classification by Alternately Using Semi-supervised Learning and Feature Weighted Learning

Shinnou, Hiroyuki; Komiya, Kanako; Sasaki, Minoru

doi:10.1007/978-981-10-8438-6_17

Part of the book series: Communications in Computer and Information Science (CCIS, volume 781)

Included in the following conference series:

International Conference of the Pacific Association for Computational Linguistics

Abstract

In this paper, we propose a new unsupervised domain adaptation method for document classification. We address the setting in which the source and target domains do not differ significantly and the target domain has no labeled data. In this setting, conventional semi-supervised learning is applicable, so we use the naive Bayes-based expectation-maximization method (NBEM), which is very effective for document classification. However, NBEM does not exploit the difference between the source domain and the target domain. We therefore combine NBEM with a feature-weighting method for domain adaptation, referred to as “self-training feature weight” (STFW). Our proposed method alternately applies NBEM and STFW to gradually improve document classification precision for the target domain. It significantly outperforms conventional unsupervised methods for domain adaptation.
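To make the semi-supervised half of the method concrete, the following is a minimal sketch of a generic NBEM loop: a multinomial naive Bayes classifier is trained on labeled source documents, and EM then alternates between soft-labeling the unlabeled target documents and re-estimating the model. It is an illustration under stated assumptions, not the authors' implementation. The function names (`train_nb`, `posterior`, `predict`, `nbem`), the smoothing constant `alpha`, the iteration count, and the toy documents are all hypothetical. The STFW step is omitted because the abstract does not specify how the feature weights are computed.

```python
import math
from collections import Counter

def train_nb(docs, soft_labels, n_classes, vocab, alpha=1.0):
    """M-step: estimate log priors and log word probabilities from soft labels."""
    class_mass = [alpha] * n_classes                      # smoothed class counts
    word_mass = [{w: alpha for w in vocab} for _ in range(n_classes)]
    for doc, probs in zip(docs, soft_labels):
        counts = Counter(doc)
        for c in range(n_classes):
            class_mass[c] += probs[c]
            for w, n in counts.items():
                word_mass[c][w] += probs[c] * n           # fractional word counts
    total = sum(class_mass)
    log_prior = [math.log(m / total) for m in class_mass]
    log_word = []
    for c in range(n_classes):
        z = sum(word_mass[c].values())
        log_word.append({w: math.log(v / z) for w, v in word_mass[c].items()})
    return log_prior, log_word

def posterior(doc, model):
    """E-step for one document: P(class | doc), computed in log space."""
    log_prior, log_word = model
    scores = [lp + sum(lw[w] for w in doc if w in lw)     # unknown words are skipped
              for lp, lw in zip(log_prior, log_word)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def predict(doc, model):
    """Return the index of the most probable class."""
    probs = posterior(doc, model)
    return max(range(len(probs)), key=probs.__getitem__)

def nbem(source_docs, source_labels, target_docs, n_classes, n_iter=10):
    """Train on labeled source docs, then refine with EM over unlabeled target docs."""
    vocab = {w for d in source_docs + target_docs for w in d}
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)]
            for y in source_labels]
    model = train_nb(source_docs, hard, n_classes, vocab)
    for _ in range(n_iter):
        soft = [posterior(d, model) for d in target_docs]             # E-step
        model = train_nb(source_docs + target_docs, hard + soft,
                         n_classes, vocab)                            # M-step
    return model

# Toy data (hypothetical): class 0 = sports, class 1 = computing.
source_docs = [["ball", "team", "win"], ["team", "score", "ball"],
               ["cpu", "disk", "code"], ["code", "bug", "cpu"]]
source_labels = [0, 0, 1, 1]
target_docs = [["team", "coach", "win"], ["coach", "ball", "league"],
               ["code", "compiler", "bug"], ["compiler", "cpu", "kernel"]]

model = nbem(source_docs, source_labels, target_docs, n_classes=2)
print([predict(d, model) for d in target_docs])
```

In this toy setup, words that occur only in the target documents (such as "coach" or "compiler") start with uniform smoothed probabilities and acquire class-specific mass through the EM iterations, which is the sense in which unlabeled target data informs the classifier.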

Notes

1. This equation is smoothed by considering the frequency 0.
2. We set these values according to the results of a preliminary experiment.
3. http://qwone.com/~jason/20Newsgroups/

Acknowledgment

The work reported in this article was supported by the NINJAL collaborative research project ‘Development of all-words WSD systems and construction of a correspondence table between WLSP and IJD by these systems.’

Author information

Authors and Affiliations

Ibaraki University, 4-12-1 Nakanarusawa, Hitachi, Ibaraki, 316-8511, Japan
Hiroyuki Shinnou, Kanako Komiya & Minoru Sasaki

Corresponding author

Correspondence to Hiroyuki Shinnou.

Editor information

Editors and Affiliations

Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
Kôiti Hasida
Natural Language Processing Lab, University of Computer Studies, Yangon, Yangon, Myanmar
Win Pa Pa

About this paper

Cite this paper

Shinnou, H., Komiya, K., Sasaki, M. (2018). Domain Adaptation for Document Classification by Alternately Using Semi-supervised Learning and Feature Weighted Learning. In: Hasida, K., Pa, W. (eds) Computational Linguistics. PACLING 2017. Communications in Computer and Information Science, vol 781. Springer, Singapore. https://doi.org/10.1007/978-981-10-8438-6_17

DOI: https://doi.org/10.1007/978-981-10-8438-6_17
Published: 04 March 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8437-9
Online ISBN: 978-981-10-8438-6
eBook Packages: Computer Science, Computer Science (R0)
