Cross-Domain Knowledge Transfer Using Semi-supervised Classification

Zhen, Yi; Li, Chunping

doi:10.1007/978-3-540-89378-3_36

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5360)

Abstract

Traditional text classification algorithms rest on a basic assumption: the training and test data follow the same distribution. However, this identical-distribution assumption is often violated in real applications. When the test data from the target domain and the training data from an auxiliary domain follow different distributions, we call the classification problem cross-domain classification. Although most of the training data are drawn from the auxiliary domain, a small amount of labeled training data from the target domain can still be obtained. To solve the cross-domain classification problem in this setting, we propose a two-stage algorithm based on semi-supervised classification. The algorithm first uses the labeled data in the target domain to filter the support vectors of the auxiliary domain, then uses the filtered data together with the labeled data from the target domain to construct a classifier for the target domain. Experimental evaluation on real-world text classification problems shows encouraging results and validates our approach.
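
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the filtering rule or the semi-supervised learner, so everything below is an assumption made for illustration. Here the auxiliary-domain support vectors are approximated as the points within a small `slack` of the margin of a linear SVM trained by subgradient descent, the filter keeps a support vector only when its label agrees with the nearest labeled target example, and the second stage is a plain supervised linear SVM rather than a semi-supervised one. All function names, parameters, and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label in {-1, +1})


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(data: List[Example], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    dim, n = len(data[0][0]), len(data)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in data:
            if y * (dot(w, x) + b) < 1.0:  # margin violated: hinge term is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def nearest_label(x: Sequence[float], labeled: List[Example]) -> int:
    """Label of the closest labeled example (squared Euclidean distance)."""
    return min(labeled,
               key=lambda ex: sum((a - c) ** 2 for a, c in zip(x, ex[0])))[1]


def two_stage_transfer(aux: List[Example], target: List[Example],
                       slack: float = 0.05):
    # Stage 1 (assumed rule): take the auxiliary points on or near the margin
    # as approximate support vectors, then keep only those whose label agrees
    # with the nearest labeled target example.
    w, b = train_linear_svm(aux)
    support = [(x, y) for x, y in aux if y * (dot(w, x) + b) <= 1.0 + slack]
    kept = [(x, y) for x, y in support if nearest_label(x, target) == y]
    # Stage 2: train the target classifier on filtered + target labeled data.
    return kept, train_linear_svm(kept + target)


def predict(model, x: Sequence[float]) -> int:
    w, b = model
    return 1 if dot(w, x) + b >= 0 else -1


# Toy data (hypothetical): the last auxiliary pair is labeled according to the
# auxiliary concept and conflicts with the target-domain concept.
aux = [((2.0, 1.0), 1), ((-2.0, -1.0), -1),
       ((1.5, 2.0), 1), ((-1.5, -2.0), -1),
       ((0.5, 0.2), 1), ((-0.5, -0.2), -1),
       ((-0.5, 1.0), 1), ((0.5, -1.0), -1)]
target = [((1.0, -0.5), 1), ((-1.0, 0.5), -1)]

kept, model = two_stage_transfer(aux, target)
print("auxiliary examples kept after filtering:", kept)
print("prediction for (1.0, -0.5):", predict(model, (1.0, -0.5)))
```

In this toy setup the two conflicting auxiliary examples can never reach the second stage, because the nearest labeled target example disagrees with their labels. A real system would replace the nearest-neighbour filter and the supervised second stage with whatever the full paper specifies.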

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Yi Zhen
School of Software, Tsinghua University, Beijing, 100084, China
Chunping Li

Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
Wayne Wobcke
School of Mathematics, Statistics and Computer Science, Victoria University of Wellington, P.O. Box 600, 6140, Wellington, New Zealand
Mengjie Zhang

About this paper

Cite this paper

Zhen, Y., Li, C. (2008). Cross-Domain Knowledge Transfer Using Semi-supervised Classification. In: Wobcke, W., Zhang, M. (eds) AI 2008: Advances in Artificial Intelligence. AI 2008. Lecture Notes in Computer Science, vol 5360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89378-3_36

DOI: https://doi.org/10.1007/978-3-540-89378-3_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89377-6
Online ISBN: 978-3-540-89378-3
eBook Packages: Computer Science, Computer Science (R0)
