Cross-Domain Knowledge Transfer Using Semi-supervised Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5360)

Abstract

Traditional text classification algorithms rest on a basic assumption: the training and test data follow the same distribution. However, this identical-distribution assumption is often violated in real applications. Because the test data from the target domain and the training data from the auxiliary domain follow different distributions, we call this problem cross-domain classification. Although most of the training data are drawn from the auxiliary domain, a small amount of labeled training data from the target domain can still be obtained. To solve the cross-domain classification problem in this setting, we propose a two-stage algorithm based on semi-supervised classification. The algorithm first uses the labeled data in the target domain to filter the support vectors of the auxiliary domain, and then uses the filtered data together with the labeled target-domain data to construct a classifier for the target domain. Experimental evaluation on real-world text classification problems shows encouraging results and validates our approach.
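
The two-stage idea in the abstract can be illustrated with a small sketch. The abstract does not state the filtering rule, the SVM formulation, or how unlabeled data enter the second stage, so everything below is an assumption made for illustration: a bias-free linear SVM trained by batch subgradient descent, support vectors taken as auxiliary points with margin at most `1 + TOL`, and a hypothetical filter that keeps an auxiliary support vector only when its nearest labeled target point carries the same label. The second stage here is plain supervised training, not a semi-supervised classifier. All function names and the toy data are invented for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Batch subgradient descent on the L2-regularized hinge loss (no bias term)."""
    n = len(X)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [lam * wj for wj in w]            # gradient of the regularizer
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) < 1.0:            # point violates the margin
                for j, xj in enumerate(xi):
                    grad[j] -= yi * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    return 1 if dot(w, x) >= 0.0 else -1

def nearest_label(x, X, y):
    """Label of the point in X closest to x (squared Euclidean distance)."""
    best = min(range(len(X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, X[i])))
    return y[best]

TOL = 0.1  # assumed slack when deciding which points count as support vectors

def two_stage_transfer(aux_X, aux_y, tgt_X, tgt_y):
    # Stage 1a: train on the auxiliary domain and collect its support vectors.
    w_aux = train_linear_svm(aux_X, aux_y)
    svs = [(x, y) for x, y in zip(aux_X, aux_y)
           if y * dot(w_aux, x) <= 1.0 + TOL]
    # Stage 1b (assumed rule): keep a support vector only if the nearest
    # labeled target point has the same label.
    kept = [(x, y) for x, y in svs if nearest_label(x, tgt_X, tgt_y) == y]
    # Stage 2: train the target classifier on target labels plus kept points.
    X = list(tgt_X) + [x for x, _ in kept]
    y = list(tgt_y) + [lab for _, lab in kept]
    return train_linear_svm(X, y), kept

# Toy data: the target concept depends on feature 0, the auxiliary concept on
# the sum of both features, so the last two auxiliary points conflict with it.
tgt_X = [(2.0, 0.5), (1.5, -0.5), (-2.0, -0.5), (-1.5, 0.5)]
tgt_y = [1, 1, -1, -1]
aux_X = [(2.0, 1.0), (1.0, 1.5), (-2.0, -1.0), (-1.0, -1.5),
         (1.0, -2.0), (-1.0, 2.0)]
aux_y = [1, 1, -1, -1, -1, 1]

w_final, kept = two_stage_transfer(aux_X, aux_y, tgt_X, tgt_y)
print("kept auxiliary support vectors:", kept)
print("prediction for (1.2, -0.4):", predict(w_final, (1.2, -0.4)))
```

The nearest-neighbor agreement test is only one plausible way to use labeled target data as a filter; a margin-based or classifier-based test would fit the same two-stage outline.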


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhen, Y., Li, C. (2008). Cross-Domain Knowledge Transfer Using Semi-supervised Classification. In: Wobcke, W., Zhang, M. (eds) AI 2008: Advances in Artificial Intelligence. AI 2008. Lecture Notes in Computer Science, vol 5360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89378-3_36

  • DOI: https://doi.org/10.1007/978-3-540-89378-3_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89377-6

  • Online ISBN: 978-3-540-89378-3

  • eBook Packages: Computer Science, Computer Science (R0)
