Abstract
A fundamental assumption in most machine learning tasks is that training and test instances are drawn from the same distribution and that the training set is sufficiently large. In many practical settings this assumption is violated: labeled training instances are scarce, and labeling new ones is costly. On the other hand, we may have access to plenty of labeled data from a different domain, which can provide useful information for the present domain. In this paper, we discuss adaptive learning techniques that address this specific problem: learning from a small amount of training data drawn from the target distribution together with a large pool of data drawn from a different distribution. An underlying theme of our work is identifying the situations in which the auxiliary data is likely to help training on the primary data. We propose two algorithms for the domain adaptation task: dataset reweighting and subset selection. We present a theoretical analysis of the behavior of the algorithms based on the concept of domain similarity, which we use to formulate error bounds for our algorithms. We also present an experimental evaluation of our techniques on data from a real-world question answering system.
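The paper's own algorithms are not reproduced on this page, but the two strategies named in the abstract have a familiar general shape, sketched below under stated assumptions: a probabilistic source-vs-target domain classifier scores how "target-like" each auxiliary instance is, and its odds are used either to weight every auxiliary instance (dataset reweighting) or to keep only the most target-like ones (subset selection). This is a minimal illustrative sketch, assuming scikit-learn; the function names (domain_weights, reweighted_fit, subset_fit) and the density-ratio heuristic are hypothetical, not the authors' method. Domain-similarity error bounds of the kind the abstract mentions typically take the shape ε_target(h) ≤ ε_source(h) + (a divergence between the two domains) + (a joint-error term).

```python
# Hypothetical sketch (NOT the paper's algorithms): both strategies hinge on
# estimating how target-like each auxiliary ("source") instance is, using a
# probabilistic source-vs-target domain classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def domain_weights(X_src, X_tgt):
    """Estimate p_target(x)/p_source(x) for each source instance from the
    odds of a logistic-regression domain classifier."""
    X = np.vstack([X_src, X_tgt])
    d = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
    dom = LogisticRegression(max_iter=1000).fit(X, d)
    p = dom.predict_proba(X_src)[:, 1]        # P(domain = target | x)
    return p / np.clip(1.0 - p, 1e-6, None)   # odds approximate the density ratio

def reweighted_fit(model, X_src, y_src, X_tgt, y_tgt):
    """Dataset reweighting: train on primary + auxiliary data, with each
    auxiliary instance weighted by its estimated density ratio.
    Assumes `model.fit` accepts a `sample_weight` argument."""
    w = np.concatenate([domain_weights(X_src, X_tgt), np.ones(len(X_tgt))])
    return model.fit(np.vstack([X_src, X_tgt]),
                     np.concatenate([y_src, y_tgt]),
                     sample_weight=w)

def subset_fit(model, X_src, y_src, X_tgt, y_tgt, keep=0.5):
    """Subset selection: discard the auxiliary instances that look least
    target-like and train on the remainder plus the primary data."""
    w = domain_weights(X_src, X_tgt)
    idx = np.argsort(w)[-max(1, int(keep * len(w))):]
    return model.fit(np.vstack([X_src[idx], X_tgt]),
                     np.concatenate([y_src[idx], y_tgt]))
```

For example, reweighted_fit(LogisticRegression(max_iter=1000), X_aux, y_aux, X_primary, y_primary) fits a single classifier on both pools while discounting auxiliary instances that fall far from the target distribution; whether reweighting or subset selection helps depends on how similar the two domains actually are, which is the question the paper's analysis addresses.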
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Pathak, M.A., Nyberg, E.H. (2009). Learning Algorithms for Domain Adaptation. In: Zhou, Z.-H., Washio, T. (eds) Advances in Machine Learning. ACML 2009. Lecture Notes in Computer Science, vol 5828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05224-8_23
DOI: https://doi.org/10.1007/978-3-642-05224-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05223-1
Online ISBN: 978-3-642-05224-8