Domain adaptation–can quantity compensate for quality?

Ben-David, Shai; Urner, Ruth

doi:10.1007/s10472-013-9371-9

Published: 18 October 2013

Volume 70, pages 185–202, (2014)

Annals of Mathematics and Artificial Intelligence

Abstract

The Domain Adaptation problem in machine learning occurs when the distribution generating the test data differs from the one that generates the training data. A common approach to this issue is to train a standard learner for the learning task on the available training sample (generated by a distribution that differs from the test distribution). One can view such learning as learning from a not-perfectly-representative training sample. The question we focus on is under which circumstances a sufficiently large training sample of this kind can guarantee that the learned classifier performs just as well as one learned from target-generated samples. In other words, are there circumstances in which quantity can compensate for quality (of the training data)? We give a positive answer, showing that this is possible when using a Nearest Neighbor algorithm. We show this under two assumptions about the relationship between the training and the target data distributions: covariate shift, and a bound on the ratio of certain probability weights between the source (training) and target (test) distributions. We further show that in a slightly different learning model, when one imposes restrictions on the nature of the learned classifier, these assumptions are not always sufficient to allow such a replacement of the training sample: for proper learning, where the output classifier has to come from a predefined class, we prove that any learner needs access to data generated from the target distribution.
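The setting described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the one-dimensional domain, the threshold labeling function, the two distributions, and all function names are assumptions chosen only for illustration. Both distributions share the same labeling function (covariate shift), and the target density is at most a constant multiple of the source density (a bounded weight ratio). A 1-Nearest-Neighbor classifier is trained on source samples only, and its error is measured on target samples.

```python
import bisect
import random


def label(x):
    # Hypothetical labeling function shared by source and target
    # (this sharing is the covariate shift assumption).
    return 1 if x > 0.5 else 0


def draw_source(rng):
    # Assumed source distribution: uniform on [0, 1].
    return rng.random()


def draw_target(rng):
    # Assumed target distribution: most of the mass sits near the
    # decision boundary, but the density ratio to the source is bounded.
    if rng.random() < 0.8:
        return rng.uniform(0.4, 0.6)
    return rng.random()


def target_error(m, n_test, seed=0):
    """Train 1-NN on m source points, return the error on n_test target points."""
    rng = random.Random(seed)
    xs = sorted(draw_source(rng) for _ in range(m))
    ys = [label(x) for x in xs]
    errors = 0
    for _ in range(n_test):
        x = draw_target(rng)
        i = bisect.bisect_left(xs, x)
        # In one dimension the nearest neighbor is one of the two
        # training points adjacent to the insertion position.
        best = None
        for j in (i - 1, i):
            if 0 <= j < m and (best is None or abs(xs[j] - x) < abs(xs[best] - x)):
                best = j
        errors += ys[best] != label(x)
    return errors / n_test


if __name__ == "__main__":
    for m in (10, 100, 1000, 10000):
        print(m, target_error(m, n_test=5000, seed=0))
```

Running the script prints the target error for growing source sample sizes. In this toy setup the error should tend to shrink as the source sample grows, which matches the positive result stated above in spirit; it says nothing about the rates or the precise conditions proved in the paper.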

Author information

Authors and Affiliations

D. R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G, Canada
Shai Ben-David & Ruth Urner

Corresponding author

Correspondence to Ruth Urner.

About this article

Cite this article

Ben-David, S., Urner, R. Domain adaptation–can quantity compensate for quality?. Ann Math Artif Intell 70, 185–202 (2014). https://doi.org/10.1007/s10472-013-9371-9

Issue Date: March 2014

Mathematics Subject Classification (2010)

68Q32
