Learning Under Data Shift for Domain Adaptation: A Model-Based Co-clustering Transfer Learning Solution

Kumar, Santosh; Gao, Xiaoying; Welch, Ian

doi:10.1007/978-3-319-42706-5_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9806)

Included in the following conference series:

Pacific Rim Knowledge Acquisition Workshop

Abstract

Data shift in machine learning violates the common assumption that training and test samples are drawn from the same distribution. Most algorithms that address data shift first estimate the distributions and then reweight samples based on those estimates. Because a precise distribution is difficult to estimate, conventional methods cannot achieve good classification performance. In this paper, we introduce two types of data-shift problems and propose a model-based co-clustering transfer learning solution that handles both scenarios consistently. Experimental results demonstrate that the proposed method achieves better generalization and running efficiency than traditional methods under the data or covariate shift setting.
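The conventional estimate-then-reweight approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the co-clustering method proposed in the paper. It assumes, purely for illustration, that the training and test inputs follow known one-dimensional Gaussians, so the importance weight w(x) = p_test(x) / p_train(x) can be computed exactly. In practice both densities must be estimated, which is the difficulty the abstract points to. All names and parameters below are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, train_params, test_params):
    """Weights w(x) = p_test(x) / p_train(x), with both densities assumed known."""
    return [
        gaussian_pdf(x, *test_params) / gaussian_pdf(x, *train_params)
        for x in xs
    ]


def weighted_mean(values, weights):
    """Self-normalized weighted average of the values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


random.seed(0)

# Assumed (hypothetical) distributions: the test inputs are shifted by +1.
train_params = (0.0, 1.0)  # (mean, std) of the training inputs
test_params = (1.0, 1.0)   # (mean, std) of the test inputs

train_x = [random.gauss(*train_params) for _ in range(20000)]
weights = importance_weights(train_x, train_params, test_params)

# The unweighted training mean reflects the training distribution, while the
# reweighted mean approximates the mean under the test distribution.
plain_mean = sum(train_x) / len(train_x)
reweighted_mean = weighted_mean(train_x, weights)

print(f"unweighted mean: {plain_mean:.3f}")
print(f"reweighted mean: {reweighted_mean:.3f}")
```

The same weights could be passed to a learner that accepts per-sample weights. The quality of the result then depends entirely on how well the density ratio is estimated.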

Author information

Authors and Affiliations

School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
Santosh Kumar, Xiaoying Gao & Ian Welch

Corresponding author

Correspondence to Xiaoying Gao.

Editor information

Editors and Affiliations

Tokyo University of Science, Noda, Japan
Hayato Ohwada
University of Tsukuba, Tokyo, Japan
Kenichi Yoshida

Cite this paper

Kumar, S., Gao, X., Welch, I. (2016). Learning Under Data Shift for Domain Adaptation: A Model-Based Co-clustering Transfer Learning Solution. In: Ohwada, H., Yoshida, K. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2016. Lecture Notes in Computer Science, vol 9806. Springer, Cham. https://doi.org/10.1007/978-3-319-42706-5_4

DOI: https://doi.org/10.1007/978-3-319-42706-5_4
Published: 07 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42705-8
Online ISBN: 978-3-319-42706-5
eBook Packages: Computer Science, Computer Science (R0)