Abstract
Data shift in machine learning violates the common assumption that training and test samples are drawn from the same distribution. Most algorithms that address data shift first estimate the training and test distributions and then reweight samples according to those estimates. Because a precise distribution is difficult to estimate, these conventional methods cannot achieve good classification performance. In this paper, we introduce two types of data-shift problems and propose a model-based co-clustering transfer learning solution that handles both scenarios consistently. Experimental results demonstrate that the proposed method achieves better generalization and running efficiency than traditional methods under the data shift and covariate shift settings.
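The conventional estimate-then-reweight approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the co-clustering method proposed in the paper; it only shows the baseline idea of importance weighting under covariate shift, w(x) = p_test(x) / p_train(x). The one-dimensional Gaussian density model, the synthetic data, and all function names here are assumptions made purely for illustration.

```python
import math
import random


def fit_gaussian(xs):
    """Maximum-likelihood (mean, std) of a 1-D sample (assumed density model)."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var)


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_weights(train_xs, test_xs):
    """Weight each training point by the ratio of estimated test to train density."""
    mu_tr, sd_tr = fit_gaussian(train_xs)
    mu_te, sd_te = fit_gaussian(test_xs)
    return [
        gaussian_pdf(x, mu_te, sd_te) / gaussian_pdf(x, mu_tr, sd_tr)
        for x in train_xs
    ]


def weighted_mean(xs, ws):
    """Self-normalized weighted average of xs."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


# Synthetic covariate shift: training inputs centered at 0, test inputs at 1.
rng = random.Random(0)
train_xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
test_xs = [rng.gauss(1.0, 1.0) for _ in range(500)]

weights = importance_weights(train_xs, test_xs)

# After reweighting, the training sample should resemble the test sample more closely.
print("unweighted train mean:", sum(train_xs) / len(train_xs))
print("weighted train mean:  ", weighted_mean(train_xs, weights))
print("test mean:            ", sum(test_xs) / len(test_xs))
```

The quality of the weights depends entirely on how well the two densities are estimated, which is the difficulty the abstract points to: in higher dimensions or with a misspecified density model, the ratio can become very unreliable.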
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kumar, S., Gao, X., Welch, I. (2016). Learning Under Data Shift for Domain Adaptation: A Model-Based Co-clustering Transfer Learning Solution. In: Ohwada, H., Yoshida, K. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2016. Lecture Notes in Computer Science, vol 9806. Springer, Cham. https://doi.org/10.1007/978-3-319-42706-5_4
DOI: https://doi.org/10.1007/978-3-319-42706-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42705-8
Online ISBN: 978-3-319-42706-5
eBook Packages: Computer Science, Computer Science (R0)