Abstract
In this paper, we propose a general framework for transfer learning, referred to as transfer sparse subspace learning (TSSL). The framework accommodates different assumptions on the divergence measure between data distributions, such as maximum mean discrepancy, Bregman divergence, and Kullback-Leibler (K-L) divergence. We introduce an effective sparse regularization into the proposed transfer subspace learning framework, which significantly reduces time and space costs and, more importantly, avoids or at least mitigates over-fitting. We derive solutions for the different distribution distance estimation criteria and provide a convergence analysis. Comprehensive experiments on text data sets and face image data sets demonstrate that TSSL-based methods outperform existing transfer learning methods.
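(For orientation, a hedged illustration rather than the paper's exact formulation: the empirical maximum mean discrepancy between source samples $X_s = \{x_i^s\}_{i=1}^{n_s}$ and target samples $X_t = \{x_j^t\}_{j=1}^{n_t}$, measured in an RKHS $\mathcal{H}$ with feature map $\phi$, is commonly written as

\[ \mathrm{MMD}(X_s, X_t) \;=\; \Bigl\| \frac{1}{n_s}\sum_{i=1}^{n_s}\phi(x_i^s) \;-\; \frac{1}{n_t}\sum_{j=1}^{n_t}\phi(x_j^t) \Bigr\|_{\mathcal{H}}, \]

and a common sparse regularizer on a projection matrix $W \in \mathbb{R}^{d \times m}$ is the $\ell_{2,1}$-norm

\[ \|W\|_{2,1} \;=\; \sum_{i=1}^{d} \Bigl( \sum_{j=1}^{m} W_{ij}^2 \Bigr)^{1/2}, \]

which drives entire rows of $W$ to zero and thereby performs feature selection. Whether TSSL adopts exactly these forms is specified in the body of the paper.)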




Acknowledgments
We would like to thank Sinno Jialin Pan and Si Si for providing the code of transfer component analysis and transfer subspace learning. We would also like to express our appreciation to the editors and reviewers for their contributions to improving the quality of our paper. We gratefully acknowledge the support of the National Natural Science Foundation of China under Grants No. 60975038 and No. 61005003.
Cite this article
Yang, S., Lin, M., Hou, C. et al. A general framework for transfer sparse subspace learning. Neural Comput & Applic 21, 1801–1817 (2012). https://doi.org/10.1007/s00521-012-1084-1