Abstract
Joint modeling of related data sources has the potential to improve various data mining tasks such as transfer learning, multitask clustering, and information retrieval. However, diversity among the data sources may outweigh the advantages of joint modeling and lead to performance degradation. To this end, we propose a regularized shared subspace learning framework that exploits the mutual strengths of related data sources while remaining robust to the individual variability of each source. This is achieved by imposing a mutual orthogonality constraint on the constituent subspaces, which segregates the common patterns from the source-specific patterns and thereby avoids performance degradation. Our approach is rooted in nonnegative matrix factorization and extends it to enable joint analysis of related data sources. Experiments on three real-world data sets for both retrieval and clustering applications demonstrate the benefits of regularization and validate the effectiveness of the model. The proposed solution provides a formal framework for jointly analyzing related data sources and is therefore applicable to a wider range of data mining problems.
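As a rough illustration of the kind of factorization the abstract describes, the sketch below jointly factorizes two nonnegative data matrices into a shared basis W and source-specific bases V1 and V2, with a soft penalty encouraging each Vi to be orthogonal to W. This is a minimal NumPy sketch using plain projected gradient descent with nonnegativity clipping, not the update rules derived in the paper; the function name, objective weights, and step size are illustrative assumptions.

import numpy as np

def joint_shared_nmf(X1, X2, k_shared, k1, k2, lam=1.0, lr=1e-3, iters=500, seed=0):
    """Illustrative joint NMF with a shared basis W and individual bases V1, V2.

    Approximately minimizes, by projected gradient descent with clipping at zero,
        ||X1 - [W V1] H1||_F^2 + ||X2 - [W V2] H2||_F^2
        + lam * (||W^T V1||_F^2 + ||W^T V2||_F^2),
    where the last term softly enforces mutual orthogonality between the
    shared and source-specific subspaces.
    """
    rng = np.random.default_rng(seed)
    d = X1.shape[0]
    assert X2.shape[0] == d, "both sources must share the same feature space"
    W = rng.random((d, k_shared))
    V1, V2 = rng.random((d, k1)), rng.random((d, k2))
    H1 = rng.random((k_shared + k1, X1.shape[1]))
    H2 = rng.random((k_shared + k2, X2.shape[1]))

    for _ in range(iters):
        B1, B2 = np.hstack([W, V1]), np.hstack([W, V2])
        R1, R2 = B1 @ H1 - X1, B2 @ H2 - X2          # reconstruction residuals

        # gradients of the reconstruction terms w.r.t. the basis blocks
        G1, G2 = R1 @ H1.T, R2 @ H2.T
        gW  = G1[:, :k_shared] + G2[:, :k_shared] + lam * (V1 @ V1.T + V2 @ V2.T) @ W
        gV1 = G1[:, k_shared:] + lam * (W @ W.T) @ V1
        gV2 = G2[:, k_shared:] + lam * (W @ W.T) @ V2

        # projected gradient step; clipping keeps all factors nonnegative
        # (the fixed step size lr is a simplification and may need tuning)
        W  = np.maximum(W  - lr * gW,  0)
        V1 = np.maximum(V1 - lr * gV1, 0)
        V2 = np.maximum(V2 - lr * gV2, 0)
        H1 = np.maximum(H1 - lr * (B1.T @ R1), 0)
        H2 = np.maximum(H2 - lr * (B2.T @ R2), 0)
    return W, V1, V2, H1, H2

In this sketch the shared basis W captures patterns common to both sources, while V1 and V2 absorb source-specific variation; downstream retrieval or clustering would then operate on the encodings under W, which is the general idea the abstract attributes to the proposed framework.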
Additional information
Responsible editor: Kristian Kersting.
Cite this article
Gupta, S.K., Phung, D., Adams, B. et al. Regularized nonnegative shared subspace learning. Data Min Knowl Disc 26, 57–97 (2013). https://doi.org/10.1007/s10618-011-0244-8