Abstract
Clustering high-dimensional data and making sense of the results are challenging problems. In this paper, we present a weakly supervised nonnegative matrix factorization (NMF) method and its symmetric version, both of which incorporate various forms of prior information via regularization in clustering applications. Unlike many existing methods, the proposed weakly supervised NMF methods provide interpretable and flexible outputs by incorporating this prior information directly. Furthermore, under an alternating nonnegativity-constrained least squares framework, the proposed methods maintain a computational complexity comparable to that of standard NMF. Using real-world data, we conduct quantitative analyses comparing our methods against other semi-supervised clustering methods. We also present use cases in which the proposed methods produce semantically meaningful and accurate clustering results by properly utilizing user-driven prior information.
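To make the idea of regularized NMF under an alternating nonnegativity-constrained least squares (ANLS) framework concrete, the following is a minimal sketch. It is not the formulation from the paper. It assumes a simplified objective, ||A - WH||_F^2 + alpha * ||W - W_ref||_F^2 with W, H >= 0, in which a single reference matrix W_ref stands in for the prior information. The names regularized_anls, W_ref, and alpha are hypothetical, and SciPy's general-purpose nnls routine is used as the subproblem solver for brevity.

```python
import numpy as np
from scipy.optimize import nnls


def objective(A, W, H, W_ref, alpha):
    """Squared reconstruction error plus the assumed prior penalty on W."""
    return (np.linalg.norm(A - W @ H) ** 2
            + alpha * np.linalg.norm(W - W_ref) ** 2)


def regularized_anls(A, W_ref, alpha=1.0, n_iter=30):
    """Alternate two nonnegative least-squares solves (illustrative only)."""
    A = np.asarray(A, dtype=float)
    W_ref = np.asarray(W_ref, dtype=float)
    m, n = A.shape
    k = W_ref.shape[1]
    W = W_ref.copy()          # start from the reference matrix (an assumption)
    H = np.zeros((k, n))
    sqrt_alpha = np.sqrt(alpha)
    history = []
    for _ in range(n_iter):
        # H-step: each column h of H solves min ||W h - a||_2 with h >= 0.
        for j in range(n):
            H[:, j] = nnls(W, A[:, j])[0]
        # W-step: stacking sqrt(alpha) * I under H^T folds the penalty
        # alpha * ||w - w_ref||^2 into one NNLS problem per row of W.
        stacked = np.vstack([H.T, sqrt_alpha * np.eye(k)])
        for i in range(m):
            rhs = np.concatenate([A[i, :], sqrt_alpha * W_ref[i, :]])
            W[i, :] = nnls(stacked, rhs)[0]
        history.append(objective(A, W, H, W_ref, alpha))
    return W, H, history
```

In this sketch the regularization only enlarges each least-squares subproblem by k extra rows, which is one way to see why such a penalty need not change the overall cost of an ANLS iteration much. A larger alpha pulls W toward W_ref, while alpha = 0 reduces to plain ANLS-based NMF.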
Notes
MPCK-Means and PCK-Means: http://www.cs.utexas.edu/users/ml/risc/code/.
Acknowledgments
The work of these authors was supported in part by NSF Grants CCF-0808863, CCF-1348152, IIS-1242304, and IIS-1231742, NIH Grant R21CA175974, and DARPA XDATA Grant FA8750-12-2-0309. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.
Additional information
Responsible editor: Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, Filip Železný.
Cite this article
Choo, J., Lee, C., Reddy, C.K. et al. Weakly supervised nonnegative matrix factorization for user-driven clustering. Data Min Knowl Disc 29, 1598–1621 (2015). https://doi.org/10.1007/s10618-014-0384-8
DOI: https://doi.org/10.1007/s10618-014-0384-8