Weakly supervised nonnegative matrix factorization for user-driven clustering

Data Mining and Knowledge Discovery

Abstract

Clustering high-dimensional data and making sense of the results is a challenging problem. In this paper, we present a weakly supervised nonnegative matrix factorization (NMF) and its symmetric version, both of which take various forms of prior information into account via regularization in clustering applications. Unlike many existing methods, the proposed weakly supervised NMF methods provide interpretable and flexible outputs by directly incorporating this prior information. Furthermore, under an alternating nonnegativity-constrained least squares framework, the proposed methods maintain a computational complexity comparable to that of standard NMF. Using real-world data, we conduct quantitative analyses to compare our methods against other semi-supervised clustering methods. We also present use cases in which the proposed methods lead to semantically meaningful and accurate clustering results by properly utilizing user-driven prior information.
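To make the idea of regularized NMF under an alternating nonnegativity-constrained least squares (ANLS) scheme more concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes a single quadratic penalty, alpha * ||H - H_ref||_F^2, that pulls the coefficient matrix H toward a user-supplied reference H_ref. The function names `nnls_columns` and `regularized_nmf`, the parameters `alpha` and `H_ref`, and the penalty itself are hypothetical choices made for illustration only.

```python
import numpy as np
from scipy.optimize import nnls


def nnls_columns(C, B):
    """Solve min_{X >= 0} ||C X - B||_F one column at a time with SciPy's NNLS."""
    X = np.zeros((C.shape[1], B.shape[1]))
    for j in range(B.shape[1]):
        X[:, j], _ = nnls(C, B[:, j])
    return X


def regularized_nmf(A, k, H_ref=None, alpha=0.0, n_iter=30, seed=0):
    """Alternate two NNLS solves for the illustrative objective
        min_{W, H >= 0} ||A - W H||_F^2 + alpha * ||H - H_ref||_F^2.
    The penalty is an assumption for this sketch, not the paper's regularizer.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    if H_ref is None:
        # No prior information: fall back to plain NMF.
        H_ref, alpha = np.zeros((k, n)), 0.0
    for _ in range(n_iter):
        # H-step: stacking the penalty under the data term keeps this a plain
        # NNLS problem of almost the same size as the unregularized one.
        C = np.vstack([W, np.sqrt(alpha) * np.eye(k)])
        B = np.vstack([A, np.sqrt(alpha) * H_ref])
        H = nnls_columns(C, B)
        # W-step: min_{W >= 0} ||H^T W^T - A^T||_F.
        W = nnls_columns(H.T, A.T).T
    return W, H
```

In this sketch each column of A is a data item, so a hard cluster label for item j can be read off as `H[:, j].argmax()`. Because the penalty only adds k rows to each least squares subproblem, the cost per iteration stays close to that of the unregularized solve, which is the kind of property the abstract refers to.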

Notes

  1. For notational simplicity, we use the same symbol \(H\) in Eqs. (1) and (2).

  2. http://qwone.com/~jason/20Newsgroups/.

  3. http://jmlr.csail.mit.edu/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm.

  4. http://ai.stanford.edu/~gal/data.html.

  5. http://people.cs.umass.edu/~mccallum/data.html.

  6. http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html.

  7. http://www2.ece.ohio-state.edu/~aleix/ARdatabase.html.

  8. SS-NMF1: http://www-personal.umich.edu/~chenyanh/SEMI_NMF_CODE.zip.

  9. MPCK-Means and PCK-Means: http://www.cs.utexas.edu/users/ml/risc/code/.

  10. http://dais.cs.uiuc.edu/manish/ECOutlier/.

Acknowledgments

The work of these authors was supported in part by NSF Grants CCF-0808863, CCF-1348152, IIS-1242304, and IIS-1231742, NIH Grant R21CA175974, and DARPA XDATA Grant FA8750-12-2-0309. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

Author information

Corresponding author

Correspondence to Jaegul Choo.

Additional information

Responsible editor: Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, Filip Železný.

About this article

Cite this article

Choo, J., Lee, C., Reddy, C.K. et al. Weakly supervised nonnegative matrix factorization for user-driven clustering. Data Min Knowl Disc 29, 1598–1621 (2015). https://doi.org/10.1007/s10618-014-0384-8

  • DOI: https://doi.org/10.1007/s10618-014-0384-8
