Adaptive local learning regularized nonnegative matrix factorization for data clustering

Abstract

Data clustering aims to group input data instances into clusters according to their mutual similarity, and it is a fundamental task that appears, directly or as an intermediate step, throughout machine learning, pattern recognition, and information retrieval. Clustering algorithms based on graph regularization have attracted considerable interest for a couple of decades, and the performance of this category of approaches is largely determined by the data similarity matrix, which is usually computed by a predefined model with a carefully tuned parameter combination. Such fixed similarity matrices lack flexibility and may not be optimal in practice. In this paper, we consider both discriminative information and the data manifold from a matrix factorization point of view, and propose an adaptive local learning regularized nonnegative matrix factorization (ALLRNMF) approach for data clustering, which assumes that instance pairs with smaller distances should have a larger probability of being probabilistic neighbors. ALLRNMF learns the data similarity matrix under this assumption while simultaneously performing nonnegative matrix factorization. The constraint on the similarity matrix encodes both discriminative information and the learned adaptive local structure, which benefits clustering on the data manifold. To solve the optimization problem of our approach, we propose an effective alternating optimization algorithm that decomposes the objective into subproblems, each of which has an optimal solution, and whose convergence is theoretically guaranteed. Experiments on real-world benchmark datasets demonstrate the superior performance of our approach against existing clustering approaches.
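For concreteness, the alternating scheme described above might look roughly like the sketch below (Python/NumPy). Nothing here is the paper's verbatim algorithm: the multiplicative updates for U and V are reconstructed from the bounds derived in Appendices A and B, while the similarity-learning subproblem is not shown in this excerpt, so adaptive_neighbors is a hypothetical CAN-style (clustering with adaptive neighbors) stand-in; the names allrnmf_sketch, lam, and m, the column-instance convention for X, and the Laplacian construction with its positive/negative split are all assumptions.

import numpy as np
from scipy.spatial.distance import pdist, squareform

def adaptive_neighbors(D, m, eps=1e-10):
    """Hypothetical stand-in for the similarity-learning subproblem (not shown
    in this excerpt): each instance distributes probability mass over its m
    nearest neighbors, with closer neighbors receiving larger probabilities."""
    n = D.shape[0]
    S = np.zeros((n, n))
    idx = np.argsort(D, axis=1)
    for i in range(n):
        nn = idx[i, 1:m + 2]          # m nearest neighbors plus one extra (skip self)
        d = D[i, nn]
        # Closed form of a CAN-style problem: weight decreases linearly with distance.
        S[i, nn[:m]] = (d[m] - d[:m]) / (m * d[m] - d[:m].sum() + eps)
    return S

def allrnmf_sketch(X, k, m=5, lam=1.0, n_iter=200, eps=1e-10):
    """X: (d, n) nonnegative data, columns are instances (assumed convention);
    k: number of clusters; lam: graph regularization weight."""
    d, n = X.shape
    rng = np.random.default_rng(0)
    U = rng.random((d, k))            # basis matrix
    V = rng.random((k, n))            # coefficient / cluster-indicator matrix
    S = adaptive_neighbors(squareform(pdist(X.T)), m)
    W = (S + S.T) / 2                 # symmetrized similarity
    L = np.diag(W.sum(axis=1)) - W    # one common Laplacian construction (assumed)
    Lp, Lm = np.maximum(L, 0.0), np.maximum(-L, 0.0)   # split L = Lp - Lm (Appendix B)
    for _ in range(n_iter):
        # Multiplicative updates reconstructed from Appendices A and B.
        U *= np.sqrt((X @ V.T) / (U @ V @ V.T + eps))
        V *= np.sqrt((U.T @ X + lam * V @ Lp) / (U.T @ U @ V + lam * V @ Lm + eps))
    return U, V, S

Cluster assignments could then be read off the coefficient matrix, e.g. np.argmax(V, axis=0) assigns each instance to its strongest component.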

Notes

  1. Wap, La1 and tr12 are publicly available from http://glaros.dtc.umn.edu/gkhome/views/cluto/.

  2. Vote, Diag-Bcw, Abalone, Krvs and Caltech101 Silhouettes are publicly available from http://archive.ics.uci.edu/ml/datasets.html.

  3. tr12 is publicly available from http://trec.nist.gov.

  4. TDT2-20 is publicly available from http://www.nist.gov/speech/tests/tdt/tdt98/index.htm.

Author information

Correspondence to Meng Wang.

Appendices

Appendix A: Proof of Theorem 2

Proof

We rewrite (33) as

$$\begin{array}{@{}rcl@{}} L(\textbf{U})=tr(-2\textbf{V}\textbf{X}^{T}\textbf{U} +\textbf{U}\textbf{V}\textbf{V}^{T}\textbf{U}^{T}). \end{array} $$
(39)

By applying Lemma 2, we have

$$\begin{array}{@{}rcl@{}} tr(\textbf{U}\textbf{V}\textbf{V}^{T}\textbf{U}^{T}) \leq\sum\limits_{ij}\frac{(\textbf{U}^{\prime}\textbf{V}\textbf{V}^{T})_{ij}\textbf{U}^{2}_{ij}} {\textbf{U}^{\prime}_{ij}} . \end{array} $$

To obtain a lower bound for the remaining term, we use the inequality

$$\begin{array}{@{}rcl@{}} z\geq 1+\log z, \quad\forall z> 0, \end{array} $$
(40)

which holds because \(f(z)=z-1-\log z\) attains its minimum value 0 at z = 1.

Then

$$\begin{array}{@{}rcl@{}} tr(\textbf{V}\textbf{X}^{T}\textbf{U})\geq \sum\limits_{ij}(\textbf{X}\textbf{V}^{T})_{ij}\textbf{U}^{\prime}_{ij}\left( 1+\log\frac{\textbf{U}_{ij}}{\textbf{U}_{ij}^{\prime}}\right). \end{array} $$

By summing over all the bounds, we obtain \(g(\textbf{U}, \textbf{U}^{\prime})\), which satisfies: (1) \(g(\textbf{U}, \textbf{U}^{\prime})\geq J_{ALLRNMF}(\textbf{U})\); (2) \(g(\textbf{U}, \textbf{U})=J_{ALLRNMF}(\textbf{U})\).

To find the minimum of \(g(\textbf{U}, \textbf{U}^{\prime})\), we take the Hessian matrix of \(g(\textbf{U}, \textbf{U}^{\prime})\)

$$\begin{array}{@{}rcl@{}} \frac{\partial^{2}g(\textbf{U}, \textbf{U}^{\prime})}{\partial \textbf{U}_{ij}\partial \textbf{U}_{kl}}=\delta_{ik} \delta_{jl}\left( \frac{2(\textbf{U}^{\prime}\textbf{V}\textbf{V}^{T})_{ij}}{\textbf{U}^{\prime}_{ij}} + 2(\textbf{X}\textbf{V}^{T})_{ij}\frac{\textbf{U}^{\prime}_{ij}}{\textbf{U}^{2}_{ij}}\right) \end{array} $$

which is a diagonal matrix with positive diagonal elements. So \(g(\textbf{U}, \textbf{U}^{\prime})\) is a convex function of U, and we can obtain its global minimum by setting \( \frac {\partial g(\textbf {U}, \textbf {U}^{\prime })}{\partial \textbf {U}_{ij}}= 0\) and solving for U, from which we obtain (34). □
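Update rule (34) itself is not reproduced in this excerpt, but it can be reconstructed from the bounds above. Setting the derivative of \(g(\textbf{U}, \textbf{U}^{\prime})\) with respect to \(\textbf{U}_{ij}\) to zero gives (a sketch that presumably matches (34)):

$$\begin{array}{@{}rcl@{}} -\frac{2(\textbf{X}\textbf{V}^{T})_{ij}\textbf{U}^{\prime}_{ij}}{\textbf{U}_{ij}} +\frac{2(\textbf{U}^{\prime}\textbf{V}\textbf{V}^{T})_{ij}\textbf{U}_{ij}}{\textbf{U}^{\prime}_{ij}}= 0 \quad\Longrightarrow\quad \textbf{U}_{ij}=\textbf{U}^{\prime}_{ij}\sqrt{\frac{(\textbf{X}\textbf{V}^{T})_{ij}}{(\textbf{U}^{\prime}\textbf{V}\textbf{V}^{T})_{ij}}}. \end{array} $$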

Appendix B: Proof of Theorem 4

Proof

We rewrite (35) as

$$\begin{array}{@{}rcl@{}} L(\textbf{V})\!&=&\!tr\left( -2\textbf{X}^{T}\textbf{U}\textbf{V} +\textbf{V}^{T}\textbf{U}^{T}\textbf{U}\textbf{V}\right.\\ &&\qquad\left.-\lambda \textbf{V}\textbf{L}_{S}^{+}\textbf{V}^{T} +\lambda \textbf{V}\textbf{L}_{S}^{-}\textbf{V}^{T}\right). \end{array} $$
(41)

By applying Lemma 2, we have

$$\begin{array}{@{}rcl@{}} tr(\textbf{V}^{T}\textbf{U}^{T}\textbf{U}\textbf{V})&\leq&\sum\limits_{ij} \frac{(\textbf{U}^{T}\textbf{U}\textbf{V}^{\prime})_{ij}\textbf{V}^{2}_{ij}}{\textbf{V}^{\prime}_{ij}},\\ tr(\textbf{V}\textbf{L}^{-}_{S}\textbf{V}^{T})&\leq&\sum\limits_{ij} \frac{(\textbf{V}^{\prime}\textbf{L}_{S}^{-})_{ij}\textbf{V}^{2}_{ij}}{\textbf{V}^{\prime}_{ij}}. \end{array} $$

To obtain lower bounds for the remaining terms, we again use the inequality in (40); then

$$\begin{array}{@{}rcl@{}} tr(\textbf{X}^{T}\textbf{U}\textbf{V})&\geq& \sum\limits_{ij}(\textbf{U}^{T}\textbf{X})_{ij}\textbf{V}^{\prime}_{ij} \left( 1+\log\frac{\textbf{V}_{ij}}{\textbf{V}_{ij}^{\prime}}\right),\\ tr(\textbf{V}\textbf{L}^{+}_{S}\textbf{V}^{T})&\geq& \sum\limits_{ijk}(\textbf{L}^{+}_{S})_{jk}\textbf{V}_{ij}^{\prime}\textbf{V}^{\prime}_{ik}\left( 1+\log\frac{\textbf{V}_{ij} \textbf{V}_{ik}} {\textbf{V}_{ij}^{\prime}\textbf{V}_{ik}^{\prime}}\right). \end{array} $$

By summing over all the bounds, we obtain \(g(\textbf{V}, \textbf{V}^{\prime})\), which satisfies: (1) \(g(\textbf{V}, \textbf{V}^{\prime})\geq J_{ALLRNMF}(\textbf{V})\); (2) \(g(\textbf{V}, \textbf{V})=J_{ALLRNMF}(\textbf{V})\).

To find the minimum of \(g(\textbf{V}, \textbf{V}^{\prime})\), we take the Hessian matrix of \(g(\textbf{V}, \textbf{V}^{\prime})\)

$$\begin{array}{@{}rcl@{}} \frac{\partial^{2}g(\textbf{V}, \textbf{V}^{\prime})}{\partial \textbf{V}_{ij}\partial \textbf{V}_{kl}}&=&\delta_{ik} \delta_{jl} \left( \frac{2(\textbf{U}^{T}\textbf{X}+\lambda\textbf{V}^{\prime}\textbf{L}^{+}_{S})_{ij}\textbf{V}^{\prime}_{ij}}{\textbf{V}^{2}_{ij}}\right.\\ &&\qquad\qquad\left.+\frac{2(\textbf{U}^{T}\textbf{U}\textbf{V}^{\prime}+\lambda\textbf{V}^{\prime}\textbf{L}^{-}_{S})_{ij}}{\textbf{V}^{\prime}_{ij}}\right) \end{array} $$

which is a diagonal matrix with positive diagonal elements. So \(g(\textbf{V}, \textbf{V}^{\prime})\) is a convex function of V, and we can obtain its global minimum by setting \( \frac {\partial g(\textbf {V}, \textbf {V}^{\prime })}{\partial \textbf {V}_{ij}}= 0\) and solving for V, from which we obtain (36). □
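As in Appendix A, the resulting update (36) is not reproduced in this excerpt but can be reconstructed by setting the derivative of \(g(\textbf{V}, \textbf{V}^{\prime})\) with respect to \(\textbf{V}_{ij}\) to zero (a sketch that presumably matches (36)):

$$\begin{array}{@{}rcl@{}} \textbf{V}_{ij}=\textbf{V}^{\prime}_{ij}\sqrt{\frac{(\textbf{U}^{T}\textbf{X}+\lambda\textbf{V}^{\prime}\textbf{L}^{+}_{S})_{ij}}{(\textbf{U}^{T}\textbf{U}\textbf{V}^{\prime}+\lambda\textbf{V}^{\prime}\textbf{L}^{-}_{S})_{ij}}}. \end{array} $$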

Cite this article

Sheng, Y., Wang, M., Wu, T. et al. Adaptive local learning regularized nonnegative matrix factorization for data clustering. Appl Intell 49, 2151–2168 (2019). https://doi.org/10.1007/s10489-018-1380-2