Adaptive local learning regularized nonnegative matrix factorization for data clustering

Abstract

Data clustering aims to group input data instances into clusters according to their mutual similarity, and it is a fundamental task that appears, directly or as an intermediate step, throughout machine learning, pattern recognition, and information retrieval. Clustering algorithms based on graph regularization have attracted considerable interest for a couple of decades, and the performance of this category of approaches is largely determined by the data similarity matrix, which is usually computed by a predefined model with a carefully tuned parameter combination. Such fixed similarity matrices lack flexibility and may not be optimal in practice. In this paper, we consider both discriminative information and the data manifold from a matrix factorization point of view, and propose an adaptive local learning regularized nonnegative matrix factorization (ALLRNMF) approach for data clustering, which assumes that instance pairs with smaller distances should have a larger probability of being probabilistic neighbors. ALLRNMF learns the data similarity matrix under this assumption while simultaneously performing nonnegative matrix factorization. The constraint on the similarity matrix encodes both discriminative information and the learned adaptive local structure, which benefits clustering on the data manifold. To solve the optimization problem of our approach, we propose an effective alternating optimization algorithm that decomposes the objective into subproblems, each of which has an optimal solution, and whose convergence is theoretically guaranteed. Experiments on real-world benchmark datasets demonstrate the superior performance of our approach against existing clustering approaches.
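For concreteness, the alternating scheme described above might look roughly like the sketch below (Python/NumPy). Nothing here is the paper's verbatim algorithm: the multiplicative updates for U and V are reconstructed from the bounds derived in Appendices A and B, while the similarity-learning subproblem is not shown in this excerpt, so adaptive_neighbors is a hypothetical CAN-style (clustering with adaptive neighbors) stand-in; the names allrnmf_sketch, lam, and m, the column-instance convention for X, and the Laplacian construction with its positive/negative split are all assumptions.

import numpy as np
from scipy.spatial.distance import pdist, squareform

def adaptive_neighbors(D, m, eps=1e-10):
    """Hypothetical stand-in for the similarity-learning subproblem (not shown
    in this excerpt): each instance distributes probability mass over its m
    nearest neighbors, with closer neighbors receiving larger probabilities."""
    n = D.shape[0]
    S = np.zeros((n, n))
    idx = np.argsort(D, axis=1)
    for i in range(n):
        nn = idx[i, 1:m + 2]          # m nearest neighbors plus one extra (skip self)
        d = D[i, nn]
        # Closed form of a CAN-style problem: weight decreases linearly with distance.
        S[i, nn[:m]] = (d[m] - d[:m]) / (m * d[m] - d[:m].sum() + eps)
    return S

def allrnmf_sketch(X, k, m=5, lam=1.0, n_iter=200, eps=1e-10):
    """X: (d, n) nonnegative data, columns are instances (assumed convention);
    k: number of clusters; lam: graph regularization weight."""
    d, n = X.shape
    rng = np.random.default_rng(0)
    U = rng.random((d, k))            # basis matrix
    V = rng.random((k, n))            # coefficient / cluster-indicator matrix
    S = adaptive_neighbors(squareform(pdist(X.T)), m)
    W = (S + S.T) / 2                 # symmetrized similarity
    L = np.diag(W.sum(axis=1)) - W    # one common Laplacian construction (assumed)
    Lp, Lm = np.maximum(L, 0.0), np.maximum(-L, 0.0)   # split L = Lp - Lm (Appendix B)
    for _ in range(n_iter):
        # Multiplicative updates reconstructed from Appendices A and B.
        U *= np.sqrt((X @ V.T) / (U @ V @ V.T + eps))
        V *= np.sqrt((U.T @ X + lam * V @ Lp) / (U.T @ U @ V + lam * V @ Lm + eps))
    return U, V, S

Cluster assignments could then be read off the coefficient matrix, e.g. np.argmax(V, axis=0) assigns each instance to its strongest component.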

Notes

  1. Wap, La1 and tr12 are publicly available from http://glaros.dtc.umn.edu/gkhome/views/cluto/.

  2. Vote, Diag-Bcw, Abalone, Krvs and Caltech101 Silhouettes are publicly available from http://archive.ics.uci.edu/ml/datasets.html.

  3. tr12 is publicly available from http://trec.nist.gov.

  4. TDT2-20 is publicly available from http://www.nist.gov/speech/tests/tdt/tdt98/index.htm.

Author information

Correspondence to Meng Wang.

Appendices

Appendix A: Proof of Theorem 2

Proof

We rewrite (33) as

$$\begin{array}{@{}rcl@{}} L(\textbf{U})=tr(-2\textbf{V}\textbf{X}^{T}\textbf{U} +\textbf{U}\textbf{V}\textbf{V}^{T}\textbf{U}^{T}). \end{array} $$
(39)

By applying Lemma 2, we have

$$\begin{array}{@{}rcl@{}} tr(\textbf{U}\textbf{V}\textbf{V}^{T}\textbf{U}^{T}) \leq\sum\limits_{ij}\frac{(\textbf{U}^{\prime}\textbf{V}\textbf{V}^{T})_{ij}\textbf{U}^{2}_{ij}} {\textbf{U}^{\prime}_{ij}} . \end{array} $$

To obtain a lower bound for the remaining term, we use the inequality

$$\begin{array}{@{}rcl@{}} z\geq 1+\log z, \quad\forall z> 0, \end{array} $$
(40)

which holds because \(f(z)=z-1-\log z\) attains its minimum value 0 at z = 1.

Then

$$\begin{array}{@{}rcl@{}} tr(\textbf{V}\textbf{X}^{T}\textbf{U})\geq \sum\limits_{ij}(\textbf{X}\textbf{V}^{T})_{ij}\textbf{U}^{\prime}_{ij}\left( 1+\log\frac{\textbf{U}_{ij}}{\textbf{U}_{ij}^{\prime}}\right). \end{array} $$

By summing over all the bounds, we obtain \(g(\textbf{U}, \textbf{U}^{\prime})\), which satisfies: (1) \(g(\textbf{U}, \textbf{U}^{\prime})\geq J_{ALLRNMF}(\textbf{U})\); (2) \(g(\textbf{U}, \textbf{U})=J_{ALLRNMF}(\textbf{U})\).

To find the minimum of \(g(\textbf{U}, \textbf{U}^{\prime})\), we take the Hessian matrix of \(g(\textbf{U}, \textbf{U}^{\prime})\)

$$\begin{array}{@{}rcl@{}} \frac{\partial^{2}g(\textbf{U}, \textbf{U}^{\prime})}{\partial \textbf{U}_{ij}\partial \textbf{U}_{kl}}=\delta_{ik} \delta_{jl}\left( \frac{2(\textbf{U}^{\prime}\textbf{V}\textbf{V}^{T})_{ij}}{\textbf{U}^{\prime}_{ij}} + 2(\textbf{X}\textbf{V}^{T})_{ij}\frac{\textbf{U}^{\prime}_{ij}}{\textbf{U}^{2}_{ij}}\right) \end{array} $$

which is a diagonal matrix with positive diagonal elements. So \(g(\textbf{U}, \textbf{U}^{\prime})\) is a convex function of U, and we can obtain its global minimum by setting \( \frac {\partial g(\textbf {U}, \textbf {U}^{\prime })}{\partial \textbf {U}_{ij}}= 0\) and solving for U, from which we obtain (34). □
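Update rule (34) itself is not reproduced in this excerpt, but it can be reconstructed from the bounds above. Setting the derivative of \(g(\textbf{U}, \textbf{U}^{\prime})\) with respect to \(\textbf{U}_{ij}\) to zero gives (a sketch that presumably matches (34)):

$$\begin{array}{@{}rcl@{}} -\frac{2(\textbf{X}\textbf{V}^{T})_{ij}\textbf{U}^{\prime}_{ij}}{\textbf{U}_{ij}} +\frac{2(\textbf{U}^{\prime}\textbf{V}\textbf{V}^{T})_{ij}\textbf{U}_{ij}}{\textbf{U}^{\prime}_{ij}}= 0 \quad\Longrightarrow\quad \textbf{U}_{ij}=\textbf{U}^{\prime}_{ij}\sqrt{\frac{(\textbf{X}\textbf{V}^{T})_{ij}}{(\textbf{U}^{\prime}\textbf{V}\textbf{V}^{T})_{ij}}}. \end{array} $$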

Appendix B: Proof of Theorem 4

Proof

We rewrite (35) as

$$\begin{array}{@{}rcl@{}} L(\textbf{V})\!&=&\!tr\left( -2\textbf{X}^{T}\textbf{U}\textbf{V} +\textbf{V}^{T}\textbf{U}^{T}\textbf{U}\textbf{V}\right.\\ &&\qquad\left.-\lambda \textbf{V}\textbf{L}_{S}^{+}\textbf{V}^{T} +\lambda \textbf{V}\textbf{L}_{S}^{-}\textbf{V}^{T}\right). \end{array} $$
(41)

By applying Lemma 2, we have

$$\begin{array}{@{}rcl@{}} tr(\textbf{V}^{T}\textbf{U}^{T}\textbf{U}\textbf{V})&\leq&\sum\limits_{ij} \frac{(\textbf{U}^{T}\textbf{U}\textbf{V}^{\prime})_{ij}\textbf{V}^{2}_{ij}}{\textbf{V}^{\prime}_{ij}},\\ tr(\textbf{V}\textbf{L}^{-}_{S}\textbf{V}^{T})&\leq&\sum\limits_{ij} \frac{(\textbf{V}^{\prime}\textbf{L}_{S}^{-})_{ij}\textbf{V}^{2}_{ij}}{\textbf{V}^{\prime}_{ij}}. \end{array} $$

To obtain lower bounds for the remaining terms, we again use the inequality in (40); then

$$\begin{array}{@{}rcl@{}} tr(\textbf{X}^{T}\textbf{U}\textbf{V})&\geq& \sum\limits_{ij}(\textbf{U}^{T}\textbf{X})_{ij}\textbf{V}^{\prime}_{ij} \left( 1+\log\frac{\textbf{V}_{ij}}{\textbf{V}_{ij}^{\prime}}\right),\\ tr(\textbf{V}\textbf{L}^{+}_{S}\textbf{V}^{T})&\geq& \sum\limits_{ijk}(\textbf{L}^{+}_{S})_{jk}\textbf{V}_{ij}^{\prime}\textbf{V}^{\prime}_{ik}\left( 1+\log\frac{\textbf{V}_{ij} \textbf{V}_{ik}} {\textbf{V}_{ij}^{\prime}\textbf{V}_{ik}^{\prime}}\right). \end{array} $$

By summing over all the bounds, we obtain \(g(\textbf{V}, \textbf{V}^{\prime})\), which satisfies: (1) \(g(\textbf{V}, \textbf{V}^{\prime})\geq J_{ALLRNMF}(\textbf{V})\); (2) \(g(\textbf{V}, \textbf{V})=J_{ALLRNMF}(\textbf{V})\).

To find the minimum of \(g(\textbf{V}, \textbf{V}^{\prime})\), we take the Hessian matrix of \(g(\textbf{V}, \textbf{V}^{\prime})\)

$$\begin{array}{@{}rcl@{}} \frac{\partial^{2}g(\textbf{V}, \textbf{V}^{\prime})}{\partial \textbf{V}_{ij}\partial \textbf{V}_{kl}}&=&\delta_{ik} \delta_{jl} \left( \frac{2(\textbf{U}^{T}\textbf{X}+\lambda\textbf{V}^{\prime}\textbf{L}^{+}_{S})_{ij}\textbf{V}^{\prime}_{ij}}{\textbf{V}^{2}_{ij}}\right.\\ &&\qquad\qquad\left.+\frac{2(\textbf{U}^{T}\textbf{U}\textbf{V}^{\prime}+\lambda\textbf{V}^{\prime}\textbf{L}^{-}_{S})_{ij}}{\textbf{V}^{\prime}_{ij}}\right) \end{array} $$

which is a diagonal matrix with positive diagonal elements. So \(g(\textbf{V}, \textbf{V}^{\prime})\) is a convex function of V, and we can obtain its global minimum by setting \( \frac {\partial g(\textbf {V}, \textbf {V}^{\prime })}{\partial \textbf {V}_{ij}}= 0\) and solving for V, from which we obtain (36). □
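As in Appendix A, the resulting update (36) is not reproduced in this excerpt but can be reconstructed by setting the derivative of \(g(\textbf{V}, \textbf{V}^{\prime})\) with respect to \(\textbf{V}_{ij}\) to zero (a sketch that presumably matches (36)):

$$\begin{array}{@{}rcl@{}} \textbf{V}_{ij}=\textbf{V}^{\prime}_{ij}\sqrt{\frac{(\textbf{U}^{T}\textbf{X}+\lambda\textbf{V}^{\prime}\textbf{L}^{+}_{S})_{ij}}{(\textbf{U}^{T}\textbf{U}\textbf{V}^{\prime}+\lambda\textbf{V}^{\prime}\textbf{L}^{-}_{S})_{ij}}}. \end{array} $$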

Cite this article

Sheng, Y., Wang, M., Wu, T. et al. Adaptive local learning regularized nonnegative matrix factorization for data clustering. Appl Intell 49, 2151–2168 (2019). https://doi.org/10.1007/s10489-018-1380-2