Abstract
In machine learning and data mining, concept factorization (CF) has achieved great success owing to its powerful capability in data representation. To learn an adaptive inherent graph structure of the data space, and to ease the burden brought by the explicit orthogonality constraint, we propose concept factorization with adaptive graph learning on the Stiefel manifold (AGCF-SM). The method integrates concept factorization and manifold learning into a unified framework, in which the adaptive similarity graph is learned by iterative locally linear embedding and is therefore free from dependence on predefined neighbor sets. An iterative updating algorithm is developed, and convergence and complexity analyses of the algorithm are provided. Numerical experiments on ten benchmark datasets demonstrate that the proposed algorithm outperforms other state-of-the-art algorithms.






Data Availability
All data used in this study are freely available online (http://www.cad.zju.edu.cn/home/dengcai/Data/data.html, https://archive.ics.uci.edu/datasets, and https://jundongl.github.io/scikit-feature/index.html).
Notes
https://github.com/huangsd/NMFAN
References
Lin X, Chen X, Zheng Z (2023) Deep manifold matrix factorization autoencoder using global connectivity for link prediction. Appl Intell 53(21):25816–25835. https://doi.org/10.1007/s10489-023-04887-9
Gao X, Zhang Z, Mu T et al (2020) Self-attention driven adversarial similarity learning network. Pattern Recogn 105:107331. https://doi.org/10.1016/j.patcog.2020.107331
Wu W, Hou J, Wang S et al (2023) Semi-supervised adaptive kernel concept factorization. Pattern Recogn 134:109114. https://doi.org/10.1016/j.patcog.2022.109114
Rahiche A, Cheriet M (2021) Blind decomposition of multispectral document images using orthogonal non-negative matrix factorization. IEEE Trans Image Process 30:5997–6012. https://doi.org/10.1109/TIP.2021.3088266
Lee D, Seung H (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791. https://doi.org/10.1038/44565
Tang J, Wan Z (2021) Orthogonal dual graph-regularized non-negative matrix factorization for co-clustering. J Sci Comput 87(3):1–37. https://doi.org/10.1007/s10915-021-01489-w
Hien LTK, Gillis N (2021) Algorithms for non-negative matrix factorization with the Kullback-Leibler divergence. J Sci Comput 87(3):1–32. https://doi.org/10.1007/s10915-021-01504-0
Shu Z, Weng Z, Yu Z et al (2022) Correntropy-based dual graph regularized non-negative matrix factorization with \({L}_{p}\) smoothness for data representation. Appl Intell 52(7):7653–7669. https://doi.org/10.1007/s10489-021-02826-0
Xu W, Gong Y (2004) Document clustering by concept factorization. In: Proceed 27th Ann Int ACM SIGIR Conf Res Dev Inf Retri, pp 202–209. https://doi.org/10.1145/1008992.1009029
Zhang Z, Zhang Y, Liu G et al (2020) Joint label prediction based semi-supervised adaptive concept factorization for robust data representation. IEEE Trans Knowl Data Eng 32(5):952–970. https://doi.org/10.1109/TKDE.2019.2893956
Zhou N, Chen B, Du Y et al (2020) Maximum correntropy criterion-based robust semisupervised concept factorization for image representation. IEEE Trans Neural Netw Learn Syst 31(10):3877–3891. https://doi.org/10.1109/TNNLS.2019.2947156
Peng S, Yang Z, Nie F et al (2022) Correntropy based semi-supervised concept factorization with adaptive neighbors for clustering. Neural Netw 154:203–217. https://doi.org/10.1016/j.neunet.2022.07.021
Li Z, Yang Y (2023) Structurally incoherent adaptive weighted low-rank matrix decomposition for image classification. Appl Intell 53(21):25028–25041. https://doi.org/10.1007/s10489-023-04875-z
Deng P, Li T, Wang H et al (2021) Tri-regularized non-negative matrix tri-factorization for co-clustering. Knowl-Based Syst 226:107101. https://doi.org/10.1016/j.knosys.2021.107101
Zhang L, Liu Z, Pu J et al (2020) Adaptive graph regularized non-negative matrix factorization for data representation. Appl Intell 50:438–447. https://doi.org/10.1007/s10489-019-01539-9
Shu Z, Zuo F, Wu W et al (2023) Dual local learning regularized NMF with sparse and orthogonal constraints. Appl Intell 53(7):7713–7727. https://doi.org/10.1007/s10489-022-03881-x
Yang X, Che H, Leung MF et al (2023) Adaptive graph non-negative matrix factorization with the self-paced regularization. Appl Intell 53(12):15818–15835. https://doi.org/10.1007/s10489-022-04339-w
Tang J, Feng H (2022) Robust local-coordinate non-negative matrix factorization with adaptive graph for robust clustering. Inf Sci 610:1058–1077. https://doi.org/10.1016/j.ins.2022.08.023
Chen M, Li X (2021) Concept factorization with local centroids. IEEE Trans Neural Netw Learn Syst 32(11):5247–5253. https://doi.org/10.1109/TNNLS.2020.3027068
Wu W, Chen Y, Wang R et al (2023) Self-representative kernel concept factorization. Knowl-Based Syst 259:110051. https://doi.org/10.1016/j.knosys.2022.110051
Mu J, Song P, Liu X et al (2023) Dual-graph regularized concept factorization for multi-view clustering. Expert Syst Appl 223:119949. https://doi.org/10.1016/j.eswa.2023.119949
Pei X, Chen C, Gong W (2018) Concept factorization with adaptive neighbors for document clustering. IEEE Trans Neural Netw Learn Syst 29(2):343–352. https://doi.org/10.1109/TNNLS.2016.2626311
Guo Y, Ding G, Zhou J et al (2015) Robust and discriminative concept factorization for image representation. In: Proceed 5th ACM Int Conf Multimed Retr, pp 115–122. https://doi.org/10.1145/2671188.2749317
Yang B, Zhang X, Nie F et al (2023) ECCA: Efficient correntropy-based clustering algorithm with orthogonal concept factorization. IEEE Trans Neural Netw Learn Syst 34(10):7377–7390. https://doi.org/10.1109/TNNLS.2022.3142806
Ding C, Li T, Peng W et al (2006) Orthogonal non-negative matrix tri-factorizations for clustering. In: Proceed ACM SIGKDD Int Conf Knowl Discov Data Min, pp 126–135. https://doi.org/10.1145/1150402.1150420
Huang K, Sidiropoulos ND, Swami A (2014) Non-negative matrix factorization revisited: Uniqueness and algorithm for symmetric decomposition. IEEE Trans Signal Process 62(1):211–224. https://doi.org/10.1109/TSP.2013.2285514
He P, Xu X, Ding J et al (2020) Low-rank non-negative matrix factorization on Stiefel manifold. Inf Sci 514:131–148. https://doi.org/10.1016/j.ins.2019.12.004
Wang Q, He X, Jiang X et al (2022) Robust bi-stochastic graph regularized matrix factorization for data clustering. IEEE Trans Pattern Anal Mach Intell 44(1):390–403. https://doi.org/10.1109/TPAMI.2020.3007673
Wang S, Chang TH, Cui Y et al (2021) Clustering by orthogonal NMF model and non-convex penalty optimization. IEEE Trans Signal Process 69:5273–5288. https://doi.org/10.1109/TSP.2021.3102106
Yang B, Zhang X, Nie F et al (2021) Fast multi-view clustering via non-negative and orthogonal factorization. IEEE Trans Image Process, 30:2575–2586
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7(85):2399–2434
Huang S, Xu Z, Kang Z et al (2020) Regularized non-negative matrix factorization with adaptive local structure learning. Neurocomputing 382:196–209. https://doi.org/10.1016/j.neucom.2019.11.070
Bai L, Cui L, Wang Y et al (2022) HAQJSK: Hierarchical-aligned quantum Jensen-Shannon kernels for graph classification. https://doi.org/10.48550/arXiv.2211.02904
Li J, Zheng R, Feng H et al (2024) Permutation equivariant graph framelets for heterophilous graph learning. IEEE Trans Neural Netw Learn Syst, pp 1–15. https://doi.org/10.1109/TNNLS.2024.3370918
Li M, Zhang L, Cui L et al (2023) Blog: Bootstrapped graph representation learning with local and global regularization for recommendation. Pattern Recogn 144:109874. https://doi.org/10.1016/j.patcog.2023.109874
Cai D, He X, Han J (2010) Locally consistent concept factorization for document clustering. IEEE Trans Knowl Data Eng 23(6):902–913. https://doi.org/10.1109/TKDE.2010.165
Ye J, Jin Z (2014) Dual-graph regularized concept factorization for clustering. Neurocomputing 138:120–130. https://doi.org/10.1016/j.neucom.2014.02.029
Ye J, Jin Z (2017) Graph-regularized local coordinate concept factorization for image representation. Neural Process Lett 46(2):427–449. https://doi.org/10.1007/s11063-017-9598-2
Li N, Leng C, Cheng I et al (2024) Dual-graph global and local concept factorization for data clustering. IEEE Trans Neural Netw Learn Syst 35(1):803–816. https://doi.org/10.1109/TNNLS.2022.3177433
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326. https://doi.org/10.1126/science.290.5500.2323
Yi Y, Wang J, Zhou W et al (2020) Non-negative matrix factorization with locality constrained adaptive graph. IEEE Trans Circuits Syst Video Technol 30(2):427–441. https://doi.org/10.1109/TCSVT.2019.2892971
Edelman A, Arias TA, Smith ST (1999) The geometry of algorithms with orthogonality constraints. SIAM J Matrix Anal Appl 20(2):303–353. https://doi.org/10.1137/S0895479895290954
Wei D, Shen X, Sun Q et al (2021) Adaptive graph guided concept factorization on Grassmann manifold. Inf Sci 576:725–742. https://doi.org/10.1016/j.ins.2021.08.040
Lee D, Seung H (2001) Algorithms for non-negative matrix factorization. In: Proceed 13th Int Conf Neural Inf Process Syst, pp 535–541
Zhang Z, Zhang Y, Xu M et al (2021) A survey on concept factorization: From shallow to deep representation learning. Inf Process Manag 58(3):102534. https://doi.org/10.1016/j.ipm.2021.102534
Jannesari V, Keshvari M, Berahmand K (2024) A novel non-negative matrix factorization-based model for attributed graph clustering by incorporating complementary information. Expert Syst Appl 242. https://doi.org/10.1016/j.eswa.2023.122799
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 62173259) and the Natural Science Foundation of Hubei Province (Grant No. 2022CFB110).
Author information
Contributions
Xuemin Hu: Conceptualization, Methodology, Software, Validation, Writing-Original draft preparation; Dan Xiong: Writing-Reviewing and Editing, Funding acquisition; Li Chai: Supervision and Funding acquisition.
Ethics declarations
Ethical and informed consent for data used
No human participants or animals were involved in the research described in this article.
Competing Interests
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A The Derivation of (12a)–(12c)
The derivation includes the following three steps.
1) S-minimization
Fixing W and V, updating S: Optimizing (11) with respect to S is equivalent to solving
Let \(\mathrm{\Phi} \in \mathbb{R}^{n \times n}\) be the Lagrange multiplier. Then the Lagrange function \(\mathcal{L}(S, \mathrm{\Phi})\) of (A1) is written as
For the second term, note that \(\textrm{Tr}(V^{T}LV)=\frac{1}{2}\sum _{i, j =1}^{n}\Vert V_{i}-V_{j}\Vert ^{2}R_{ij}\). Denoting \(T_{ij} = \frac{1}{2}\Vert V_{i}-V_{j}\Vert ^{2}\), we have \(\textrm{Tr}(V^{T}LV) = \textrm{Tr}(TR)\).
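This identity is easy to verify numerically. The following is a minimal sketch, assuming a symmetric similarity matrix R with Laplacian \(L = D - R\) and \(D = \textrm{diag}(R\textbf{1})\) (these are assumptions, since (11) is not reproduced here), and with \(V_{i}\) denoting the i-th row of V:

```python
import numpy as np

# Check Tr(V^T L V) = (1/2) * sum_ij ||V_i - V_j||^2 R_ij = Tr(TR),
# assuming L = D - R is the Laplacian of a symmetric similarity matrix R.
rng = np.random.default_rng(0)
n, c = 6, 3
R = rng.random((n, n))
R = (R + R.T) / 2                       # symmetric similarity matrix
L = np.diag(R.sum(axis=1)) - R          # graph Laplacian
V = rng.random((n, c))

T = 0.5 * ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # T_ij
lhs = np.trace(V.T @ L @ V)
assert np.isclose(lhs, (T * R).sum())   # (1/2) sum_ij ||V_i - V_j||^2 R_ij
assert np.isclose(lhs, np.trace(T @ R)) # Tr(TR), using symmetry of R
```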
Taking the derivative of \(\mathcal {L}(S, \mathrm{\Phi })\) with respect to S and setting it to zero, we have
Using the KKT condition \(\mathrm{\Phi }_{ij}S_{ij}=0\), we have
Then, we obtain
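The resulting rule (12a) appears in the main text and is not reproduced here. For reference, the same Lagrangian/KKT mechanics applied to plain NMF [44], \(\min _{H \ge 0} \Vert X - WH \Vert _F^2\), produce the familiar multiplicative update; the sketch below is illustrative only and does not use the paper's objective (11).

```latex
% Illustrative KKT-to-multiplicative-update derivation for plain NMF [44].
\mathcal{L}(H, \Phi) = \Vert X - WH \Vert_F^2 - \operatorname{Tr}(\Phi^{T} H),
\qquad
\frac{\partial \mathcal{L}}{\partial H} = -2 W^{T} X + 2 W^{T} W H - \Phi = 0 .
% Complementary slackness \Phi_{jk} H_{jk} = 0 then gives
\left( W^{T} W H - W^{T} X \right)_{jk} H_{jk} = 0
\quad \Longrightarrow \quad
H_{jk} \leftarrow H_{jk} \, \frac{(W^{T} X)_{jk}}{(W^{T} W H)_{jk}} .
```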
2) V-minimization
Fixing W and S, updating V: Optimizing (11) w.r.t. V is equivalent to solving
As V lies on the Stiefel manifold, we compute the natural gradient \(\widetilde{\nabla }_{V} \mathcal {O}_{1}\) on the manifold. First, we calculate the derivative of \(\mathcal {O}_{1}\) with respect to V in Euclidean space,
Second, we use \(\nabla _{V} \mathcal {O}_{1}\) to compute the natural gradient \(\widetilde{\nabla }_{V} \mathcal {O}_{1}\) via (9),
By the KKT condition \( V \odot \widetilde{\nabla }_{V} \mathcal {O}_{1} = \textbf{0} \), the updating rule of V is
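For intuition, the following minimal sketch assumes (9) is the canonical natural gradient on the Stiefel manifold from [42], \(\widetilde{\nabla }_{V} \mathcal {O}_{1} = \nabla _{V} \mathcal {O}_{1} - V (\nabla _{V} \mathcal {O}_{1})^{T} V\) (an assumption, since (9) is not reproduced here), and checks that the projected direction is a tangent vector at V:

```python
import numpy as np

def natural_gradient(V, G):
    """Natural gradient at V (with V^T V = I), given the Euclidean gradient G."""
    return G - V @ G.T @ V

rng = np.random.default_rng(1)
n, c = 8, 3
V, _ = np.linalg.qr(rng.standard_normal((n, c)))  # a point on St(n, c)
G = rng.standard_normal((n, c))                   # stand-in Euclidean gradient

nat = natural_gradient(V, G)
# Tangent vectors Z at V on the Stiefel manifold satisfy V^T Z + Z^T V = 0.
assert np.allclose(V.T @ nat + nat.T @ V, 0)
```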
3) W-minimization
Fixing S and V, updating W: Optimizing (11) w.r.t. W is equivalent to solving
By introducing the Lagrange multiplier \(\mathrm{\Psi}\in \mathbb{R}^{n \times c}\), the Lagrange function \(\mathcal{L}(W, \mathrm{\Psi})\) of (A3) is written as
Taking the derivative of the Lagrange function \(\mathcal{L}(W, \mathrm{\Psi})\) with respect to W and setting it to zero, we have
Using the KKT condition \(\mathrm{\Psi}_{jk} W_{jk} = 0\), we have
Then, the updating rule of W is
Appendix B Proof of Theorem 1
To prove Theorem 1, we use the property of the auxiliary function [44].
Definition 1
\(G(h, h^{*})\) is an auxiliary function for \(F(h)\) if the conditions \(G(h, h^{*}) \ge F(h)\) and \(G(h, h) = F(h)\) are satisfied.
Lemma 1
If \(G(h, h^{*})\) is an auxiliary function of \(F(h)\), then \(F(h)\) is non-increasing under the updating rule \(h^{t+1}=\arg \min _{h} G(h, h^{t})\).
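As a toy illustration of Lemma 1 (with a hand-picked \(F\) and a quadratic majorizer, not the paper's \(G_{S}\), \(G_{V}\) or \(G_{W}\)), the following sketch exhibits the guaranteed descent:

```python
import numpy as np

# Take F(h) = h^2/2 + cos(h), whose curvature satisfies F''(h) = 1 - cos(h) <= 2.
# Then G(h, h_t) = F(h_t) + F'(h_t)(h - h_t) + (L/2)(h - h_t)^2 with L = 2 obeys
# G(h, h_t) >= F(h) and G(h_t, h_t) = F(h_t), so it is an auxiliary function.
# Its minimizer h_{t+1} = h_t - F'(h_t)/L never increases F (Lemma 1).
F = lambda h: 0.5 * h**2 + np.cos(h)
dF = lambda h: h - np.sin(h)
L = 2.0

h = 3.0
for _ in range(50):
    h_next = h - dF(h) / L             # argmin_h G(h, h_t)
    assert F(h_next) <= F(h) + 1e-12   # descent guaranteed by Lemma 1
    h = h_next
```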
Considering the orthogonality of V, the objective function (11) is reformulated as
where \(\mathrm{\Gamma }\) is the Lagrange multiplier.
Denote by \(F_{S}(S_{i j})\), \(F_{V}(V_{j k})\) and \(F_{W}(W_{j k})\) the objective function restricted to S, V and W, respectively. Specifically, they are
To prove the theorem, we construct auxiliary functions for \(F_{S}(S_{ij})\), \(F_{V}(V_{j k})\) and \(F_{W}(W_{j k})\), which are presented as Lemmas 2, 3 and 4, respectively.
Lemma 2
Define
Then, \(G_{S}(S_{i j}, S_{i j}^{t})\) is an auxiliary function for \(F_{S}(S_{i j})\).
Proof
\(G_{S}(S_{i j}, S_{i j}) = F_{S}(S_{i j})\) is obvious. Next, we prove \(F_{S}(S_{i j}) \le G_{S}(S_{i j}, S_{i j}^{t})\).
The derivatives of the function \(F_{S}(S_{ij})\) are
Therefore, the Taylor series expansion of the function \(F_{S}(S_{i j})\) is written as
Compared with (B3), we find that \(F_{S}(S_{i j})\le G_{S}(S_{i j}, S_{i j}^{t})\) is equivalent to
It is easy to check that
Thus, \(F_{S}(S_{i j})\le G_{S}(S_{i j}, S_{i j}^{t})\). \(\square \)
Lemma 3
Define
where
Then, \(G_{V}(V_{j k}, V_{j k}^{t})\) is an auxiliary function for \(F_{V}(V_{j k})\).
Proof
Obviously, \(F_{V}(V_{j k})=G_{V}(V_{j k}, V_{j k})\) holds. Next, we prove \(F_{V}(V_{j k})\le G_{V}(V_{j k}, V_{j k}^{t})\).
To do this, we need to use the Taylor series expansion of the function \(F_{V}(V_{j k})\). The derivatives of the function \(F_{V}(V_{j k})\) are
Therefore, the Taylor series expansion of the function \(F_{V}(V_{j k})\) is
Compared with (B4), it is sufficient to show that
We obtain the following inequalities,
Thus, \(F_{V}(V_{j k})\le G_{V}(V_{j k}, V_{j k}^{t})\).\(\square \)
Lemma 4
Define
Then, \(G_{W}(W_{j k}, W_{j k}^{t})\) is an auxiliary function for \(F_{W}(W_{j k})\).
Proof
The proof follows the same lines as Lemmas 2 and 3. Since \(G_{W}(W_{j k}, W_{j k}) = F_{W}(W_{j k})\) is obvious, we only need to prove \(G_{W}\left( W_{j k}, W_{j k}^{t}\right) \ge F_{W}(W_{j k})\).
To do this, we need to obtain the Taylor series expansion of the function \(F_{W}(W_{j k})\). The derivatives of the function \(F_{W}(W_{j k})\) are
Therefore, the Taylor series expansion of the function \(F_{W}(W_{j k})\) is obtained as
Compared with (B6), we only need to show that
It is easy to check that
Thus \(G_{W}(W_{j k}, W_{j k}^{t}) \ge F_{W}(W_{j k}).\)
Therefore, \(G_{W}(W_{j k}, W_{j k}^{t})\) is an auxiliary function for \(F_{W}(W_{j k})\).\(\square \)
Now we give the proof of Theorem 1 based on the above lemmas.
Proof of Theorem 1
According to Lemmas 2–4, (B3), (B4) and (B6) are auxiliary functions for \(F_{S}(S_{i j})\), \(F_{V}(V_{j k})\) and \(F_{W}(W_{j k})\), respectively.
Now we prove that the updating rules are exactly the minimizers of the corresponding auxiliary functions. For variable S, by minimizing (B3), we get
When \(V^{t}\) and \(W^{t}\) are fixed, \(F_{S}(S_{ij})\) is non-increasing under the rule in (12a).
For variable V, by minimizing (B4), we have
By fixing \(W^{t}\) and \(S^{t+1}\), \(F_{V}(V_{jk})\) is non-increasing under the rule in (12b).
For variable W, by minimizing (B6), we get the updating rule
When \(V^{t+1}\) and \(S^{t+1}\) are fixed, \(F_{W}(W_{j k})\) is non-increasing under the rule in (12c).
Therefore, in the t-th iteration, when \(V^t\) and \(W^t\) are fixed, the following inequality holds
and when \(S^{t+1}\) and \(W^t\) are fixed, we have
and when \(V^{t+1}\) and \(S^{t+1}\) are fixed, we get
As a result, we have
Therefore, the objective function in (11) is non-increasing under the updating rules (12a), (12b) and (12c).\(\square \)
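In practice, Theorem 1 can be checked by monitoring the objective across iterations. The sketch below assumes hypothetical implementations update_S, update_V and update_W of the rules (12a)–(12c) and an objective function evaluating (11); all of these names are illustrative placeholders, not code from the paper.

```python
# A sketch of verifying Theorem 1 empirically: alternate the three block
# updates and assert that the objective (11) never increases.
def run_agcf_sm(X, S, V, W, objective, update_S, update_V, update_W,
                max_iter=200, tol=1e-8):
    prev = objective(X, S, V, W)
    for _ in range(max_iter):
        S = update_S(X, S, V, W)   # (12a): S-minimization
        V = update_V(X, S, V, W)   # (12b): V-minimization on the Stiefel manifold
        W = update_W(X, S, V, W)   # (12c): W-minimization
        cur = objective(X, S, V, W)
        # Theorem 1 guarantees monotone non-increase of (11).
        assert cur <= prev + 1e-10, "objective increased: update rules violated"
        if prev - cur < tol:       # stop once the decrease stalls
            break
        prev = cur
    return S, V, W
```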
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hu, X., Xiong, D. & Chai, L. Concept factorization with adaptive graph learning on Stiefel manifold. Appl Intell 54, 8224–8240 (2024). https://doi.org/10.1007/s10489-024-05606-8