Concept factorization with adaptive graph learning on Stiefel manifold

Abstract

In machine learning and data mining, concept factorization (CF) has achieved great success owing to its powerful capability for data representation. To learn an adaptive graph structure inherent in the data space, and to ease the burden imposed by an explicit orthogonality constraint, we propose concept factorization with adaptive graph learning on the Stiefel manifold (AGCF-SM). The method integrates concept factorization and manifold learning into a unified framework, in which the adaptive similarity graph is learned by iterative locally linear embedding and is therefore free from dependence on predefined neighbor sets. An iterative updating algorithm is developed, and convergence and complexity analyses of the algorithm are provided. Numerical experiments on ten benchmark datasets demonstrate that the proposed algorithm outperforms other state-of-the-art algorithms.

Data Availability

All data used in this study are freely available online (http://www.cad.zju.edu.cn/home/dengcai/Data/data.html, https://archive.ics.uci.edu/datasets, and https://jundongl.github.io/scikit-feature/index.html).

Notes

  1. https://github.com/huangsd/NMFAN

References

  1. Lin X, Chen X, Zheng Z (2023) Deep manifold matrix factorization autoencoder using global connectivity for link prediction. Appl Intell 53(21):25816–25835. https://doi.org/10.1007/s10489-023-04887-9

  2. Gao X, Zhang Z, Mu T et al (2020) Self-attention driven adversarial similarity learning network. Pattern Recogn, 105:107331. https://doi.org/10.1016/j.patcog.2020.107331

  3. Wu W, Hou J, Wang S et al (2023) Semi-supervised adaptive kernel concept factorization. Pattern Recogn, 134:109114. https://doi.org/10.1016/j.patcog.2022.109114

  4. Rahiche A, Cheriet M (2021) Blind decomposition of multispectral document images using orthogonal non-negative matrix factorization. IEEE Trans Image Process, 30:5997–6012. https://doi.org/10.1109/TIP.2021.3088266

  5. Lee D, Seung H (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791. https://doi.org/10.1038/44565

  6. Tang J, Wan Z (2021) Orthogonal dual graph-regularized non-negative matrix factorization for co-clustering. J Sci Comput 87(3):1–37. https://doi.org/10.1007/s10915-021-01489-w

  7. Hien LTK, Gillis N (2021) Algorithms for non-negative matrix factorization with the Kullback-Leibler divergence. J Sci Comput 87(3):1–32. https://doi.org/10.1007/s10915-021-01504-0

  8. Shu Z, Weng Z, Yu Z et al (2022) Correntropy-based dual graph regularized non-negative matrix factorization with \({L}_{p}\) smoothness for data representation. Appl Intell 52(7):7653–7669. https://doi.org/10.1007/s10489-021-02826-0

  9. Xu W, Gong Y (2004) Document clustering by concept factorization. In: Proceed 27th Ann Int ACM SIGIR Conf Res Dev Inf Retri, pp 202–209. https://doi.org/10.1145/1008992.1009029

  10. Zhang Z, Zhang Y, Liu G et al (2020) Joint label prediction based semi-supervised adaptive concept factorization for robust data representation. IEEE Trans Knowl Data Eng 32(5):952–970. https://doi.org/10.1109/TKDE.2019.2893956

  11. Zhou N, Chen B, Du Y et al (2020) Maximum correntropy criterion-based robust semisupervised concept factorization for image representation. IEEE Trans Neural Netw Learn Syst 31(10):3877–3891. https://doi.org/10.1109/TNNLS.2019.2947156

  12. Peng S, Yang Z, Nie F et al (2022) Correntropy based semi-supervised concept factorization with adaptive neighbors for clustering. Neural Netw, 154:203–217. https://doi.org/10.1016/j.neunet.2022.07.021

  13. Li Z, Yang Y (2023) Structurally incoherent adaptive weighted low-rank matrix decomposition for image classification. Appl Intell 53(21):25028–25041. https://doi.org/10.1007/s10489-023-04875-z

  14. Deng P, Li T, Wang H et al (2021) Tri-regularized non-negative matrix tri-factorization for co-clustering. Knowl-Based Syst, 226:107101. https://doi.org/10.1016/j.knosys.2021.107101

  15. Zhang L, Liu Z, Pu J et al (2020) Adaptive graph regularized non-negative matrix factorization for data representation. Appl Intell, 50:438–447. https://doi.org/10.1007/s10489-019-01539-9

  16. Shu Z, Zuo F, Wu W et al (2023) Dual local learning regularized NMF with sparse and orthogonal constraints. Appl Intell 53(7):7713–7727. https://doi.org/10.1007/s10489-022-03881-x

  17. Yang X, Che H, Leung MF et al (2023) Adaptive graph non-negative matrix factorization with the self-paced regularization. Appl Intell 53(12):15818–15835. https://doi.org/10.1007/s10489-022-04339-w

  18. Tang J, Feng H (2022) Robust local-coordinate non-negative matrix factorization with adaptive graph for robust clustering. Inf Sci, 610:1058–1077. https://doi.org/10.1016/j.ins.2022.08.023

  19. Chen M, Li X (2021) Concept factorization with local centroids. IEEE Trans Neural Netw Learn Syst 32(11):5247–5253. https://doi.org/10.1109/TNNLS.2020.3027068

  20. Wu W, Chen Y, Wang R et al (2023) Self-representative kernel concept factorization. Knowl-Based Syst, 259:110051. https://doi.org/10.1016/j.knosys.2022.110051

  21. Mu J, Song P, Liu X et al (2023) Dual-graph regularized concept factorization for multi-view clustering. Expert Syst Appl, 223:119949. https://doi.org/10.1016/j.eswa.2023.119949

  22. Pei X, Chen C, Gong W (2018) Concept factorization with adaptive neighbors for document clustering. IEEE Trans Neural Netw Learn Syst 29(2):343–352. https://doi.org/10.1109/TNNLS.2016.2626311

  23. Guo Y, Ding G, Zhou J et al (2015) Robust and discriminative concept factorization for image representation. In: Proceed 5th ACM Int Conf Multimed Retr, pp 115–122. https://doi.org/10.1145/2671188.2749317

  24. Yang B, Zhang X, Nie F et al (2023) ECCA: Efficient correntropy-based clustering algorithm with orthogonal concept factorization. IEEE Trans Neural Netw Learn Syst 34(10):7377–7390. https://doi.org/10.1109/TNNLS.2022.3142806

  25. Ding C, Li T, Peng W et al (2006) Orthogonal non-negative matrix tri-factorizations for clustering. In: Proceed ACM SIGKDD Int Conf Knowl Discov Data Min, pp 126–135. https://doi.org/10.1145/1150402.1150420

  26. Huang K, Sidiropoulos ND, Swami A (2014) Non-negative matrix factorization revisited: Uniqueness and algorithm for symmetric decomposition. IEEE Trans Signal Process 62(1):211–224. https://doi.org/10.1109/TSP.2013.2285514

  27. He P, Xu X, Ding J et al (2020) Low-rank non-negative matrix factorization on Stiefel manifold. Inf Sci, 514:131–148. https://doi.org/10.1016/j.ins.2019.12.004

  28. Wang Q, He X, Jiang X et al (2022) Robust bi-stochastic graph regularized matrix factorization for data clustering. IEEE Trans Pattern Anal Mach Intell 44(1):390–403. https://doi.org/10.1109/TPAMI.2020.3007673

  29. Wang S, Chang TH, Cui Y et al (2021) Clustering by orthogonal NMF model and non-convex penalty optimization. IEEE Trans Signal Process, 69:5273–5288. https://doi.org/10.1109/TSP.2021.3102106

  30. Yang B, Zhang X, Nie F et al (2021) Fast multi-view clustering via non-negative and orthogonal factorization. IEEE Trans Image Process, 30:2575–2586

  31. Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7(85):2399–2434

  32. Huang S, Xu Z, Kang Z et al (2020) Regularized non-negative matrix factorization with adaptive local structure learning. Neurocomputing, 382:196–209. https://doi.org/10.1016/j.neucom.2019.11.070

  33. Bai L, Cui L, Wang Y et al (2022) HAQJSK: Hierarchical-aligned quantum Jensen-Shannon kernels for graph classification. https://doi.org/10.48550/arXiv.2211.02904

  34. Li J, Zheng R, Feng H et al (2024) Permutation equivariant graph framelets for heterophilous graph learning. IEEE Trans Neural Netw Learn Syst, pp 1–15. https://doi.org/10.1109/TNNLS.2024.3370918

  35. Li M, Zhang L, Cui L et al (2023) Blog: Bootstrapped graph representation learning with local and global regularization for recommendation. Pattern Recogn, 144:109874. https://doi.org/10.1016/j.patcog.2023.109874

  36. Cai D, He X, Han J (2010) Locally consistent concept factorization for document clustering. IEEE Trans Knowl Data Eng 23(6):902–913. https://doi.org/10.1109/TKDE.2010.165

  37. Ye J, Jin Z (2014) Dual-graph regularized concept factorization for clustering. Neurocomputing, 138:120–130. https://doi.org/10.1016/j.neucom.2014.02.029

  38. Ye J, Jin Z (2017) Graph-regularized local coordinate concept factorization for image representation. Neural Process Lett 46(2):427–449. https://doi.org/10.1007/s11063-017-9598-2

  39. Li N, Leng C, Cheng I et al (2024) Dual-graph global and local concept factorization for data clustering. IEEE Trans Neural Netw Learn Syst 35(1):803–816. https://doi.org/10.1109/TNNLS.2022.3177433

  40. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326. https://doi.org/10.1126/science.290.5500.2323

  41. Yi Y, Wang J, Zhou W et al (2020) Non-negative matrix factorization with locality constrained adaptive graph. IEEE Trans Circuits Syst Video Technol 30(2):427–441. https://doi.org/10.1109/TCSVT.2019.2892971

  42. Edelman A, Arias TA, Smith ST (1999) The geometry of algorithms with orthogonality constraints. SIAM J Matrix Anal Appl 20(2):303–353. https://doi.org/10.1137/S0895479895290954

  43. Wei D, Shen X, Sun Q et al (2021) Adaptive graph guided concept factorization on Grassmann manifold. Inf Sci, 576:725–742. https://doi.org/10.1016/j.ins.2021.08.040

  44. Lee D, Seung H (2001) Algorithms for non-negative matrix factorization. In: Proceed 13th Int Conf Neural Inf Process Syst, pp 535–541

  45. Zhang Z, Zhang Y, Xu M et al (2021) A survey on concept factorization: From shallow to deep representation learning. Inf Process Manag 58(3):102534. https://doi.org/10.1016/j.ipm.2021.102534

  46. Jannesari V, Keshvari M, Berahmand K (2024) A novel non-negative matrix factorization-based model for attributed graph clustering by incorporating complementary information. Expert Syst Appl, 242. https://doi.org/10.1016/j.eswa.2023.122799

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant 62173259) and the Natural Science Foundation of Hubei Province (No. 2022CFB110).

Author information

Authors and Affiliations

Authors

Contributions

Xuemin Hu: Conceptualization, Methodology, Software, Validation, Writing-Original draft preparation; Dan Xiong: Writing-Reviewing and Editing, Funding acquisition; Li Chai: Supervision and Funding acquisition.

Corresponding author

Correspondence to Dan Xiong.

Ethics declarations

Ethical and informed consent for data used

No human participants or animals were involved in the research described in this article.

Competing Interests

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A    The Derivation of (12a) - (12c)

  The derivation includes the following three steps.

1) S-minimization

Fixing W and V, we update S: optimizing (11) with respect to S is equivalent to solving

$$\begin{aligned} \min _{S}\ \lambda \textrm{Tr}(V^{T} L V)+ \Vert X-XS\Vert _{F}^{2}+\alpha \Vert S\Vert _{F}^{2}+\beta \Vert S\Vert _{1,1},\ \text {s.t.}\ \ S \ge \textbf{0}.~ \end{aligned}$$
(A1)

Let \( \mathrm{\Phi } \in \mathbb {R}^{n \times n}\) be the Lagrangian multiplier. Then, the Lagrange function \(\mathcal {L}(S, \mathrm{\Phi })\) of (A1) is written as

$$\begin{aligned} \mathcal {L}(S, \mathrm{\Phi })&= \lambda \textrm{Tr}(V^{T} L V)+\Vert X-XS\Vert _{F}^{2}+\alpha \Vert S\Vert _{F}^{2}+\beta \Vert S\Vert _{1,1}-\textrm{Tr}( \mathrm{\Phi }S )\\&= \lambda \textrm{Tr}(TR)+\Vert X-XS\Vert _{F}^{2}+\alpha \textrm{Tr}(S^{T}S)+\beta \textrm{Tr}(\textrm{E} S)-\textrm{Tr}( \mathrm{\Phi } S).~ \end{aligned}$$

For the second equality, note that \(\textrm{Tr}(V^{T}LV)=\frac{1}{2}\sum _{i, j =1}^{n}\Vert V_{i}-V_{j}\Vert ^{2}R_{ij}\); denoting \( T_{ij} = \frac{1}{2}\Vert V_{i}-V_{j}\Vert ^{2}\), we have \( \textrm{Tr}(V^{T}LV) = \textrm{Tr}(TR)\).
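
This identity can also be checked numerically. The short NumPy sketch below uses purely illustrative random data, with a symmetric similarity matrix R, degree matrix D and Laplacian L = D - R named as in the derivation; it is a sanity check, not part of the algorithm.

```python
# Numerical check of Tr(V^T L V) = Tr(T R) with T_ij = 0.5 * ||V_i - V_j||^2.
# The random matrices here are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, c = 6, 3
V = rng.random((n, c))                      # rows V_i are the embedded points
R = rng.random((n, n)); R = (R + R.T) / 2   # symmetric similarity graph
D = np.diag(R.sum(axis=1))                  # degree matrix
L = D - R                                   # graph Laplacian

T = 0.5 * np.square(V[:, None, :] - V[None, :, :]).sum(axis=2)

print(np.isclose(np.trace(V.T @ L @ V), np.trace(T @ R)))   # True
```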

Taking the derivative of \(\mathcal {L}(S, \mathrm{\Phi })\) with respect to S and setting it to zero, we have

$$\begin{aligned} \lambda T-2 X^{T} X + 2 X^{T} X S + 2\alpha S+\beta \textrm{E} -\mathrm{\Phi } = \textbf{0}. \end{aligned}$$

Using the KKT condition, \(\mathrm{\Phi }_{ij}S_{ij}=0\), we have

$$\begin{aligned} (\lambda T-2 X^{T} X + 2X^{T} X S + 2\alpha S+\beta \textrm{E})_{ij}S_{ij}=0. \end{aligned}$$

Then, we obtain

$$\begin{aligned} S_{ij}^{t+1}\leftarrow S_{ij}^{t}\frac{(X^{T} X)_{ij}}{(\frac{\lambda }{2} T+ X^{T} X S + \alpha S+\frac{\beta }{2}\textrm{E})_{ij}}. \end{aligned}$$
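
For concreteness, a minimal NumPy sketch of this multiplicative S-update follows. The function name, the small constant eps guarding against division by zero, and the default parameter values are illustrative assumptions rather than the paper's implementation.

```python
# A sketch of the S-update (12a): a multiplicative rule that keeps S non-negative.
import numpy as np

def update_S(X, V, S, lam=1.0, alpha=1.0, beta=1.0, eps=1e-12):
    """One multiplicative step for the self-representation matrix S."""
    n = X.shape[1]
    XtX = X.T @ X
    # T_ij = 0.5 * ||V_i - V_j||^2, as defined above
    T = 0.5 * np.square(V[:, None, :] - V[None, :, :]).sum(axis=2)
    E = np.ones((n, n))                     # all-ones matrix from the L_{1,1} term
    denom = 0.5 * lam * T + XtX @ S + alpha * S + 0.5 * beta * E
    return S * XtX / (denom + eps)          # eps is an illustrative safeguard
```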

2) V-minimization

Fixing W and S, we update V: optimizing (11) w.r.t. V is equivalent to solving

$$\begin{aligned} \min _{V} \mathcal {O}_{1}=\Vert X-X W V^{T}\Vert _{F}^{2}+\lambda \textrm{Tr}( V^{T} L V ), \ \text {s.t.} \ V \ge \textbf{0},\ \ V\in \mathcal {M}. \end{aligned}$$
(A2)

As V lies on the Stiefel manifold, we compute the natural gradient \(\widetilde{\nabla }_{V} \mathcal {O}_{1} \) on the manifold. First, we calculate the derivative of \(\mathcal {O}_{1}\) with respect to V in Euclidean space,

$$\begin{aligned} \nabla _{V}\mathcal {O}_{1}&= -2 X^{T} X W + 2 V W^{T} X^{T} X W + 2\lambda L V\\&= (2 V W^{T} X^{T} X W + 2\lambda D V) - (2 X^{T} X W + 2\lambda R V). \end{aligned}$$

Second, we use \(\nabla _{V} \mathcal {O}_{1}\) to compute the natural gradient \(\widetilde{\nabla }_{V} \mathcal {O}_{1} \) by (9),

$$\begin{aligned} \widetilde{\nabla }_{V}\mathcal {O}_{1}&= \nabla _{V} \mathcal {O}_{1}- V(\nabla _{V} \mathcal {O}_{1})^{T}V \\&= 2 V W^{T} X^{T} X V - 2 X^{T} X W + 2\lambda (D V - R V - V V^{T}D^{T} V + VV^{T}R^{T}V). \end{aligned}$$

By the KKT conditions, \( V \odot \widetilde{\nabla }_{V} \mathcal {O}_{1} = \textbf{0} \), and the updating rule of V is

$$\begin{aligned} V_{jk}^{t+1}\leftarrow V_{jk}^{t} \frac{(X^{T} X W+\lambda (R V+V{V}^{T}{D}^{T}V))_{jk}}{(V{W}^{T} X^{T} X V+\lambda (D V+V{V}^{T} {R}^{T} V))_{jk}}. \end{aligned}$$
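
A corresponding NumPy sketch of this V-update, obtained from the Stiefel natural gradient above, is given below; D is built from the current similarity graph R as its degree matrix, and eps is again an illustrative safeguard rather than part of the derivation.

```python
# A sketch of the V-update (12b), derived from the natural gradient on the
# Stiefel manifold. R is the current similarity graph, D its degree matrix.
import numpy as np

def update_V(X, W, V, R, lam=1.0, eps=1e-12):
    """One multiplicative step for V."""
    XtX = X.T @ X
    D = np.diag(R.sum(axis=1))
    numer = XtX @ W + lam * (R @ V + V @ V.T @ D.T @ V)
    denom = V @ (W.T @ XtX @ V) + lam * (D @ V + V @ V.T @ R.T @ V)
    return V * numer / (denom + eps)        # eps is an illustrative safeguard
```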

3) W-minimization

Fixing S and V, we update W: optimizing (11) w.r.t. W is equivalent to solving

$$\begin{aligned} \min _{W} \Vert X-X W V^{T}\Vert _{F}^{2}, \ \text {s.t.}\ W \ge \textbf{0}.~ \end{aligned}$$
(A3)

By introducing the Lagrangian multiplier \( \mathrm{\Psi }\in \mathbb {R}^{n \times c}\), the Lagrange function \(\mathcal {L}(W, \mathrm{\Psi })\) of (A3) is written as

$$\begin{aligned} \mathcal {L}(W, \mathrm{\Psi })= \Vert X-X W V^{T}\Vert _{F}^{2} - \textrm{Tr}(\mathrm{\Psi } W^{T}).~ \end{aligned}$$

Taking the derivative of the Lagrange function \(\mathcal {L}(W, \mathrm{\Psi })\) with respect to W and setting it to zero, we have

$$\begin{aligned} - 2 X^{T} X V +2 X^{T} X W V^{T} V - \mathrm{\Psi } = \textbf{0}. \end{aligned}$$

Using the KKT conditions, \( \mathrm{\Psi }_{jk} W_{jk} = 0 \), we have

$$\begin{aligned} (- 2 X^{T} X V +2 X^{T} X W V^{T}V)_{jk} W_{jk}=0. \end{aligned}$$

Then, the updating rule of W is

$$\begin{aligned} W_{jk}^{t+1}\leftarrow W_{jk}^{t}\frac{(X^{T} X V)_{jk}}{(X^{T} X W {V}^{T} V)_{jk}}. \end{aligned}$$
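
Analogously, a minimal NumPy sketch of this W-update is given below, with the same illustrative eps safeguard.

```python
# A sketch of the W-update (12c) from the KKT condition of (A3).
import numpy as np

def update_W(X, W, V, eps=1e-12):
    """One multiplicative step for W."""
    XtX = X.T @ X
    return W * (XtX @ V) / (XtX @ W @ (V.T @ V) + eps)
```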

Appendix B    Proof of Theorem 1

To prove Theorem 1, we use the properties of auxiliary functions [44].

Definition 1

\(G(h, h^{*})\) is an auxiliary function for F(h) under the conditions

$$G(h, h^{*})\ge F(h),\ \ G(h, h)= F(h).$$

Lemma 1

If \(G(h, h^{*})\) is an auxiliary function of F(h), then F(h) is non-increasing under the following updating rule

$$\begin{aligned} h^{t+1} = \mathop {\arg \min }_{h} G(h,h^{t}). \end{aligned}$$
(B1)

Considering the orthogonality of V, the objective function (11) is reformulated as

$$\begin{aligned} F(W,V,S) = \Vert X-X W V^{T}\Vert _{F}^{2} + \lambda \textrm{Tr}(V^{T}LV) + \Vert X-XS\Vert _{F}^{2} \nonumber \\ + \alpha \Vert S\Vert _{F}^{2} + \beta \Vert S\Vert _{1,1}+\textrm{Tr}(\mathrm{\Gamma }(V^{T} V - {I})), \end{aligned}$$
(B2)

where \(\mathrm{\Gamma }\) is the Lagrange multiplier.

Denote by \(F_{S}(S_{i j})\), \(F_{V}(V_{j k})\) and \(F_{W}(W_{j k})\) the parts of the objective function that involve only S, V and W, respectively. Specifically, they are

$$\begin{aligned} F_{S}(S_{i j})&= (\lambda T R-2 X^{T} X S + S^{T} X^{T} X S + \alpha S^{T}S + \beta \mathrm{{E}}S)_{i j},\\ F_{V}(V_{j k})&= (-2 X^{T} X W V^{T} + V W^{T} X^{T} X W V^{T}+ \lambda V^{T} L V+ \mathrm{\Gamma }( V^{T} V - {I} ))_{j k},\\ F_{W}(W_{j k})&= (-2 X^{T} X W V^{T}+V W^{T} X^{T} X W V^{T})_{j k}. \end{aligned}$$

To prove the theorem, we construct auxiliary functions for \(F_{S}(S_{ij})\), \(F_{V}(V_{j k})\) and \(F_{W}(W_{j k})\), which are presented as Lemmas 2, 3 and 4, respectively.

Lemma 2

Define

$$\begin{aligned} G_{S}(S_{i j}, S_{i j}^{t}) = F_{S}(S_{i j}^{t})&+ F_{S}^{\prime }(S_{i j}^{t}) (S_{i j}-S_{i j}^{t})\nonumber \\&+ \frac{(\frac{\lambda }{2} T+ X^{T} X S^t +\alpha S^t + \frac{\beta }{2} {\mathrm{{E}}})_{i j}}{S_{i j}^{t}}(S_{i j}-S_{i j}^{t})^{2}. \end{aligned}$$
(B3)

Then, \(G_{S}(S_{i j}, S_{i j}^{t})\) is an auxiliary function for \(F_{S}(S_{i j})\).

Proof

\(G_{S}(S_{i j}, S_{i j}) = F_{S}(S_{i j})\) is obvious. Next, we prove \(F_{S}(S_{i j}) \le G_{S}(S_{i j}, S_{i j}^{t})\).

The derivatives of the function \(F_{S}(S_{ij})\) are

$$\begin{aligned} F^{'}_{S}(S_{ij})&= (\lambda T - 2 X^{T} X + 2 X^{T} X S + 2 \alpha S+\beta \mathrm{{E}})_{ij}, \\ F^{''}_{S}(S_{ij})&= 2 (X^{T} X)_{ii}+2 \alpha ,\\ F^{(k)}_{S}(S_{ij})&= 0, \ k \ge 3. \end{aligned}$$

Therefore, the Taylor series expansion of the function \(F_{S}(S_{i j})\) is written as

$$\begin{aligned} F_{S}(S_{i j})&= F_{S}(S_{i j}^{t})+F_{S}^{'}(S_{i j}^{t})(S_{i j}-S_{i j}^{t})+\frac{ F_{S}^{''}(S_{i j}^{t})}{2}(S_{i j}-S_{i j}^{t})^{2}\\&= F_{S}(S_{i j}^{t})+F_{S}^{\prime }(S_{i j}^{t})(S_{i j}-S_{i j}^{t})+((X^{T} X)_{i i}+\alpha )(S_{i j}-S_{i j}^{t})^{2}. \end{aligned}$$

Compared with (B3), we find that \(F_{S}(S_{i j})\le G_{S}(S_{i j}, S_{i j}^{t})\) is equivalent to

$$\begin{aligned} \frac{(\frac{\lambda }{2} T+ X^{T} X S^t +\alpha S^t + \frac{\beta }{2} \mathrm{{E}})_{i j}}{S_{i j}^{t}}\ge (X^{T} X)_{i i}+ \alpha . \end{aligned}$$

It is easy to check that

$$ (X^{T} X S^t)_{ij} = \sum _{\nu }(X^{T} X)_{i \nu }S_{\nu j}^{t}\ge (X^{T} X)_{ii} S_{ij}^{t}.$$

Thus, \(F_{S}(S_{i j})\le G_{S}(S_{i j}, S_{i j}^{t})\). \(\square \)

Lemma 3

Define

$$\begin{aligned} G_{V}(V_{j k}, V_{j k}^{t}) = F_{V}(V_{j k}^{t})&+F_{V}^{\prime }(V_{j k}^{t})(V_{j k}-V_{j k}^{t})\nonumber \\&+\frac{(V^{t} W^{T} X^{T} X W+\lambda D V^{t} + V^{t} \mathrm{\Gamma }_{1})_{j k}}{V_{j k}^{t}}(V_{j k}-V_{j k}^{t})^{2}, \end{aligned}$$
(B4)

where

$$\begin{aligned} \begin{array}{l} \mathrm{\Gamma }_{1}=\mathrm{\Gamma }+\mathrm{\Gamma }_{2}=W^{T} X^{T} X V^{t}+\lambda (V^{t})^{T} R V^{t}-W^{T} X^{T} X W,\\ \mathrm{\Gamma }_{2}=\lambda (V^{t})^{T} D V^{t} \ge 0. \end{array} \end{aligned}$$
(B5)

Then, \(G_{V}(V_{j k}, V_{j k}^{t})\) is an auxiliary function for \(F_{V}(V_{j k})\).

Proof

Obviously, \(F_{V}(V_{j k})=G_{V}(V_{j k}, V_{j k})\) holds. Next, we prove \(F_{V}(V_{j k})\le G_{V}(V_{j k}, V_{j k}^{t})\).

To do this, we need to use the Taylor series expansion of the function \(F_{V}(V_{j k})\). The derivatives of the function \(F_{V}(V_{j k})\) are

$$\begin{aligned} F_{V}^{'}(V_{j k})&=2(-X^{T} X W + V W^{T} X^{T} X W+\lambda L V + V\mathrm{\Gamma } )_{j k},\\ F_{V}^{''}(V_{j k})&=2( (W^{T} X^{T} X W)_{kk}+\lambda L_{jj} + \mathrm{\Gamma }_{kk}),\\ F_{V}^{(k)}(V_{j k})&= 0,\ \ k \ge 3. \end{aligned}$$

Therefore, the Taylor series expansion of the function \(F_{V}(V_{j k})\) is

$$\begin{aligned} F_{V}(V_{j k})&= F_{V}(V_{j k}^{t}) +F_{V}^{\prime }(V_{j k}^{t})(V_{j k}-V_{j k}^{t})\nonumber \\&\quad + ((W^{T} X^{T} X W)_{k k}+\lambda L_{j j}+\mathrm{\Gamma }_{k k})(V_{j k}-V_{j k}^{t})^{2}. \end{aligned}$$

Compared with (B4), it is sufficient to show that

$$\begin{aligned} \frac{(V^{t} W^{T} X^{T} X W+\lambda D V^{t}+ V^{t} \mathrm{\Gamma }_{1})_{j k}}{V_{j k}^{t}} \ge (W^{T} X^{T} X W)_{k k}+\lambda L_{j j}+\mathrm{\Gamma }_{k k}. \end{aligned}$$

We obtain the following inequalities,

$$\begin{aligned} (V^{t} W^{T} X^{T} X W)_{j k}&= \sum _{\omega } V_{j \omega }^{t}(W^{T} X^{T} X W)_{\omega k} \ge V_{j k}^{t}(W^{T} X^{T} X W)_{k k}, \\ (D V^{t})_{j k}&= \sum _{h} D_{j h} V_{h k}^{t} \ge D_{j j} V_{j k}^{t} \ge (D- R )_{j j} V_{j k}^{t}=L_{j j} V_{j k}^{t}, \\ ( V^{t}\mathrm{\Gamma }_{1})_{j k}&= \sum _{\tau } V_{j \tau }^{t}(\mathrm{\Gamma }_{1})_{\tau k} \ge V_{j k}^{t} (\mathrm{\Gamma }_{1})_{k k} \ge V_{j k}^{t}(\mathrm{\Gamma }_{1}-\mathrm{\Gamma }_{2})_{k k}=V_{j k}^{t}\mathrm{\Gamma }_{k k}. \end{aligned}$$

Thus, \(F_{V}(V_{j k})\le G_{V}(V_{j k}, V_{j k}^{t})\).\(\square \)

Lemma 4

Define

$$\begin{aligned} G_{W}(W_{j k}, W_{j k}^{t}) = F_{W}(W_{j k}^{t})+F_{W}^{\prime }(W_{j k}^{t})(W_{j k}-W_{j k}^{t})+\frac{( X^{T} X W^t V^{T}V)_{j k}}{W_{j k}^{t}}(W_{j k}-W_{j k}^{t})^{2}. \end{aligned}$$
(B6)

Then, \(G_{W}(W_{j k}, W_{j k}^{t})\) is an auxiliary function for \(F_{W}(W_{j k})\).

Proof

The proof follows the same lines as those of Lemmas 2 and 3. Since \(G_{W}(W_{j k}, W_{j k}) = F_{W}(W_{j k})\) is obvious, we only need to prove \(G_{W}\left( W_{j k}, W_{j k}^{t}\right) \ge F_{W}(W_{j k})\).

To do this, we need to obtain the Taylor series expansion of the function \(F_{W}(W_{j k})\). The derivatives of the function \(F_{W}(W_{j k})\) are

$$\begin{aligned} F_{W}^{'}(W_{j k})&= 2(-X^{T} X V + X^{T} X W V^{T} V)_{j k},\\ F_{W}^{''}(W_{j k})&= 2 (X^{T} X)_{j j}(V^{T} V)_{kk},\\ F_{W}^{(k)}(W_{j k})&=0,\ k\ge 3. \end{aligned}$$

Therefore, the Taylor series expansion of the function \(F_{W}(W_{j k})\) is obtained as

$$\begin{aligned} F_{W}(W_{j k})&= F_{W}(W_{j k}^{t})+F_{W}^{\prime }(W_{j k}^{t})(W_{j k}-W_{j k}^{t})\nonumber \\&\quad + (X^{T} X)_{j j}(V^{T} V)_{kk}(W_{j k}-W_{j k}^{t})^{2}. \end{aligned}$$

Compared with (B6), we only need to show that

$$\begin{aligned} \frac{(X^{T} X W^{t} V^{T} V)_{j k}}{W_{j k}^{t}} \ge (X^{T} X)_{jj}(V^{T} V)_{kk}. \end{aligned}$$
(B7)

It is easy to check that

$$\begin{aligned} (X^{T} X W^{t} V^{T} V)_{j k}&=\sum _{l}(X^{T} X W^{t})_{j l}(V^{T} V)_{l k} \\&\ge (X^{T} X W^{t})_{j k}(V^{T} V)_{k k} \\&= \sum _{r} (X^{T} X)_{jr} W_{r k}^{t}(V^{T} V)_{k k} \\&\ge W_{j k}^{t} (X^{T} X)_{jj}(V^{T} V)_{kk}. \end{aligned}$$

Thus \(G_{W}(W_{j k}, W_{j k}^{t}) \ge F_{W}(W_{j k}).\)

Therefore, \(G_{W}(W_{j k}, W_{j k}^{t})\) is an auxiliary function for \(F_{W}(W_{j k})\).\(\square \)

Now we give the proof of Theorem 1 based on the above lemmas.

Proof of Theorem 1

According to Lemmas 2, 3 and 4, (B3), (B4) and (B6) are auxiliary functions for \(F_{S}(S_{i j})\), \(F_{V}(V_{j k})\) and \(F_{W}(W_{j k})\), respectively.

Now we prove that the updating rules are exactly the minimizers of the corresponding auxiliary functions. For variable S, by minimizing (B3), we get

$$\begin{aligned} S_{i j}^{t+1}&=\arg \min _{S_{i j}} G_{S}(S_{i j}, S_{i j}^{t})\\&=S_{i j}^{t}-S_{i j}^{t} \frac{F'_{S}(S_{i j}^{t})}{2(\frac{\lambda }{2} T+ X^{T} X S^{t} +\alpha S^{t} + \frac{\beta }{2} \mathrm{{E}})_{i j}} \\&=S_{i j}^{t}\frac{ (X^{T} X)_{i j}}{(\frac{\lambda }{2} T+ X^{T} X S^{t} +\alpha S^{t} + \frac{\beta }{2} \mathrm{{E}})_{i j}}. \end{aligned}$$

When \(V^{t}\) and \(W^{t}\) are fixed, \(F_{S}(S_{ij})\) is non-increasing under the rule in (12a).

For variable V, by minimizing (B4), we have

$$\begin{aligned} V_{j k}^{t+1}&=\arg \min _{V_{j k}} G_{V}(V_{j k}, V_{j k}^{t}) \\&=V_{j k}^{t}-V_{j k}^{t}\frac{F'_{V}(V_{j k}^{t})}{2(V^{t} W^{T} X^{T} X W+\lambda D V^{t}+ V^{t} \mathrm{\Gamma }_{1} )_{j k}} \\&=V_{j k}^{t} \frac{(X^{T} X W+\lambda (R V^{t} + V^{t} (V^{t})^{T} D^{T} V^{t}))_{j k}}{(V^{t} W^{T} X^{T} X V^{t}+\lambda (D V^{t}+ V^{t} (V^{t})^{T} R^{T} V^{t}))_{j k}}. \end{aligned}$$

When \(W^{t}\) and \(S^{t+1}\) are fixed, \(F_{V}(V_{jk})\) is non-increasing under the rule in (12b).

For variable W, by minimizing (B6), we get the updating rule

$$\begin{aligned} \begin{aligned} W_{j k}^{t+1}&=\arg \min _{W_{j k}} G_{W}(W_{j k}, W_{j k}^{t}) \\&=W_{j k}^{t}-W_{j k}^{t} \frac{F_{W}^{'}(W_{j k}^{t})}{2(X^{T} X W^{t} V^{T} V)_{j k}} \\&=W_{j k}^{t} \frac{(X^{T} X V )_{j k}}{( X^{T} X W^{t} V^{T} V)_{j k}}. \end{aligned} \end{aligned}$$

When \(V^{t+1}\) and \(S^{t+1}\) are fixed, \(F_{W}(W_{j k})\) is non-increasing under the rule in (12c).

Therefore, in the t-th iteration, when \(V^t\) and \(W^t\) are fixed, the following inequality holds

$$ F(W^{t},\ V^t,\ S^{t+1})\le F(W^{t},\ V^t,\ S^t),$$

and when \(S^{t+1}\) and \(W^t\) are fixed, we have

$$ F(W^{t},\ V^{t+1},\ S^{t+1})\le F(W^{t},\ V^t,\ S^{t+1}),$$

and when \(V^{t+1}\) and \(S^{t+1}\) are fixed, we get

$$F(W^{t+1},V^{t+1},\ S^{t+1})\le F(W^{t},\ V^{t+1},\ S^{t+1}).$$

As a result, we have

$$\begin{aligned} F(W^{t+1},\ {}&V^{t+1},\ S^{t+1})\le F(W^{t},\ V^{t+1},\ S^{t+1})\le F(W^{t},\ V^{t},\ S^{t+1}) \\&\le F(W^{t},\ V^t,\ S^t)\le \cdots \le F(W^{0},\ V^0,\ S^0). \end{aligned}$$

Therefore, the objective function in (11) is non-increasing under the updating rules (12a), (12b) and (12c).\(\square \)
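
To observe this monotonic behaviour in practice, one can evaluate the objective (11) after each round of updates. The sketch below is a hypothetical monitoring loop, not the paper's Algorithm 1: it assumes the update_S, update_V and update_W helpers sketched in Appendix A are in scope, builds the similarity graph as R = (S + S^T)/2 for illustration, and runs on random data; under the conditions of Theorem 1 the printed sequence should be non-increasing.

```python
# Hypothetical monitoring loop: print the objective (11) after every round of
# the multiplicative updates; the printed values should not increase.
import numpy as np

def objective(X, W, V, S, R, lam, alpha, beta):
    """Value of (11): reconstruction, graph regularization and self-representation terms."""
    D = np.diag(R.sum(axis=1))
    L = D - R
    return (np.linalg.norm(X - X @ W @ V.T, "fro") ** 2
            + lam * np.trace(V.T @ L @ V)
            + np.linalg.norm(X - X @ S, "fro") ** 2
            + alpha * np.linalg.norm(S, "fro") ** 2
            + beta * np.abs(S).sum())

rng = np.random.default_rng(0)
m, n, c = 20, 15, 4
X = rng.random((m, n))
W, V, S = rng.random((n, c)), rng.random((n, c)), rng.random((n, n))
lam = alpha = beta = 1.0
for t in range(30):
    S = update_S(X, V, S, lam=lam, alpha=alpha, beta=beta)
    R = (S + S.T) / 2                      # assumed construction of the graph
    V = update_V(X, W, V, R, lam=lam)
    W = update_W(X, W, V)
    print(t, objective(X, W, V, S, R, lam, alpha, beta))
```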

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hu, X., Xiong, D. & Chai, L. Concept factorization with adaptive graph learning on Stiefel manifold. Appl Intell 54, 8224–8240 (2024). https://doi.org/10.1007/s10489-024-05606-8
