Concept factorization with adaptive graph learning on Stiefel manifold

Abstract

In machine learning and data mining, concept factorization (CF) has achieved great success owing to its powerful capability for data representation. To learn an adaptive graph structure inherent in the data space, and to ease the burden imposed by an explicit orthogonality constraint, we propose concept factorization with adaptive graph learning on the Stiefel manifold (AGCF-SM). The method integrates concept factorization and manifold learning into a unified framework, in which the adaptive similarity graph is learned by iterative locally linear embedding and is therefore free from dependence on predefined neighbor sets. An iterative updating algorithm is developed, and convergence and complexity analyses of the algorithm are provided. Numerical experiments on ten benchmark datasets demonstrate that the proposed algorithm outperforms other state-of-the-art algorithms.

Data Availability

All data used in this study are freely available online (http://www.cad.zju.edu.cn/home/dengcai/Data/data.html, https://archive.ics.uci.edu/datasets, and https://jundongl.github.io/scikit-feature/index.html).

Notes

  1. https://github.com/huangsd/NMFAN

References

  1. Lin X, Chen X, Zheng Z (2023) Deep manifold matrix factorization autoencoder using global connectivity for link prediction. Appl Intell 53(21):25816–25835. https://doi.org/10.1007/s10489-023-04887-9

  2. Gao X, Zhang Z, Mu T et al (2020) Self-attention driven adversarial similarity learning network. Pattern Recogn, 105:107331. https://doi.org/10.1016/j.patcog.2020.107331

  3. Wu W, Hou J, Wang S et al (2023) Semi-supervised adaptive kernel concept factorization. Pattern Recogn, 134:109114. https://doi.org/10.1016/j.patcog.2022.109114

  4. Rahiche A, Cheriet M (2021) Blind decomposition of multispectral document images using orthogonal non-negative matrix factorization. IEEE Trans Image Process, 30:5997–6012. https://doi.org/10.1109/TIP.2021.3088266

  5. Lee D, Seung H (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791. https://doi.org/10.1038/44565

  6. Tang J, Wan Z (2021) Orthogonal dual graph-regularized non-negative matrix factorization for co-clustering. J Sci Comput 87(3):1–37. https://doi.org/10.1007/s10915-021-01489-w

  7. Hien LTK, Gillis N (2021) Algorithms for non-negative matrix factorization with the Kullback-Leibler divergence. J Sci Comput 87(3):1–32. https://doi.org/10.1007/s10915-021-01504-0

  8. Shu Z, Weng Z, Yu Z et al (2022) Correntropy-based dual graph regularized non-negative matrix factorization with \({L}_{p}\) smoothness for data representation. Appl Intell 52(7):7653–7669. https://doi.org/10.1007/s10489-021-02826-0

  9. Xu W, Gong Y (2004) Document clustering by concept factorization. In: Proceed 27th Ann Int ACM SIGIR Conf Res Dev Inf Retri, pp 202–209. https://doi.org/10.1145/1008992.1009029

  10. Zhang Z, Zhang Y, Liu G et al (2020) Joint label prediction based semi-supervised adaptive concept factorization for robust data representation. IEEE Trans Knowl Data Eng 32(5):952–970. https://doi.org/10.1109/TKDE.2019.2893956

  11. Zhou N, Chen B, Du Y et al (2020) Maximum correntropy criterion-based robust semisupervised concept factorization for image representation. IEEE Trans Neural Netw Learn Syst 31(10):3877–3891. https://doi.org/10.1109/TNNLS.2019.2947156

  12. Peng S, Yang Z, Nie F et al (2022) Correntropy based semi-supervised concept factorization with adaptive neighbors for clustering. Neural Netw, 154:203–217. https://doi.org/10.1016/j.neunet.2022.07.021

  13. Li Z, Yang Y (2023) Structurally incoherent adaptive weighted low-rank matrix decomposition for image classification. Appl Intell 53(21):25028–25041. https://doi.org/10.1007/s10489-023-04875-z

  14. Deng P, Li T, Wang H et al (2021) Tri-regularized non-negative matrix tri-factorization for co-clustering. Knowl-Based Syst, 226:107101. https://doi.org/10.1016/j.knosys.2021.107101

  15. Zhang L, Liu Z, Pu J et al (2020) Adaptive graph regularized non-negative matrix factorization for data representation. Appl Intell, 50:438–447. https://doi.org/10.1007/s10489-019-01539-9

  16. Shu Z, Zuo F, Wu W et al (2023) Dual local learning regularized NMF with sparse and orthogonal constraints. Appl Intell 53(7):7713–7727. https://doi.org/10.1007/s10489-022-03881-x

  17. Yang X, Che H, Leung MF et al (2023) Adaptive graph non-negative matrix factorization with the self-paced regularization. Appl Intell 53(12):15818–15835. https://doi.org/10.1007/s10489-022-04339-w

  18. Tang J, Feng H (2022) Robust local-coordinate non-negative matrix factorization with adaptive graph for robust clustering. Inf Sci, 610:1058–1077. https://doi.org/10.1016/j.ins.2022.08.023

  19. Chen M, Li X (2021) Concept factorization with local centroids. IEEE Trans Neural Netw Learn Syst 32(11):5247–5253. https://doi.org/10.1109/TNNLS.2020.3027068

  20. Wu W, Chen Y, Wang R et al (2023) Self-representative kernel concept factorization. Knowl-Based Syst, 259:110051. https://doi.org/10.1016/j.knosys.2022.110051

  21. Mu J, Song P, Liu X et al (2023) Dual-graph regularized concept factorization for multi-view clustering. Expert Syst Appl, 223:119949. https://doi.org/10.1016/j.eswa.2023.119949

  22. Pei X, Chen C, Gong W (2018) Concept factorization with adaptive neighbors for document clustering. IEEE Trans Neural Netw Learn Syst 29(2):343–352. https://doi.org/10.1109/TNNLS.2016.2626311

  23. Guo Y, Ding G, Zhou J et al (2015) Robust and discriminative concept factorization for image representation. In: Proceed 5th ACM Int Conf Multimed Retr, pp 115–122. https://doi.org/10.1145/2671188.2749317

  24. Yang B, Zhang X, Nie F et al (2023) ECCA: Efficient correntropy-based clustering algorithm with orthogonal concept factorization. IEEE Trans Neural Netw Learn Syst 34(10):7377–7390. https://doi.org/10.1109/TNNLS.2022.3142806

  25. Ding C, Li T, Peng W et al (2006) Orthogonal non-negative matrix tri-factorizations for clustering. In: Proceed ACM SIGKDD Int Conf Knowl Discov Data Min, pp 126–135. https://doi.org/10.1145/1150402.1150420

  26. Huang K, Sidiropoulos ND, Swami A (2014) Non-negative matrix factorization revisited: Uniqueness and algorithm for symmetric decomposition. IEEE Trans Signal Process 62(1):211–224. https://doi.org/10.1109/TSP.2013.2285514

  27. He P, Xu X, Ding J et al (2020) Low-rank non-negative matrix factorization on Stiefel manifold. Inf Sci, 514:131–148. https://doi.org/10.1016/j.ins.2019.12.004

  28. Wang Q, He X, Jiang X et al (2022) Robust bi-stochastic graph regularized matrix factorization for data clustering. IEEE Trans Pattern Anal Mach Intell 44(1):390–403. https://doi.org/10.1109/TPAMI.2020.3007673

  29. Wang S, Chang TH, Cui Y et al (2021) Clustering by orthogonal NMF model and non-convex penalty optimization. IEEE Trans Signal Process, 69:5273–5288. https://doi.org/10.1109/TSP.2021.3102106

  30. Yang B, Zhang X, Nie F et al (2021) Fast multi-view clustering via non-negative and orthogonal factorization. IEEE Trans Image Process, 30:2575–2586

  31. Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7(85):2399–2434

  32. Huang S, Xu Z, Kang Z et al (2020) Regularized non-negative matrix factorization with adaptive local structure learning. Neurocomputing, 382:196–209. https://doi.org/10.1016/j.neucom.2019.11.070

  33. Bai L, Cui L, Wang Y et al (2022) HAQJSK: Hierarchical-aligned quantum Jensen-Shannon kernels for graph classification. https://doi.org/10.48550/arXiv.2211.02904

  34. Li J, Zheng R, Feng H et al (2024) Permutation equivariant graph framelets for heterophilous graph learning. IEEE Trans Neural Netw Learn Syst, pp 1–15. https://doi.org/10.1109/TNNLS.2024.3370918

  35. Li M, Zhang L, Cui L et al (2023) Blog: Bootstrapped graph representation learning with local and global regularization for recommendation. Pattern Recogn, 144:109874. https://doi.org/10.1016/j.patcog.2023.109874

  36. Cai D, He X, Han J (2010) Locally consistent concept factorization for document clustering. IEEE Trans Knowl Data Eng 23(6):902–913. https://doi.org/10.1109/TKDE.2010.165

  37. Ye J, Jin Z (2014) Dual-graph regularized concept factorization for clustering. Neurocomputing, 138:120–130. https://doi.org/10.1016/j.neucom.2014.02.029

  38. Ye J, Jin Z (2017) Graph-regularized local coordinate concept factorization for image representation. Neural Process Lett 46(2):427–449. https://doi.org/10.1007/s11063-017-9598-2

  39. Li N, Leng C, Cheng I et al (2024) Dual-graph global and local concept factorization for data clustering. IEEE Trans Neural Netw Learn Syst 35(1):803–816. https://doi.org/10.1109/TNNLS.2022.3177433

  40. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326. https://doi.org/10.1126/science.290.5500.2323

  41. Yi Y, Wang J, Zhou W et al (2020) Non-negative matrix factorization with locality constrained adaptive graph. IEEE Trans Circuits Syst Video Technol 30(2):427–441. https://doi.org/10.1109/TCSVT.2019.2892971

  42. Edelman A, Arias TA, Smith ST (1999) The geometry of algorithms with orthogonality constraints. SIAM J Matrix Anal Appl 20(2):303–353. https://doi.org/10.1137/S0895479895290954

  43. Wei D, Shen X, Sun Q et al (2021) Adaptive graph guided concept factorization on Grassmann manifold. Inf Sci, 576:725–742. https://doi.org/10.1016/j.ins.2021.08.040

  44. Lee D, Seung H (2001) Algorithms for non-negative matrix factorization. In: Proceed 13th Int Conf Neural Inf Process Syst, pp 535–541

  45. Zhang Z, Zhang Y, Xu M et al (2021) A survey on concept factorization: From shallow to deep representation learning. Inf Process Manag 58(3):102534. https://doi.org/10.1016/j.ipm.2021.102534

  46. Jannesari V, Keshvari M, Berahmand K (2024) A novel non-negative matrix factorization-based model for attributed graph clustering by incorporating complementary information. Expert Syst Appl, 242. https://doi.org/10.1016/j.eswa.2023.122799

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant 62173259) and the Natural Science Foundation of Hubei Province (No. 2022CFB110).

Author information

Authors and Affiliations

Authors

Contributions

Xuemin Hu: Conceptualization, Methodology, Software, Validation, Writing-Original draft preparation; Dan Xiong: Writing-Reviewing and Editing, Funding acquisition; Li Chai: Supervision and Funding acquisition.

Corresponding author

Correspondence to Dan Xiong.

Ethics declarations

Ethical and informed consent for data used

No human participants or animals were involved in the research described in this article.

Competing Interests

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A    The Derivation of (12a) - (12c)

  The derivation includes the following three steps.

1) S-minimization

Fixing W and V, we update S: optimizing (11) with respect to S is equivalent to solving

$$\begin{aligned} \min _{S}\ \lambda \textrm{Tr}(V^{T} L V)+ \Vert X-XS\Vert _{F}^{2}+\alpha \Vert S\Vert _{F}^{2}+\beta \Vert S\Vert _{1,1},\ \text {s.t.}\ \ S \ge \textbf{0}.~ \end{aligned}$$
(A1)

Let \( \mathrm{\Phi } \in \mathbb {R}^{n \times n}\) be the Lagrangian multiplier. Then, the Lagrange function \(\mathcal {L}(S, \mathrm{\Phi })\) of (A1) is written as

$$\begin{aligned} \mathcal {L}(S, \mathrm{\Phi })&= \lambda \textrm{Tr}(V^{T} L V)+\Vert X-XS\Vert _{F}^{2}+\alpha \Vert S\Vert _{F}^{2}+\beta \Vert S\Vert _{1,1}-\textrm{Tr}( \mathrm{\Phi }S )\\&= \lambda \textrm{Tr}(TR)+\Vert X-XS\Vert _{F}^{2}+\alpha \textrm{Tr}(S^{T}S)+\beta \textrm{Tr}(\textrm{E} S)-\textrm{Tr}( \mathrm{\Phi } S).~ \end{aligned}$$

For the second equality, note that \(\textrm{Tr}(V^{T}LV)=\frac{1}{2}\sum _{i, j =1}^{n}\Vert V_{i}-V_{j}\Vert ^{2}R_{ij}\); denoting \( T_{ij} = \frac{1}{2}\Vert V_{i}-V_{j}\Vert ^{2}\), we have \( \textrm{Tr}(V^{T}LV) = \textrm{Tr}(TR)\).
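
This identity can also be checked numerically. The short NumPy sketch below uses purely illustrative random data, with a symmetric similarity matrix R, degree matrix D and Laplacian L = D - R named as in the derivation; it is a sanity check, not part of the algorithm.

```python
# Numerical check of Tr(V^T L V) = Tr(T R) with T_ij = 0.5 * ||V_i - V_j||^2.
# The random matrices here are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, c = 6, 3
V = rng.random((n, c))                      # rows V_i are the embedded points
R = rng.random((n, n)); R = (R + R.T) / 2   # symmetric similarity graph
D = np.diag(R.sum(axis=1))                  # degree matrix
L = D - R                                   # graph Laplacian

T = 0.5 * np.square(V[:, None, :] - V[None, :, :]).sum(axis=2)

print(np.isclose(np.trace(V.T @ L @ V), np.trace(T @ R)))   # True
```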

Taking the derivative of \(\mathcal {L}(S, \mathrm{\Phi })\) with respect to S and setting it to zero, we have

$$\begin{aligned} \lambda T-2 X^{T} X + 2 X^{T} X S + 2\alpha S+\beta \textrm{E} -\mathrm{\Phi } = \textbf{0}. \end{aligned}$$

Using the KKT condition, \(\mathrm{\Phi }_{ij}S_{ij}=0\), we have

$$\begin{aligned} (\lambda T-2 X^{T} X + 2X^{T} X S + 2\alpha S+\beta \textrm{E})_{ij}S_{ij}=0. \end{aligned}$$

Then, we obtain

$$\begin{aligned} S_{ij}^{t+1}\leftarrow S_{ij}^{t}\frac{(X^{T} X)_{ij}}{(\frac{\lambda }{2} T+ X^{T} X S + \alpha S+\frac{\beta }{2}\textrm{E})_{ij}}. \end{aligned}$$
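
For concreteness, a minimal NumPy sketch of this multiplicative S-update follows. The function name, the small constant eps guarding against division by zero, and the default parameter values are illustrative assumptions rather than the paper's implementation.

```python
# A sketch of the S-update (12a): a multiplicative rule that keeps S non-negative.
import numpy as np

def update_S(X, V, S, lam=1.0, alpha=1.0, beta=1.0, eps=1e-12):
    """One multiplicative step for the self-representation matrix S."""
    n = X.shape[1]
    XtX = X.T @ X
    # T_ij = 0.5 * ||V_i - V_j||^2, as defined above
    T = 0.5 * np.square(V[:, None, :] - V[None, :, :]).sum(axis=2)
    E = np.ones((n, n))                     # all-ones matrix from the L_{1,1} term
    denom = 0.5 * lam * T + XtX @ S + alpha * S + 0.5 * beta * E
    return S * XtX / (denom + eps)          # eps is an illustrative safeguard
```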

2) V-minimization

Fixing W and S, we update V: optimizing (11) w.r.t. V is equivalent to solving

$$\begin{aligned} \min _{V} \mathcal {O}_{1}=\Vert X-X W V^{T}\Vert _{F}^{2}+\lambda \textrm{Tr}( V^{T} L V ), \ \text {s.t.} \ V \ge \textbf{0},\ \ V\in \mathcal {M}. \end{aligned}$$
(A2)

As V lies on the Stiefel manifold, we compute the natural gradient \(\widetilde{\nabla }_{V} \mathcal {O}_{1} \) on the manifold. First, we calculate the derivative of \(\mathcal {O}_{1}\) with respect to V in Euclidean space,

$$\begin{aligned} \nabla _{V}\mathcal {O}_{1}&= -2 X^{T} X W + 2 V W^{T} X^{T} X W + 2\lambda L V\\&= (2 V W^{T} X^{T} X W + 2\lambda D V) - (2 X^{T} X W + 2\lambda R V). \end{aligned}$$

Second, we use \(\nabla _{V} \mathcal {O}_{1}\) to compute the natural gradient \(\widetilde{\nabla }_{V} \mathcal {O}_{1} \) by (9),

$$\begin{aligned} \widetilde{\nabla }_{V}\mathcal {O}_{1}&= \nabla _{V} \mathcal {O}_{1}- V(\nabla _{V} \mathcal {O}_{1})^{T}V \\&= 2 V W^{T} X^{T} X V - 2 X^{T} X W + 2\lambda (D V - R V - V V^{T}D^{T} V + VV^{T}R^{T}V). \end{aligned}$$

By the KKT conditions, \( V \odot \widetilde{\nabla }_{V} \mathcal {O}_{1} = \textbf{0} \), and the updating rule of V is

$$\begin{aligned} V_{jk}^{t+1}\leftarrow V_{jk}^{t} \frac{(X^{T} X W+\lambda (R V+V{V}^{T}{D}^{T}V))_{jk}}{(V{W}^{T} X^{T} X V+\lambda (D V+V{V}^{T} {R}^{T} V))_{jk}}. \end{aligned}$$
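
A corresponding NumPy sketch of this V-update, obtained from the Stiefel natural gradient above, is given below; D is built from the current similarity graph R as its degree matrix, and eps is again an illustrative safeguard rather than part of the derivation.

```python
# A sketch of the V-update (12b), derived from the natural gradient on the
# Stiefel manifold. R is the current similarity graph, D its degree matrix.
import numpy as np

def update_V(X, W, V, R, lam=1.0, eps=1e-12):
    """One multiplicative step for V."""
    XtX = X.T @ X
    D = np.diag(R.sum(axis=1))
    numer = XtX @ W + lam * (R @ V + V @ V.T @ D.T @ V)
    denom = V @ (W.T @ XtX @ V) + lam * (D @ V + V @ V.T @ R.T @ V)
    return V * numer / (denom + eps)        # eps is an illustrative safeguard
```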

3) W-minimization

Fixing S and V, we update W: optimizing (11) w.r.t. W is equivalent to solving

$$\begin{aligned} \min _{W} \Vert X-X W V^{T}\Vert _{F}^{2}, \ \text {s.t.}\ W \ge \textbf{0}.~ \end{aligned}$$
(A3)

By introducing the Lagrangian multiplier \( \mathrm{\Psi }\in \mathbb {R}^{n \times c}\), the Lagrange function \(\mathcal {L}(W, \mathrm{\Psi })\) of (A3) is written as

$$\begin{aligned} \mathcal {L}(W, \mathrm{\Psi })= \Vert X-X W V^{T}\Vert _{F}^{2} - \textrm{Tr}(\mathrm{\Psi } W^{T}).~ \end{aligned}$$

Taking the derivative of the Lagrange function \(\mathcal {L}(W, \mathrm{\Psi })\) with respect to W and setting it to zero, we have

$$\begin{aligned} - 2 X^{T} X V +2 X^{T} X W V^{T} V - \mathrm{\Psi } = \textbf{0}. \end{aligned}$$

Using the KKT conditions, \( \mathrm{\Psi }_{jk} W_{jk} = 0 \), we have

$$\begin{aligned} (- 2 X^{T} X V +2 X^{T} X W V^{T}V)_{jk} W_{jk}=0. \end{aligned}$$

Then, the updating rule of W is

$$\begin{aligned} W_{jk}^{t+1}\leftarrow W_{jk}^{t}\frac{(X^{T} X V)_{jk}}{(X^{T} X W {V}^{T} V)_{jk}}. \end{aligned}$$
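
Analogously, a minimal NumPy sketch of this W-update is given below, with the same illustrative eps safeguard.

```python
# A sketch of the W-update (12c) from the KKT condition of (A3).
import numpy as np

def update_W(X, W, V, eps=1e-12):
    """One multiplicative step for W."""
    XtX = X.T @ X
    return W * (XtX @ V) / (XtX @ W @ (V.T @ V) + eps)
```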

Appendix B    Proof of Theorem 1

To prove Theorem 1, we use the properties of auxiliary functions [44].

Definition 1

\(G(h, h^{*})\) is an auxiliary function for F(h) under the conditions

$$G(h, h^{*})\ge F(h),\ \ G(h, h)= F(h).$$

Lemma 1

If \(G(h, h^{*})\) is an auxiliary function of F(h), then F(h) is non-increasing under the following updating rule

$$\begin{aligned} h^{t+1} = \mathop {\arg \min }_{h} G(h,h^{t}). \end{aligned}$$
(B1)

Considering the orthogonality of V, the objective function (11) is reformulated as

$$\begin{aligned} F(W,V,S) = \Vert X-X W V^{T}\Vert _{F}^{2} + \lambda \textrm{Tr}(V^{T}LV) + \Vert X-XS\Vert _{F}^{2} \nonumber \\ + \alpha \Vert S\Vert _{F}^{2} + \beta \Vert S\Vert _{1,1}+\textrm{Tr}(\mathrm{\Gamma }(V^{T} V - {I})), \end{aligned}$$
(B2)

where \(\mathrm{\Gamma }\) is the Lagrange multiplier.

Denote by \(F_{S}(S_{i j})\), \(F_{V}(V_{j k})\) and \(F_{W}(W_{j k})\) the parts of the objective function that involve only S, V and W, respectively. Specifically, they are

$$\begin{aligned} F_{S}(S_{i j})&= (\lambda T R-2 X^{T} X S + S^{T} X^{T} X S + \alpha S^{T}S + \beta \mathrm{{E}}S)_{i j},\\ F_{V}(V_{j k})&= (-2 X^{T} X W V^{T} + V W^{T} X^{T} X W V^{T}+ \lambda V^{T} L V+ \mathrm{\Gamma }( V^{T} V - {I} ))_{j k},\\ F_{W}(W_{j k})&= (-2 X^{T} X W V^{T}+V W^{T} X^{T} X W V^{T})_{j k}. \end{aligned}$$

To prove the theorem, we construct auxiliary functions for \(F_{S}(S_{ij})\), \(F_{V}(V_{j k})\) and \(F_{W}(W_{j k})\), which are presented as Lemmas 2, 3 and 4, respectively.

Lemma 2

Define

$$\begin{aligned} G_{S}(S_{i j}, S_{i j}^{t}) = F_{S}(S_{i j}^{t})&+ F_{S}^{\prime }(S_{i j}^{t}) (S_{i j}-S_{i j}^{t})\nonumber \\&+ \frac{(\frac{\lambda }{2} T+ X^{T} X S^t +\alpha S^t + \frac{\beta }{2} {\mathrm{{E}}})_{i j}}{S_{i j}^{t}}(S_{i j}-S_{i j}^{t})^{2}. \end{aligned}$$
(B3)

Then, \(G_{S}(S_{i j}, S_{i j}^{t})\) is an auxiliary function for \(F_{S}(S_{i j})\).

Proof

\(G_{S}(S_{i j}, S_{i j}) = F_{S}(S_{i j})\) is obvious. Next, we prove \(F_{S}(S_{i j}) \le G_{S}(S_{i j}, S_{i j}^{t})\).

The derivatives of the function \(F_{S}(S_{ij})\) are

$$\begin{aligned} F^{'}_{S}(S_{ij})&= (\lambda T - 2 X^{T} X + 2 X^{T} X S + 2 \alpha S+\beta \mathrm{{E}})_{ij}, \\ F^{''}_{S}(S_{ij})&= 2 (X^{T} X)_{ii}+2 \alpha ,\\ F^{(k)}_{S}(S_{ij})&= 0, \ k \ge 3. \end{aligned}$$

Therefore, the Taylor series expansion of the function \(F_{S}(S_{i j})\) is written as

$$\begin{aligned} F_{S}(S_{i j})&= F_{S}(S_{i j}^{t})+F_{S}^{'}(S_{i j}^{t})(S_{i j}-S_{i j}^{t})+\frac{ F_{S}^{''}(S_{i j}^{t})}{2}(S_{i j}-S_{i j}^{t})^{2}\\&= F_{S}(S_{i j}^{t})+F_{S}^{\prime }(S_{i j}^{t})(S_{i j}-S_{i j}^{t})+((X^{T} X)_{i i}+\alpha )(S_{i j}-S_{i j}^{t})^{2}. \end{aligned}$$

Compared with (B3), we find that \(F_{S}(S_{i j})\le G_{S}(S_{i j}, S_{i j}^{t})\) is equivalent to

$$\begin{aligned} \frac{(\frac{\lambda }{2} T+ X^{T} X S^t +\alpha S^t + \frac{\beta }{2} \mathrm{{E}})_{i j}}{S_{i j}^{t}}\ge (X^{T} X)_{i i}+ \alpha . \end{aligned}$$

It is easy to check that

$$ (X^{T} X S^t)_{ij} = \sum _{\nu }(X^{T} X)_{i \nu }S_{\nu j}^{t}\ge (X^{T} X)_{ii} S_{ij}^{t}.$$

Thus, \(F_{S}(S_{i j})\le G_{S}(S_{i j}, S_{i j}^{t})\). \(\square \)

Lemma 3

Define

$$\begin{aligned} G_{V}(V_{j k}, V_{j k}^{t}) = F_{V}(V_{j k}^{t})&+F_{V}^{\prime }(V_{j k}^{t})(V_{j k}-V_{j k}^{t})\nonumber \\&+\frac{(V^{t} W^{T} X^{T} X W+\lambda D V^{t} + V^{t} \mathrm{\Gamma }_{1})_{j k}}{V_{j k}^{t}}(V_{j k}-V_{j k}^{t})^{2}, \end{aligned}$$
(B4)

where

$$\begin{aligned} \begin{array}{l} \mathrm{\Gamma }_{1}=\mathrm{\Gamma }+\mathrm{\Gamma }_{2}=W^{T} X^{T} X V^{t}+\lambda (V^{t})^{T} R V^{t}-W^{T} X^{T} X W,\\ \mathrm{\Gamma }_{2}=\lambda (V^{t})^{T} D V^{t} \ge 0. \end{array} \end{aligned}$$
(B5)

Then, \(G_{V}(V_{j k}, V_{j k}^{t})\) is an auxiliary function for \(F_{V}(V_{j k})\).

Proof

Obviously, \(F_{V}(V_{j k})=G_{V}(V_{j k}, V_{j k})\) holds. Next, we prove \(F_{V}(V_{j k})\le G_{V}(V_{j k}, V_{j k}^{t})\).

To do this, we need to use the Taylor series expansion of the function \(F_{V}(V_{j k})\). The derivatives of the function \(F_{V}(V_{j k})\) are

$$\begin{aligned} F_{V}^{'}(V_{j k})&=2(-X^{T} X W + V W^{T} X^{T} X W+\lambda L V + V\mathrm{\Gamma } )_{j k},\\ F_{V}^{''}(V_{j k})&=2( (W^{T} X^{T} X W)_{kk}+\lambda L_{jj} + \mathrm{\Gamma }_{kk}),\\ F_{V}^{(k)}(V_{j k})&= 0,\ \ k \ge 3. \end{aligned}$$

Therefore, the Taylor series expansion of the function \(F_{V}(V_{j k})\) is

$$\begin{aligned} F_{V}(V_{j k})&= F_{V}(V_{j k}^{t}) +F_{V}^{\prime }(V_{j k}^{t})(V_{j k}-V_{j k}^{t})\nonumber \\&\quad + ((W^{T} X^{T} X W)_{k k}+\lambda L_{j j}+\mathrm{\Gamma }_{k k})(V_{j k}-V_{j k}^{t})^{2}. \end{aligned}$$

Compared with (B4), it is sufficient to show that

$$\begin{aligned} \frac{(V^{t} W^{T} X^{T} X W+\lambda D V^{t}+ V^{t} \mathrm{\Gamma }_{1})_{j k}}{V_{j k}^{t}} \ge (W^{T} X^{T} X W)_{k k}+\lambda L_{j j}+\mathrm{\Gamma }_{k k}. \end{aligned}$$

We obtain the following inequalities,

$$\begin{aligned} (V^{t} W^{T} X^{T} X W)_{j k}&= \sum _{\omega } V_{j \omega }^{t}(W^{T} X^{T} X W)_{\omega k} \ge V_{j k}^{t}(W^{T} X^{T} X W)_{k k}, \\ (D V^{t})_{j k}&= \sum _{h} D_{j h} V_{h k}^{t} \ge D_{j j} V_{j k}^{t} \ge (D- R )_{j j} V_{j k}^{t}=L_{j j} V_{j k}^{t}, \\ ( V^{t}\mathrm{\Gamma }_{1})_{j k}&= \sum _{\tau } V_{j \tau }^{t}(\mathrm{\Gamma }_{1})_{\tau k} \ge V_{j k}^{t} (\mathrm{\Gamma }_{1})_{k k} \ge V_{j k}^{t}(\mathrm{\Gamma }_{1}-\mathrm{\Gamma }_{2})_{k k}=V_{j k}^{t}\mathrm{\Gamma }_{k k}. \end{aligned}$$

Thus, \(F_{V}(V_{j k})\le G_{V}(V_{j k}, V_{j k}^{t})\).\(\square \)

Lemma 4

Define

$$\begin{aligned} G_{W}(W_{j k}, W_{j k}^{t}) = F_{W}(W_{j k}^{t})+F_{W}^{\prime }(W_{j k}^{t})(W_{j k}-W_{j k}^{t})+\frac{( X^{T} X W^t V^{T}V)_{j k}}{W_{j k}^{t}}(W_{j k}-W_{j k}^{t})^{2}. \end{aligned}$$
(B6)

Then, \(G_{W}(W_{j k}, W_{j k}^{t})\) is an auxiliary function for \(F_{W}(W_{j k})\).

Proof

The proof follows the same lines as those of Lemmas 2 and 3. Since \(G_{W}(W_{j k}, W_{j k}) = F_{W}(W_{j k})\) is obvious, we only need to prove \(G_{W}\left( W_{j k}, W_{j k}^{t}\right) \ge F_{W}(W_{j k})\).

To do this, we need to obtain the Taylor series expansion of the function \(F_{W}(W_{j k})\). The derivatives of the function \(F_{W}(W_{j k})\) are

$$\begin{aligned} F_{W}^{'}(W_{j k})&= 2(-X^{T} X V + X^{T} X W V^{T} V)_{j k},\\ F_{W}^{''}(W_{j k})&= 2 (X^{T} X)_{j j}(V^{T} V)_{kk},\\ F_{W}^{(k)}(W_{j k})&=0,\ k\ge 3. \end{aligned}$$

Therefore, the Taylor series expansion of the function \(F_{W}(W_{j k})\) is obtained as

$$\begin{aligned} F_{W}(W_{j k})&= F_{W}(W_{j k}^{t})+F_{W}^{\prime }(W_{j k}^{t})(W_{j k}-W_{j k}^{t})\nonumber \\&\quad + (X^{T} X)_{j j}(V^{T} V)_{kk}(W_{j k}-W_{j k}^{t})^{2}. \end{aligned}$$

Compared with (B6), we only need to show that

$$\begin{aligned} \frac{(X^{T} X W^{t} V^{T} V)_{j k}}{W_{j k}^{t}} \ge (X^{T} X)_{jj}(V^{T} V)_{kk}. \end{aligned}$$
(B7)

It is easy to check that

$$\begin{aligned} (X^{T} X W^{t} V^{T} V)_{j k}&=\sum _{l}(X^{T} X W^{t})_{j l}(V^{T} V)_{l k} \\&\ge (X^{T} X W^{t})_{j k}(V^{T} V)_{k k} \\&= \sum _{r} (X^{T} X)_{jr} W_{r k}^{t}(V^{T} V)_{k k} \\&\ge W_{j k}^{t} (X^{T} X)_{jj}(V^{T} V)_{kk}. \end{aligned}$$

Thus \(G_{W}(W_{j k}, W_{j k}^{t}) \ge F_{W}(W_{j k}).\)

Therefore, \(G_{W}(W_{j k}, W_{j k}^{t})\) is an auxiliary function for \(F_{W}(W_{j k})\).\(\square \)

Now we give the proof of Theorem 1 based on the above lemmas.

Proof of Theorem 1

According to Lemmas 2, 3 and 4, (B3), (B4) and (B6) are auxiliary functions for \(F_{S}(S_{i j})\), \(F_{V}(V_{j k})\) and \(F_{W}(W_{j k})\), respectively.

Now we prove that the updating rules are exactly the minimizers of the corresponding auxiliary functions. For variable S, by minimizing (B3), we get

$$\begin{aligned} S_{i j}^{t+1}&=\arg \min _{S_{i j}} G_{S}(S_{i j}, S_{i j}^{t})\\&=S_{i j}^{t}-S_{i j}^{t} \frac{F'_{S}(S_{i j}^{t})}{2(\frac{\lambda }{2} T+ X^{T} X S^{t} +\alpha S^{t} + \frac{\beta }{2} \mathrm{{E}})_{i j}} \\&=S_{i j}^{t}\frac{ (X^{T} X)_{i j}}{(\frac{\lambda }{2} T+ X^{T} X S^{t} +\alpha S^{t} + \frac{\beta }{2} \mathrm{{E}})_{i j}}. \end{aligned}$$

When \(V^{t}\) and \(W^{t}\) are fixed, \(F_{S}(S_{ij})\) is non-increasing under the rule in (12a).

For variable V, by minimizing (B4), we have

$$\begin{aligned} V_{j k}^{t+1}&=\arg \min _{V_{j k}} G_{V}(V_{j k}, V_{j k}^{t}) \\&=V_{j k}^{t}-V_{j k}^{t}\frac{F'_{V}(V_{j k}^{t})}{2(V^{t} W^{T} X^{T} X W+\lambda D V^{t}+ V^{t} \mathrm{\Gamma }_{1} )_{j k}} \\&=V_{j k}^{t} \frac{(X^{T} X W+\lambda (R V^{t} + V^{t} (V^{t})^{T} D^{T} V^{t}))_{j k}}{(V^{t} W^{T} X^{T} X V^{t}+\lambda (D V^{t}+ V^{t} (V^{t})^{T} R^{T} V^{t}))_{j k}}. \end{aligned}$$

When \(W^{t}\) and \(S^{t+1}\) are fixed, \(F_{V}(V_{jk})\) is non-increasing under the rule in (12b).

For variable W, by minimizing (B6), we get the updating rule

$$\begin{aligned} \begin{aligned} W_{j k}^{t+1}&=\arg \min _{W_{j k}} G_{W}(W_{j k}, W_{j k}^{t}) \\&=W_{j k}^{t}-W_{j k}^{t} \frac{F_{W}^{'}(W_{j k}^{t})}{2(X^{T} X W^{t} V^{T} V)_{j k}} \\&=W_{j k}^{t} \frac{(X^{T} X V )_{j k}}{( X^{T} X W^{t} V^{T} V)_{j k}}. \end{aligned} \end{aligned}$$

When \(V^{t+1}\) and \(S^{t+1}\) are fixed, \(F_{W}(W_{j k})\) is non-increasing under the rule in (12c).

Therefore, in the t-th iteration, when \(V^t\) and \(W^t\) are fixed, the following inequality holds

$$ F(W^{t},\ V^t,\ S^{t+1})\le F(W^{t},\ V^t,\ S^t),$$

and when \(S^{t+1}\) and \(W^t\) are fixed, we have

$$ F(W^{t},\ V^{t+1},\ S^{t+1})\le F(W^{t},\ V^t,\ S^{t+1}),$$

and when \(V^{t+1}\) and \(S^{t+1}\) are fixed, we get

$$F(W^{t+1},V^{t+1},\ S^{t+1})\le F(W^{t},\ V^{t+1},\ S^{t+1}).$$

As a result, we have

$$\begin{aligned} F(W^{t+1},\ {}&V^{t+1},\ S^{t+1})\le F(W^{t},\ V^{t+1},\ S^{t+1})\le F(W^{t},\ V^{t},\ S^{t+1}) \\&\le F(W^{t},\ V^t,\ S^t)\le \cdots \le F(W^{0},\ V^0,\ S^0). \end{aligned}$$

Therefore, the objective function in (11) is non-increasing under the updating rules (12a), (12b) and (12c).\(\square \)
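
To observe this monotonic behaviour in practice, one can evaluate the objective (11) after each round of updates. The sketch below is a hypothetical monitoring loop, not the paper's Algorithm 1: it assumes the update_S, update_V and update_W helpers sketched in Appendix A are in scope, builds the similarity graph as R = (S + S^T)/2 for illustration, and runs on random data; under the conditions of Theorem 1 the printed sequence should be non-increasing.

```python
# Hypothetical monitoring loop: print the objective (11) after every round of
# the multiplicative updates; the printed values should not increase.
import numpy as np

def objective(X, W, V, S, R, lam, alpha, beta):
    """Value of (11): reconstruction, graph regularization and self-representation terms."""
    D = np.diag(R.sum(axis=1))
    L = D - R
    return (np.linalg.norm(X - X @ W @ V.T, "fro") ** 2
            + lam * np.trace(V.T @ L @ V)
            + np.linalg.norm(X - X @ S, "fro") ** 2
            + alpha * np.linalg.norm(S, "fro") ** 2
            + beta * np.abs(S).sum())

rng = np.random.default_rng(0)
m, n, c = 20, 15, 4
X = rng.random((m, n))
W, V, S = rng.random((n, c)), rng.random((n, c)), rng.random((n, n))
lam = alpha = beta = 1.0
for t in range(30):
    S = update_S(X, V, S, lam=lam, alpha=alpha, beta=beta)
    R = (S + S.T) / 2                      # assumed construction of the graph
    V = update_V(X, W, V, R, lam=lam)
    W = update_W(X, W, V)
    print(t, objective(X, W, V, S, R, lam, alpha, beta))
```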

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hu, X., Xiong, D. & Chai, L. Concept factorization with adaptive graph learning on Stiefel manifold. Appl Intell 54, 8224–8240 (2024). https://doi.org/10.1007/s10489-024-05606-8
