
A Converged Deep Graph Semi-NMF Algorithm for Learning Data Representation


Abstract

Deep nonnegative matrix factorization (DMF) is a particularly useful technique for learning data representations in a low-dimensional space. To further capture complex hidden information and preserve the geometrical structure of high-dimensional data, we propose a novel deep matrix factorization model with graph regularization (called DGsnMF). To solve this model with multiple variables, we design a forward–backward splitting scheme. We then provide a convergence analysis of the proposed algorithm and prove that it converges to a critical point. Empirical experiments on benchmark datasets show that the proposed method is superior to the compared methods.
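To make the factorization concrete, the following is a minimal numerical sketch (an illustration added for this write-up, not the authors' code) of the kind of objective the abstract describes: the data matrix \(\mathbf{X}\) is factorized through several layers as \(\mathbf{W} _{1}\mathbf{W} _{2}\cdots \mathbf{W} _{l}\mathbf{H} _{l}\) with mixed-sign weight matrices and a nonnegative deepest representation \(\mathbf{H} _{l}\), plus a graph-Laplacian term that preserves local geometry. The layer sizes, the k-NN graph construction, and the weight lam are assumptions made for this example only.

    import numpy as np

    def knn_graph_laplacian(X, k=5):
        """Unweighted k-NN affinity graph on the columns of X and its Laplacian L = D - A."""
        n = X.shape[1]
        sq = np.sum(X ** 2, axis=0)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)        # pairwise squared distances
        A = np.zeros((n, n))
        for j in range(n):
            A[j, np.argsort(d2[j])[1:k + 1]] = 1.0              # k nearest neighbours, skipping the point itself
        A = np.maximum(A, A.T)                                  # symmetrize
        return np.diag(A.sum(axis=1)) - A

    def deep_graph_seminmf_objective(X, Ws, H, L, lam=0.1):
        """Illustrative objective: ||X - W_1 ... W_l H||_F^2 + lam * tr(H L H^T)."""
        recon = np.linalg.multi_dot(Ws + [H])
        return np.linalg.norm(X - recon, 'fro') ** 2 + lam * np.trace(H @ L @ H.T)

    # Toy usage: 50 features, 100 samples, two layers 50 -> 20 -> 10.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 100))
    Ws = [rng.standard_normal((50, 20)), rng.standard_normal((20, 10))]   # mixed-sign weights (semi-NMF)
    H = np.abs(rng.standard_normal((10, 100)))                            # nonnegative deepest features
    print(deep_graph_seminmf_objective(X, Ws, H, knn_graph_laplacian(X)))

Such an objective would then be minimized block-wise by the forward–backward updates whose convergence is analysed in the Appendix.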


Notes

  1. \(C^{1}\) function: the first derivatives are continuous.

References

  1. S. Arora, N. Cohen, W. Hu, Y. Luo, Implicit regularization in deep matrix factorization. Adv. Neural Inf. Process. Syst. (NeurIPS), 7413–7424 (2019)

  2. H. Attouch, J. Bolte, P. Redont, A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  3. H. Attouch, J. Bolte, B.F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

  4. Y. Bengio, Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)

  5. J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)

  6. D. Cai, X. He, J. Han, T.S. Huang, Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1548–1560 (2010)

  7. D. Cai, X. He, X. Wu, J. Han, in Proceedings of IEEE International Conference on Data Mining (ICDM). Non-negative matrix factorization on manifold (2008), pp. 63–72

  8. C.H. Ding, T. Li, M.I. Jordan, Convex and semi-nonnegative matrix factorizations. IEEE Trans. Pattern Anal. Mach. Intell. 32(1), 45–55 (2008)

  9. H. Fang, A. Li, H. Xu, T. Wang, Sparsity-constrained deep nonnegative matrix factorization for hyperspectral unmixing. IEEE Geosci. Remote. Sens. Lett. 15(7), 1105–1109 (2018)

  10. X.-R. Feng, H.-C. Li, J. Li, Q. Du, A. Plaza, W.J. Emery, Hyperspectral unmixing using sparsity-constrained deep nonnegative matrix factorization with total variation. IEEE Trans. Geosci. Remote Sens. 56(10), 6245–6257 (2018)

  11. N. Guan, D. Tao, Z. Luo, B. Yuan, NeNMF: an optimal gradient method for nonnegative matrix factorization. IEEE Trans. Signal Process. 60(6), 2882–2898 (2012)

  12. G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)

  13. S. Huang, Z. Kang, I.W. Tsang, Z. Xu, Auto-weighted multi-view clustering via kernelized graph learning. Patt. Recogn. 97, 107015 (2020)

  14. H. Huang, N. Liang, W. Yan, Z. Yang, Z. Li, W. Sun, in Proceedings of IEEE International Conference on Data Mining Workshops. Partially shared semi-supervised deep matrix factorization with multi-view data (2020), pp. 564–570

  15. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)

  16. D.D. Lee, H.S. Seung, Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999)

  17. Z. Li, S. Ding, W. Chen, Z. Yang, S. Xie, Proximal alternating minimization for analysis dictionary learning and convergence analysis. IEEE Trans. Emerg. Topics Comput. Intel. 2(6), 439–449 (2018)

  18. Z. Li, S. Ding, Y. Li, Z. Yang, S. Xie, W. Chen, Manifold optimization-based analysis dictionary learning with an \(l_{1/2}\)-norm regularizer. Neural Netw. 98, 212–222 (2018)

  19. Z. Li, J. Tang, X. He, Robust structured nonnegative matrix factorization for image representation. IEEE Trans. Neural Netw. Learn. Syst. 29(5), 1947–1960 (2018)

  20. Z. Li, M. Xu, J. Nie, J. Kang, W. Chen, S. Xie, NOMA-Enabled cooperative computation offloading for blockchain-empowered internet of things: a learning approach. IEEE Intern. Things J. 8(4), 2364–2378 (2021)

  21. H.-C. Li, G. Yang, W. Yang, Q. Du, W.J. Emery, Deep nonsmooth nonnegative matrix factorization network with semi-supervised learning for SAR image change detection. ISPRS J. Photogramm. Remote Sens. 160, 167–179 (2020)

  22. N. Liang, Z. Yang, Z. Li, W. Sun, S. Xie, Multi-view clustering by non-negative matrix factorization with co-orthogonal constraints. Knowl. Based Syst. 194, 105582 (2020)

  23. N. Liang, Z. Yang, Z. Li, S. Xie, C. Su, Semi-supervised multi-view clustering with graph-regularized partially shared non-negative matrix factorization. Knowl. Based Syst. 190, 105185 (2020)

  24. M. Lyons, S. Akamatsu, M. Kamachi, J. Gyoba, in International Conference on Automatic Face and Gesture Recognition. Coding facial expressions with Gabor wavelets (1998), pp. 200–205

  25. Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (Kluwer Academic, Boston, MA, 2004)

  26. G. Peng, Joint and direct optimization for dictionary learning in convolutional sparse representation. IEEE Trans. Neural Netw. Learn. Syst. 31(2), 559–573 (2020)

  27. H.A. Song, B.-K. Kim, T.L. Xuan, S.-Y. Lee, Hierarchical feature extraction by multi-layer non-negative matrix factorization network for classification task. Neurocomputing 165, 63–74 (2015)

  28. G. Trigeorgis, K. Bousmalis, S. Zafeiriou, B.W. Schuller, A deep matrix factorization method for learning attribute representations. IEEE Trans. Pattern Anal. Mach. Intell. 39(3), 417–429 (2017)

  29. W. Xu, X. Liu, Y. Gong, in Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval. Document clustering based on non-negative matrix factorization (2003), pp. 267–273

  30. W. Yan, B. Zhang, S. Ma, Z. Yang, A novel regularized concept factorization for document clustering. Knowl. Based Syst. 135, 147–158 (2017)

  31. Z. Yang, Y. Hu, N. Liang, J. Lv, Nonnegative matrix factorization with fixed L2-norm constraint. Circ. Syst. Sig. Process. 38(7), 3211–3226 (2019)

  32. Z. Yang, Y. Xiang, K. Xie, S. Xie, Adaptive method for nonsmooth nonnegative matrix factorization. IEEE Trans. Neural Netw. Learn. Syst. 28(4), 948–960 (2017)

  33. Z. Yang, Y. Zhang, Y. Xiang, W. Yan, S. Xie, Non-negative matrix factorization with dual constraints for image clustering. IEEE Trans. Syst. Man Cybern. Syst. 50(7), 2524–2533 (2020)

  34. F. Ye, C. Chen, Z. Zheng, in International Conference on Information and Knowledge Management (CIKM). Deep autoencoder-like nonnegative matrix factorization for community detection (2018), pp. 1393–1402

  35. F. Ye, C. Chen, Z. Zheng, R. Li, J. X. Yu, in Proceedings of IEEE International Conference on Data Mining (ICDM). Discrete overlapping community detection with pseudo supervision (2019), pp. 708–717

  36. J. Yu, G. Zhou, A. Cichocki, S. Xie, Learning the hierarchical parts of objects by deep non-smooth nonnegative matrix factorization. IEEE Access 6, 58096–58105 (2018)

  37. S. Zafeiriou, A. Tefas, I. Buciu, I. Pitas, Exploiting discriminant information in nonnegative matrix factorization with application to frontal face verification. IEEE Trans. Neural Netw. 17(3), 683–695 (2006)

  38. H. Zhao, Z. Ding, Y. Fu, in AAAI Conference on Artificial Intelligence (AAAI). Multi-view clustering via deep matrix factorization (2017), pp. 1288–1293

  39. T. Zhu, Sparse dictionary learning by block proximal gradient with global convergence. Neurocomputing 367, 226–235 (2019)

Acknowledgements

The authors would like to thank Dr. Jinshi Yu for his helpful discussion and source code [36]. This work was supported by the Key-Area Research and Development Program of Guangdong Province under Grant 2019B010154002, the Science and Technology Plan Project of Guangzhou under Grant 202002030289, the National Natural Science Foundation of China under Grant Nos. 61801133, 61722304, and 61803096.

Author information


Corresponding author

Correspondence to Zuyuan Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

To prove Theorem 1, we first define critical points of nonconvex and nonsmooth functions, as well as semi-algebraic functions and Kurdyka–Łojasiewicz (KL) functions, which are used in the convergence analysis.

Definition 1

For a proper and lower semi-continuous function h,

  • The Fréchet subdifferential of h at x, written as \(\hat{\partial } h(x)\), is defined for \(x\in \) dom h as

    $$\begin{aligned} \hat{\partial } h(x)=\left\{ \mathbf{u} \in {\mathbb {R}}^d: \liminf _{y\rightarrow x,\, y\ne x} \frac{h(y)-h(x)-\langle \mathbf{u} ,y-x\rangle }{\Vert y-x \Vert }\ge 0\right\} \end{aligned}$$
    (31)

    and \(\hat{\partial } h(x)= \emptyset \) when \(x\not \in \) dom h.

  • The limiting subdifferential of h at x, written as \(\partial h(x)\), is the set defined as follows:

    $$\begin{aligned} \partial h(x)=\left\{ \mathbf{u} \in {\mathbb {R}}^d: \exists \, x^k \rightarrow x,\ h(x^k)\rightarrow h(x),\ \mathbf{u} ^k \in \hat{\partial } h(x^k),\ \mathbf{u} ^k \rightarrow \mathbf{u} \ \mathrm{as}\ k \rightarrow \infty \right\} \end{aligned}$$
    (32)
  • A point x is called a (limiting-)critical point of the function h if it satisfies the following condition:

    $$\begin{aligned} 0 \in \partial h(x) \end{aligned}$$
    (33)
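As a simple illustration of these notions (an example added here for clarity), consider the convex but nonsmooth function \(h(x)=|x|\) on \({\mathbb {R}}\). At the origin,

$$\begin{aligned} \hat{\partial } h(0)=\partial h(0)=[-1,1], \end{aligned}$$

so \(0 \in \partial h(0)\), and \(x=0\) is a critical point of h in the sense of (33) even though h is not differentiable there.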

Definition 2

(Kurdyka–Łojasiewicz Property) A function h has the KL property at \(x^{*}\in \) dom \(\partial h\) if there exist:

  • \(\eta \in (0,\infty )\) and a neighborhood V of \(x^{*}\);

  • a continuous concave function \(\varphi : [0,\eta )\rightarrow {\mathbb {R}}_{+}\), such that the following hold:

    a) \(\varphi \) is \(C^{1}\) on \((0,\eta )\), \(\varphi (0)=0\), and \(\varphi ^{\prime }(s)>0\) for all \(s\in (0,\eta )\);

    b) for all \(x \in V\) satisfying \(h(x^{*})<h(x)<h(x^{*})+\eta \), the KL inequality holds:

      $$\begin{aligned} {\qquad \varphi ^{\prime }\left( h(x)-h\left( x^{*}\right) \right) \mathrm{dist}(0, \partial h(x)) \ge 1} \end{aligned}$$
      (34)
  • If h satisfies the KL property at every point of dom \(\partial h\), it is called a KL function.
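As a simple illustration (an example added here, not from the original text), the function \(h(x)=x^{2}\) has the KL property at \(x^{*}=0\) with the desingularizing function \(\varphi (s)=\sqrt{s}\): for every \(x\ne 0\),

$$\begin{aligned} \varphi ^{\prime }\left( h(x)-h(0)\right) \mathrm{dist}(0,\partial h(x)) =\frac{1}{2\sqrt{x^{2}}}\cdot |2x|=1\ge 1, \end{aligned}$$

so inequality (34) holds with \(V={\mathbb {R}}\) and any \(\eta >0\).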

Definition 3

(Semi-algebraic sets and functions [2])

  • A subset S of \({\mathbb {R}}^n\) is called a semi-algebraic set if there exists a finite number of real polynomial functions \(g_{ij}\), \(g_{ij}^{'}\): \({\mathbb {R}}^n\rightarrow {\mathbb {R}}\) \((1\le i\le q,\ 1\le j\le p)\) such that

    $$\begin{aligned} S=\bigcup _{j=1}^{p}\bigcap _{i=1}^{q}\{x\in {\mathbb {R}}^n:g_{ij}(x)=0,\ g_{ij}^{'}(x)<0\} \end{aligned}$$
    (35)
  • A function \(h: {\mathbb {R}}^n \rightarrow (-\infty ,+\infty ]\) is semi-algebraic if its graph \(\{(x,y)\in {\mathbb {R}}^n\times {\mathbb {R}}:y=h(x)\}\) is a semi-algebraic set.
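For instance, the nonnegative orthant (the type of constraint set appearing in semi-NMF) is a semi-algebraic set, since

$$\begin{aligned} {\mathbb {R}}^{n}_{+}=\bigcap _{i=1}^{n}\left( \{x\in {\mathbb {R}}^n:x_{i}=0\}\cup \{x\in {\mathbb {R}}^n:-x_{i}<0\}\right) \end{aligned}$$

and semi-algebraic sets are stable under finite unions and intersections; consequently, its indicator function is a semi-algebraic function. This is the form of argument used in the proof of Condition V4 below.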

Second, following the conditions described in Lemma 1, we verify that the sequence \(\{\mathbf{J }^{k}\} =\{\mathbf{W }_{1}^{k},\mathbf{W} _{2}^{k},\ldots ,\mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k}\}\) generated by Algorithm 1 satisfies conditions V1–V4.

Proof

(Condition V1): The forward–backward splitting framework is used to solve the DGsnMF problem described in (9). The weight matrix \(\mathbf{W} _{i}\ (i = 1,2,\ldots ,l)\) is updated through the iteration described in (19), so we have

$$\begin{aligned}&\frac{\mu _{i}^{k}}{2}\left\| \mathbf{W }_{i}^{k+1} -\left( \mathbf{W} _{i}^{k}-\frac{1}{\mu _{i}^{k}} \nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k})\right) \right\| _{F}^{2} +F(\mathbf{W} _{i}^{k+1})\nonumber \\&\quad \le \frac{\mu _{i}^{k}}{2}\Vert \frac{1}{\mu _{i}^{k}} \nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k}) \Vert _{F}^{2}+F(\mathbf{W} _{i}^{k}) \end{aligned}$$
(36)

then,

$$\begin{aligned} F(\mathbf{W} _{i}^{k})&\ge F(\mathbf{W} _{i}^{k+1}) +\frac{\mu _{i}^{k}}{2}\Vert \mathbf{W }_{i}^{k+1} -\mathbf{W} _{i}^{k}\Vert _{F}^{2}\nonumber \\&\quad +Tr\langle \mathbf{W} _{i}^{k+1}-\mathbf{W} _{i}^{k}, \nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\rangle . \end{aligned}$$
(37)

It is noteworthy that \(\nabla _\mathbf{W _{i}}Q\) is Lipschitz continuous and Q is a \(C^1\) function. Let \(L_\mathbf{W _{i}}\) denote the Lipschitz constant of the gradient \(\nabla _\mathbf{W _{i}}Q\); then we have

$$\begin{aligned}&Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k+1},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\le Q (\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots ,\mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k})\nonumber \\&\quad + Tr\langle \mathbf{W} _{i}^{k+1} -\mathbf{W} _{i}^{k},\nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\rangle \nonumber \\&\qquad + \frac{L_\mathbf{W _{i}}}{2}\Vert \mathbf{W }_{i}^{k+1} -\mathbf{W} _{i}^{k}\Vert _{F}^{2}. \end{aligned}$$
(38)

Combining (37) and (38), we obtain the following inequality:

$$\begin{aligned}&F(\mathbf{W} _{i}^{k+1})+Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k+1},\ldots ,\mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k}) \le F(\mathbf{W} _{i}^{k})\nonumber \\&\quad + Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k}) - \frac{\mu _{i}^{k} -L_\mathbf{W _{i}}}{2}\Vert \mathbf{W }_{i}^{k+1}-\mathbf{W} _{i}^{k}\Vert _{F}^{2}. \end{aligned}$$
(39)

Similarly, for the feature matrix \(\mathbf{H} _{l}\), we have

$$\begin{aligned}&Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k+1},\ldots , \mathbf{W} _{l}^{k+1},\mathbf{H} _{l}^{k+1}) +G(\mathbf{H} _{l}^{k+1}) \le G(\mathbf{H} _{l}^{k})\nonumber \\&\quad + Q(\mathbf{W} _{1}^{k+1},\mathbf{W} _{2}^{k+1},\ldots , \mathbf{W} _{l}^{k+1},\mathbf{H} _{l}^{k}) - \frac{\nu ^{k} -L_\mathbf{H _{l}}}{2}\Vert \mathbf{H }_{l}^{k+1}-\mathbf{H} _{l}^{k}\Vert _{F}^{2}. \end{aligned}$$
(40)

The step-size sequences \(\mu _{i}^{k}\) and \(\nu ^{k}\) are chosen such that \(0 < \frac{1}{\mu _{i}^{k}} < \frac{1}{L_{Q}}\) and \(0<\frac{1}{\nu ^{k}} < \frac{1}{L_{Q}}\). Summing (39) over \(i=1,\ldots ,l\) and adding (40), we thereby obtain the following inequality:

$$\begin{aligned}&\phi (\mathbf{W} _{1}^{k},\mathbf{W} _{2}^{k},\ldots ,\mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k})-\phi (\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k+1}, \ldots ,\mathbf{W} _{l}^{k+1},\mathbf{H} _{l}^{k+1})\nonumber \\&\quad \ge \sum _{i=1}^{l}\frac{\mu _{i}^{k}-L_\mathbf{W _{i}}}{2} \Vert \mathbf{W }_{i}^{k+1}-\mathbf{W} _{i}^{k}\Vert _{F}^{2} +\frac{\nu ^{k} -L_\mathbf{H _{l}}}{2}\Vert \mathbf{H }_{l}^{k+1}-\mathbf{H} _{l}^{k}\Vert _{F}^{2}. \end{aligned}$$
(41)

Let \(a_{1}=(\mu _{i}^{k}-L_\mathbf{W _{i}})/2\) and \(a_{2}=(\nu ^{k}-L_\mathbf{H _{l}})/2\), and set \(a:=\min \{a_{1},a_{2}\}\); then \(a > 0\) since \({\mu _{i}^{k}} >{L_{Q}}\) and \({\nu ^{k}} >{L_{Q}}\). Thus, \(\{\mathbf{J }^{k}\}=\{\mathbf{W }_{1}^{k}, \mathbf{W} _{2}^{k},\ldots ,\mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k}\}\) satisfies Condition V1. \(\square \)
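To connect the inequalities above to computation, the following is a minimal sketch (an illustration, not the paper's implementation) of one forward–backward step of the form used in the update of \(\mathbf{W} _{i}\): a gradient step on a smooth term Q followed by the proximal map of a nonsmooth term F. The quadratic Q, the choice of F as the indicator of the nonnegative orthant, and the step-size rule mu = 1.1 * L are assumptions made for this example; with \(\mu >L\), each step yields the sufficient decrease established in (39) and (41).

    import numpy as np

    def forward_backward_step(W, grad_Q, prox_F, mu):
        """One forward-backward step: W_next = prox_{F/mu}( W - (1/mu) * grad_Q(W) )."""
        return prox_F(W - grad_Q(W) / mu)

    # Toy smooth term Q(W) = 0.5 * ||X - W H||_F^2 with X and H held fixed (assumed for illustration).
    rng = np.random.default_rng(1)
    X = rng.standard_normal((30, 80))
    H = np.abs(rng.standard_normal((5, 80)))
    grad_Q = lambda W: (W @ H - X) @ H.T                 # gradient of Q with respect to W
    L = np.linalg.norm(H @ H.T, 2)                       # Lipschitz constant of grad_Q
    prox_nonneg = lambda W: np.maximum(W, 0.0)           # proximal map of the indicator of {W >= 0}

    W = rng.standard_normal((30, 5))
    for _ in range(200):
        W = forward_backward_step(W, grad_Q, prox_nonneg, mu=1.1 * L)   # step size 1/mu with mu > L
    print(0.5 * np.linalg.norm(X - W @ H, 'fro') ** 2)   # value of Q after the iterations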

Proof

(Condition V2): The update of the weight matrix \(\mathbf{W} _{i}\) can be written as follows:

$$\begin{aligned} \mathbf{W} _{i}^{k+1}=\mathop {\arg \min }\limits _\mathbf{W _{i}} F(\mathbf{W} _{i})+ \hat{Q}(\mathbf{W} _{i})+\frac{\mu _{i}^{k}}{2} \Vert \mathbf{W }_{i}-\mathbf{W} _{i}^{k}\Vert _{F}^{2} \end{aligned}$$
(42)

where

$$\begin{aligned} \hat{Q}(\mathbf{W} _{i})&=Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\nonumber \\&\quad +Tr\langle \mathbf{W} _{i} -\mathbf{W} _{i}^{k}, \nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\rangle . \end{aligned}$$
(43)

Therefore, we have

$$\begin{aligned} 0\in \partial F(\mathbf{W} _{i}^{k+1})+ \nabla _\mathbf{W _{i}} Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots ,\mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k})+\mu _{i}^{k}(\mathbf{W} _{i}^{k+1}-\mathbf{W} _{i}^{k}). \end{aligned}$$
(44)

Then,

$$\begin{aligned}&\nabla _\mathbf{W _{i}}Q(\mathbf{J} ^{k+1})-\nabla _\mathbf{W _{i}} Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots ,\mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k})\nonumber \\&\qquad -\mu _{i}^{k}(\mathbf{W} _{i}^{k+1}-\mathbf{W} _{i}^{k}) \in \partial _\mathbf{W _{i}}\phi (\mathbf{J} ^{k+1}). \end{aligned}$$
(45)

For the feature matrix \(\mathbf{H} _{l}\), whose iteration is presented in (16), we can similarly obtain:

$$\begin{aligned}&\nabla _\mathbf{H _{l}}Q(\mathbf{J} ^{k+1})-\nabla _\mathbf{H _{l}} Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k+1},\ldots ,\mathbf{W} _{l}^{k+1}, \mathbf{H} _{l}^{k})\nonumber \\&\quad -\nu ^{k}(\mathbf{H} _{l}^{k+1}-\mathbf{H} _{l}^{k}) \in \partial _\mathbf{H _{l}}\phi (\mathbf{J} ^{k+1}). \end{aligned}$$
(46)

Define \(\mathbf{N} ^{k}:=(\mathbf{N} _\mathbf{W _{i}}^{k}, \mathbf{N} _\mathbf{H _{l}}^{k})\), where

$$\begin{aligned} \mathbf{N} _\mathbf{W _{i}}^{k+1}&: = \nabla _\mathbf{W _{i}} Q(\mathbf{J} ^{k+1})-\nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k},\ldots ,\mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\nonumber \\&\quad -\mu _{i}^{k}(\mathbf{W} _{i}^{k+1}-\mathbf{W} _{i}^{k}) \in \partial _\mathbf{W _{i}}\phi (\mathbf{J} ^{k+1}) \end{aligned}$$
(47)
$$\begin{aligned} \mathbf{N} _\mathbf{H _{l}}^{k+1}&: = \nabla _\mathbf{H _{l}}Q (\mathbf{J} ^{k+1})-\nabla _\mathbf{H _{l}}Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k+1},\ldots ,\mathbf{W} _{l}^{k+1},\mathbf{H} _{l}^{k})\nonumber \\&\quad -\nu ^{k}(\mathbf{H} _{l}^{k+1}-\mathbf{H} _{l}^{k}) \in \partial _\mathbf{H _{l}}\phi (\mathbf{J} ^{k+1}). \end{aligned}$$
(48)

If \(\{\mathbf{J }^{k}\}\) is bounded, then, since \(\nabla Q\) is Lipschitz continuous on any bounded set, we have \(\mathbf{N} ^{k} :=(\mathbf{N} _\mathbf{W _{i}}^{k},\mathbf{N} _\mathbf{H _{l}}^{k})\in \partial \phi (\mathbf{J} ^{k})\), and there exists a constant \( p>0\) such that the following inequality holds:

$$\begin{aligned} \Vert \mathbf{N }^{k+1}\Vert _{F}^{2}\le p\Vert \mathbf{J }^{k+1} -\mathbf{J} ^{k} \Vert _{F}^{2} \end{aligned}$$
(49)

and Condition V2 is proved. \(\square \)

Proof

(Condition V3): From V1, the sequence \(\{\phi (\mathbf{J} ^{k})\}\) is decreasing and \(\inf \phi >-\infty \). For any positive integer j, we obtain the following inequality:

$$\begin{aligned} \phi (\mathbf{J} ^{0})-\phi (\mathbf{J} ^{j})\ge a\sum _{k=0}^{j-1} \Vert \mathbf{J }^{k}-\mathbf{J} ^{k+1}\Vert _{F}^{2}. \end{aligned}$$
(50)

Therefore, \(\phi (\mathbf{J} ^{j})\) converges to some limit \(\phi (\tilde{\mathbf{J }})\) as \(j\rightarrow +\infty \). Letting \(j\rightarrow +\infty \), we have

$$\begin{aligned} \sum _{k=0}^{+\infty } \Vert \mathbf{J }^{k}-\mathbf{J} ^{k+1} \Vert _{F}^{2}\le \frac{1}{a}\left( \phi (\mathbf{J} ^{0})-\phi (\tilde{\mathbf{J }})\right) <+\infty \end{aligned}$$
(51)

which implies \(\lim _{k\rightarrow +\infty }\Vert \mathbf{J }^{k}-\mathbf{J} ^{k+1} \Vert _{F}^{2}=0\).

For the weight matrices \(\mathbf{W} _{i}\), let \(\{\mathbf{J} ^{k_{j}}\}\) be a subsequence converging to \(\tilde{\mathbf{J }}\). Taking \(k= k_{j}-1\) in (42) and comparing the minimal value at \(\mathbf{W} _{i}^{k_{j}}\) with the value at the candidate point \(\mathbf{W} _{i} =\tilde{\mathbf{W }}_{i}\), we obtain

$$\begin{aligned}&F(\mathbf{W} _{i}^{k_{j}})+ \hat{Q}(\mathbf{W} _{i}^{k_{j}}) +\frac{\mu _{i}^{k_{j}-1}}{2}\Vert \mathbf{W }_{i}^{k_{j}} -\mathbf{W} _{i}^{k_{j}-1}\Vert _{F}^{2}\nonumber \\&\quad \le F(\tilde{\mathbf{W }}_{i})+ \hat{Q}(\tilde{\mathbf{W }}_{i}) +\frac{\mu _{i}^{k_{j}-1}}{2}\Vert \tilde{\mathbf{W }}_{i} -\mathbf{W} _{i}^{k_{j}-1}\Vert _{F}^{2}. \end{aligned}$$
(52)

From Condition V1 and Lipschitz continuity of \(\nabla Q\), we can obtain

$$\begin{aligned} \limsup _{j\rightarrow +\infty } F(\mathbf{W} _{i}^{k_{j}}) \le F(\tilde{\mathbf{W }}_{i}). \end{aligned}$$
(53)

Similarly, for feature matrix \(\mathbf{H} _{l}\), we get

$$\begin{aligned} \limsup _{j\rightarrow +\infty } G(\mathbf{H} _{l}^{k_{j}}) \le G(\tilde{\mathbf{H }}_{l}). \end{aligned}$$
(54)

Since the functions F and G are lower semi-continuous, we have

$$\begin{aligned} \lim _{j\rightarrow +\infty } F(\mathbf{W} _{i}^{k_{j}}) =F(\tilde{\mathbf{W }}_{i}), \quad \lim _{j\rightarrow +\infty } G(\mathbf{H} _{l}^{k_{j}})=G(\tilde{\mathbf{H }}_{l}). \end{aligned}$$
(55)

Moreover, since Q is continuous, we thereby obtain

$$\begin{aligned} \lim _{j\rightarrow +\infty } \phi (\mathbf{J} ^{k_{j}}) =\phi (\tilde{\mathbf{J }}), \end{aligned}$$
(56)

which completes the proof of Condition V3. \(\square \)

Proof

(Condition V4): Q is a real polynomial function, so it is naturally a semi-algebraic function. The graph of the indicator function \(\delta _{S}\) can be rewritten as \(\{(u,0):u\in S\}\cup \{(v,+\infty ):v\in \bar{S}\}\), and the sets \(S_{F}\) and \(S_{G}\) are both nonempty closed semi-algebraic sets according to their definitions; therefore, the functions F and G are semi-algebraic. Since semi-algebraic functions satisfy the KL property [5], our objective function \(\phi (\mathbf{W} _{1},\ldots ,\mathbf{W} _{l},\mathbf{H} _{l}) =Q(\mathbf{W} _{1},\ldots ,\mathbf{W} _{l},\mathbf{H} _{l}) +I(\mathbf{W} _{1},\ldots ,\mathbf{W} _{l},\mathbf{H} _{l})\) defined in (9) satisfies the KL property. \(\square \)

About this article

Cite this article

Huang, H., Yang, Z., Li, Z. et al. A Converged Deep Graph Semi-NMF Algorithm for Learning Data Representation. Circuits Syst Signal Process 41, 1146–1165 (2022). https://doi.org/10.1007/s00034-021-01833-3

