
A Converged Deep Graph Semi-NMF Algorithm for Learning Data Representation


Abstract

Deep nonnegative matrix factorization (DMF) is a particularly useful technique for learning data representations in a low-dimensional space. To further capture complex hidden information and preserve the geometrical structure of high-dimensional data, we propose a novel deep matrix factorization model with graph regularization (called DGsnMF). To solve this model with multiple variables, we design a forward–backward splitting scheme. We then provide a convergence analysis of the proposed algorithm and prove that it converges to a critical point. Empirical experiments on benchmark datasets show that the proposed method is superior to the compared methods.
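To make the factorization concrete, the following is a minimal numerical sketch (an illustration added for this write-up, not the authors' code) of the kind of objective the abstract describes: the data matrix \(\mathbf{X}\) is factorized through several layers as \(\mathbf{W} _{1}\mathbf{W} _{2}\cdots \mathbf{W} _{l}\mathbf{H} _{l}\) with mixed-sign weight matrices and a nonnegative deepest representation \(\mathbf{H} _{l}\), plus a graph-Laplacian term that preserves local geometry. The layer sizes, the k-NN graph construction, and the weight lam are assumptions made for this example only.

    import numpy as np

    def knn_graph_laplacian(X, k=5):
        """Unweighted k-NN affinity graph on the columns of X and its Laplacian L = D - A."""
        n = X.shape[1]
        sq = np.sum(X ** 2, axis=0)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)        # pairwise squared distances
        A = np.zeros((n, n))
        for j in range(n):
            A[j, np.argsort(d2[j])[1:k + 1]] = 1.0              # k nearest neighbours, skipping the point itself
        A = np.maximum(A, A.T)                                  # symmetrize
        return np.diag(A.sum(axis=1)) - A

    def deep_graph_seminmf_objective(X, Ws, H, L, lam=0.1):
        """Illustrative objective: ||X - W_1 ... W_l H||_F^2 + lam * tr(H L H^T)."""
        recon = np.linalg.multi_dot(Ws + [H])
        return np.linalg.norm(X - recon, 'fro') ** 2 + lam * np.trace(H @ L @ H.T)

    # Toy usage: 50 features, 100 samples, two layers 50 -> 20 -> 10.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 100))
    Ws = [rng.standard_normal((50, 20)), rng.standard_normal((20, 10))]   # mixed-sign weights (semi-NMF)
    H = np.abs(rng.standard_normal((10, 100)))                            # nonnegative deepest features
    print(deep_graph_seminmf_objective(X, Ws, H, knn_graph_laplacian(X)))

Such an objective would then be minimized block-wise by the forward–backward updates whose convergence is analysed in the Appendix.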


Notes

  1. \(C^{1}\) function: the first derivatives are continuous.

References

  1. S. Arora, N. Cohen, W. Hu, Y. Luo, Implicit regularization in deep matrix factorization. Adv. Neural Inf. Process. Syst. (NeurIPS), 7413–7424 (2019)

  2. H. Attouch, J. Bolte, P. Redont, A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  3. H. Attouch, J. Bolte, B.F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

  4. Y. Bengio, Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)

  5. J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)

  6. D. Cai, X. He, J. Han, T.S. Huang, Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1548–1560 (2010)

  7. D. Cai, X. He, X. Wu, J. Han, in Proceedings of IEEE International Conference on Data Mining (ICDM). Non-negative matrix factorization on manifold (2008), pp. 63–72

  8. C.H. Ding, T. Li, M.I. Jordan, Convex and semi-nonnegative matrix factorizations. IEEE Trans. Pattern Anal. Mach. Intell. 32(1), 45–55 (2008)

  9. H. Fang, A. Li, H. Xu, T. Wang, Sparsity-constrained deep nonnegative matrix factorization for hyperspectral unmixing. IEEE Geosci. Remote. Sens. Lett. 15(7), 1105–1109 (2018)

  10. X.-R. Feng, H.-C. Li, J. Li, Q. Du, A. Plaza, W.J. Emery, Hyperspectral unmixing using sparsity-constrained deep nonnegative matrix factorization with total variation. IEEE Trans. Geosci. Remote Sens. 56(10), 6245–6257 (2018)

  11. N. Guan, D. Tao, Z. Luo, B. Yuan, NeNMF: an optimal gradient method for nonnegative matrix factorization. IEEE Trans. Signal Process. 60(6), 2882–2898 (2012)

  12. G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)

  13. S. Huang, Z. Kang, I.W. Tsang, Z. Xu, Auto-weighted multi-view clustering via kernelized graph learning. Patt. Recogn. 97, 107015 (2020)

  14. H. Huang, N. Liang, W. Yan, Z. Yang, Z. Li, W. Sun, in Proceedings of IEEE International Conference on Data Mining Workshops. Partially shared semi-supervised deep matrix factorization with multi-view data (2020), pp. 564–570

  15. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)

  16. D.D. Lee, H.S. Seung, Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999)

  17. Z. Li, S. Ding, W. Chen, Z. Yang, S. Xie, Proximal alternating minimization for analysis dictionary learning and convergence analysis. IEEE Trans. Emerg. Topics Comput. Intel. 2(6), 439–449 (2018)

  18. Z. Li, S. Ding, Y. Li, Z. Yang, S. Xie, W. Chen, Manifold optimization-based analysis dictionary learning with an \(l_{1/2}\)-norm regularizer. Neural Netw. 98, 212–222 (2018)

  19. Z. Li, J. Tang, X. He, Robust structured nonnegative matrix factorization for image representation. IEEE Trans. Neural Netw. Learn. Syst. 29(5), 1947–1960 (2018)

  20. Z. Li, M. Xu, J. Nie, J. Kang, W. Chen, S. Xie, NOMA-Enabled cooperative computation offloading for blockchain-empowered internet of things: a learning approach. IEEE Intern. Things J. 8(4), 2364–2378 (2021)

  21. H.-C. Li, G. Yang, W. Yang, Q. Du, W.J. Emery, Deep nonsmooth nonnegative matrix factorization network with semi-supervised learning for SAR image change detection. ISPRS J. Photogramm. Remote Sens. 160, 167–179 (2020)

  22. N. Liang, Z. Yang, Z. Li, W. Sun, S. Xie, Multi-view clustering by non-negative matrix factorization with co-orthogonal constraints. Knowl. Based Syst. 194, 105582 (2020)

  23. N. Liang, Z. Yang, Z. Li, S. Xie, C. Su, Semi-supervised multi-view clustering with graph-regularized partially shared non-negative matrix factorization. Knowl. Based Syst. 190, 105185 (2020)

  24. M. Lyons, S. Akamatsu, M. Kamachi, J. Gyoba, in International Conference on Automatic Face and Gesture Recognition. Coding facial expressions with Gabor wavelets (1998), pp. 200–205

  25. Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (Kluwer Academic, Boston, MA, 2004)

  26. G. Peng, Joint and direct optimization for dictionary learning in convolutional sparse representation. IEEE Trans. Neural Netw. Learn. Syst. 31(2), 559–573 (2020)

  27. H.A. Song, B.-K. Kim, T.L. Xuan, S.-Y. Lee, Hierarchical feature extraction by multi-layer non-negative matrix factorization network for classification task. Neurocomputing 165, 63–74 (2015)

  28. G. Trigeorgis, K. Bousmalis, S. Zafeiriou, B.W. Schuller, A deep matrix factorization method for learning attribute representations. IEEE Trans. Pattern Anal. Mach. Intell. 39(3), 417–429 (2017)

  29. W. Xu, X. Liu, Y. Gong, in Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval. Document clustering based on non-negative matrix factorization (2003), pp. 267–273

  30. W. Yan, B. Zhang, S. Ma, Z. Yang, A novel regularized concept factorization for document clustering. Knowl. Based Syst. 135, 147–158 (2017)

  31. Z. Yang, Y. Hu, N. Liang, J. Lv, Nonnegative matrix factorization with fixed L2-norm constraint. Circ. Syst. Sig. Process. 38(7), 3211–3226 (2019)

  32. Z. Yang, Y. Xiang, K. Xie, S. Xie, Adaptive method for nonsmooth nonnegative matrix factorization. IEEE Trans. Neural Netw. Learn. Syst. 28(4), 948–960 (2017)

  33. Z. Yang, Y. Zhang, Y. Xiang, W. Yan, S. Xie, Non-negative matrix factorization with dual constraints for image clustering. IEEE Trans. Syst. Man Cybern. Syst. 50(7), 2524–2533 (2020)

  34. F. Ye, C. Chen, Z. Zheng, in International Conference on Information and Knowledge Management (CIKM). Deep autoencoder-like nonnegative matrix factorization for community detection (2018), pp. 1393–1402

  35. F. Ye, C. Chen, Z. Zheng, R. Li, J. X. Yu, in Proceedings of IEEE International Conference on Data Mining (ICDM). Discrete overlapping community detection with pseudo supervision (2019), pp. 708–717

  36. J. Yu, G. Zhou, A. Cichocki, S. Xie, Learning the hierarchical parts of objects by deep non-smooth nonnegative matrix factorization. IEEE Access 6, 58096–58105 (2018)

  37. S. Zafeiriou, A. Tefas, I. Buciu, I. Pitas, Exploiting discriminant information in nonnegative matrix factorization with application to frontal face verification. IEEE Trans. Neural Netw. 17(3), 683–695 (2006)

  38. H. Zhao, Z. Ding, Y. Fu, in AAAI Conference on Artificial Intelligence (AAAI). Multi-view clustering via deep matrix factorization (2017), pp. 1288–1293

  39. T. Zhu, Sparse dictionary learning by block proximal gradient with global convergence. Neurocomputing 367, 226–235 (2019)

Acknowledgements

The authors would like to thank Dr. Jinshi Yu for his helpful discussion and source code [36]. This work was supported by the Key-Area Research and Development Program of Guangdong Province under Grant 2019B010154002, the Science and Technology Plan Project of Guangzhou under Grant 202002030289, the National Natural Science Foundation of China under Grant Nos. 61801133, 61722304, and 61803096.

Author information


Corresponding author

Correspondence to Zuyuan Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

To prove Theorem 1, we first define critical points of nonconvex and nonsmooth functions, as well as semi-algebraic functions and Kurdyka–Łojasiewicz (KL) functions, which are used in the convergence analysis.

Definition 1

For a proper and lower semi-continuous function h,

  • The Fréchet subdifferential of h at x, written as \(\hat{\partial } h(x)\), is defined for \(x\in \) dom h as

    $$\begin{aligned} \hat{\partial } h(x)=\left\{ \mathbf{u} \in {\mathbb {R}}^d: \liminf _{y\rightarrow x,\, y\ne x} \frac{h(y)-h(x)-\langle \mathbf{u} ,y-x\rangle }{\Vert y-x \Vert }\ge 0\right\} \end{aligned}$$
    (31)

    and \(\hat{\partial } h(x)= \emptyset \) when \(x\not \in \) dom h.

  • The limiting subdifferential of h at x, written as \(\partial h(x)\), is the set defined as follows:

    $$\begin{aligned} \partial h(x)=\left\{ \mathbf{u} \in {\mathbb {R}}^d: \exists \, x^k \rightarrow x,\ h(x^k)\rightarrow h(x),\ \mathbf{u} ^k \in \hat{\partial } h(x^k),\ \mathbf{u} ^k \rightarrow \mathbf{u} \ \mathrm{as}\ k \rightarrow \infty \right\} \end{aligned}$$
    (32)
  • A point x is called a (limiting-)critical point of the function h if it satisfies the following condition:

    $$\begin{aligned} 0 \in \partial h(x) \end{aligned}$$
    (33)
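As a simple illustration of these notions (an example added here for clarity), consider the convex but nonsmooth function \(h(x)=|x|\) on \({\mathbb {R}}\). At the origin,

$$\begin{aligned} \hat{\partial } h(0)=\partial h(0)=[-1,1], \end{aligned}$$

so \(0 \in \partial h(0)\), and \(x=0\) is a critical point of h in the sense of (33) even though h is not differentiable there.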

Definition 2

(Kurdyka–Łojasiewicz Property) A function h has the KL property at \(x^{*}\in \) dom \(\partial h\) if there exist:

  • \(\eta \in (0,\infty )\) and a neighborhood V of \(x^{*}\);

  • a continuous concave function \(\varphi : [0,\eta )\rightarrow {\mathbb {R}}_{+}\), such that the following hold:

    a) \(\varphi \) is \(C^{1}\) on \((0,\eta )\), \(\varphi (0)=0\), and \(\varphi ^{\prime }(s)>0\) for all \(s\in (0,\eta )\);

    b) for all \(x \in V\) satisfying \(h(x^{*})<h(x)<h(x^{*})+\eta \), the KL inequality holds:

      $$\begin{aligned} {\qquad \varphi ^{\prime }\left( h(x)-h\left( x^{*}\right) \right) \mathrm{dist}(0, \partial h(x)) \ge 1} \end{aligned}$$
      (34)
  • If h satisfies the KL property at every point of dom \(\partial h\), it is called a KL function.
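As a simple illustration (an example added here, not from the original text), the function \(h(x)=x^{2}\) has the KL property at \(x^{*}=0\) with the desingularizing function \(\varphi (s)=\sqrt{s}\): for every \(x\ne 0\),

$$\begin{aligned} \varphi ^{\prime }\left( h(x)-h(0)\right) \mathrm{dist}(0,\partial h(x)) =\frac{1}{2\sqrt{x^{2}}}\cdot |2x|=1\ge 1, \end{aligned}$$

so inequality (34) holds with \(V={\mathbb {R}}\) and any \(\eta >0\).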

Definition 3

(Semi-algebraic sets and functions [2])

  • A subset S of \({\mathbb {R}}^n\) is called a semi-algebraic set if there exists a finite number of real polynomial functions \(g_{ij}\), \(g_{ij}^{'}\): \({\mathbb {R}}^n\rightarrow {\mathbb {R}}\) \((1\le i\le q,\ 1\le j\le p)\) such that

    $$\begin{aligned} S=\bigcup _{j=1}^{p}\bigcap _{i=1}^{q}\{x\in {\mathbb {R}}^n:g_{ij}(x)=0,\ g_{ij}^{'}(x)<0\} \end{aligned}$$
    (35)
  • A function \(h: {\mathbb {R}}^n \rightarrow (-\infty ,+\infty ]\) is semi-algebraic if its graph \(\{(x,y)\in {\mathbb {R}}^n\times {\mathbb {R}}:y=h(x)\}\) is a semi-algebraic set.
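For instance, the nonnegative orthant (the type of constraint set appearing in semi-NMF) is a semi-algebraic set, since

$$\begin{aligned} {\mathbb {R}}^{n}_{+}=\bigcap _{i=1}^{n}\left( \{x\in {\mathbb {R}}^n:x_{i}=0\}\cup \{x\in {\mathbb {R}}^n:-x_{i}<0\}\right) \end{aligned}$$

and semi-algebraic sets are stable under finite unions and intersections; consequently, its indicator function is a semi-algebraic function. This is the form of argument used in the proof of Condition V4 below.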

Second, following the conditions described in Lemma 1, we verify that the sequence \(\{\mathbf{J }^{k}\} =\{\mathbf{W }_{1}^{k},\mathbf{W} _{2}^{k},\ldots ,\mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k}\}\) generated by Algorithm 1 satisfies conditions V1–V4.

Proof

(Condition V1): The forward–backward splitting framework is used to solve the DGsnMF problem described in (9). The weight matrix \(\mathbf{W} _{i}\ (i = 1,2,\ldots ,l)\) is updated through the iteration described in (19), so we have

$$\begin{aligned}&\frac{\mu _{i}^{k}}{2}\left\| \mathbf{W }_{i}^{k+1} -\left( \mathbf{W} _{i}^{k}-\frac{1}{\mu _{i}^{k}} \nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k})\right) \right\| _{F}^{2} +F(\mathbf{W} _{i}^{k+1})\nonumber \\&\quad \le \frac{\mu _{i}^{k}}{2}\Vert \frac{1}{\mu _{i}^{k}} \nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k}) \Vert _{F}^{2}+F(\mathbf{W} _{i}^{k}) \end{aligned}$$
(36)

then,

$$\begin{aligned} F(\mathbf{W} _{i}^{k})&\ge F(\mathbf{W} _{i}^{k+1}) +\frac{\mu _{i}^{k}}{2}\Vert \mathbf{W }_{i}^{k+1} -\mathbf{W} _{i}^{k}\Vert _{F}^{2}\nonumber \\&\quad +Tr\langle \mathbf{W} _{i}^{k+1}-\mathbf{W} _{i}^{k}, \nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\rangle . \end{aligned}$$
(37)

It is noteworthy that \(\nabla _\mathbf{W _{i}}Q\) is Lipschitz continuous and Q is a \(C^1\) function. Let \(L_\mathbf{W _{i}}\) denote the Lipschitz constant of the gradient \(\nabla _\mathbf{W _{i}}Q\); then we have

$$\begin{aligned}&Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k+1},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\le Q (\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots ,\mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k})\nonumber \\&\quad + Tr\langle \mathbf{W} _{i}^{k+1} -\mathbf{W} _{i}^{k},\nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\rangle \nonumber \\&\qquad + \frac{L_\mathbf{W _{i}}}{2}\Vert \mathbf{W }_{i}^{k+1} -\mathbf{W} _{i}^{k}\Vert _{F}^{2}. \end{aligned}$$
(38)

Combining (37) and (38), we obtain the following inequality:

$$\begin{aligned}&F(\mathbf{W} _{i}^{k+1})+Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k+1},\ldots ,\mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k}) \le F(\mathbf{W} _{i}^{k})\nonumber \\&\quad + Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k}) - \frac{\mu _{i}^{k} -L_\mathbf{W _{i}}}{2}\Vert \mathbf{W }_{i}^{k+1}-\mathbf{W} _{i}^{k}\Vert _{F}^{2}. \end{aligned}$$
(39)

Similarly, for the feature matrix \(\mathbf{H} _{l}\), we have

$$\begin{aligned}&Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k+1},\ldots , \mathbf{W} _{l}^{k+1},\mathbf{H} _{l}^{k+1}) +G(\mathbf{H} _{l}^{k+1}) \le G(\mathbf{H} _{l}^{k})\nonumber \\&\quad + Q(\mathbf{W} _{1}^{k+1},\mathbf{W} _{2}^{k+1},\ldots , \mathbf{W} _{l}^{k+1},\mathbf{H} _{l}^{k}) - \frac{\nu ^{k} -L_\mathbf{H _{l}}}{2}\Vert \mathbf{H }_{l}^{k+1}-\mathbf{H} _{l}^{k}\Vert _{F}^{2}. \end{aligned}$$
(40)

The step-size sequences \(\mu _{i}^{k}\) and \(\nu ^{k}\) are chosen such that \(0 < \frac{1}{\mu _{i}^{k}} < \frac{1}{L_{Q}}\) and \(0<\frac{1}{\nu ^{k}} < \frac{1}{L_{Q}}\). Summing (39) over \(i=1,\ldots ,l\) and adding (40), we thereby obtain the following inequality:

$$\begin{aligned}&\phi (\mathbf{W} _{1}^{k},\mathbf{W} _{2}^{k},\ldots ,\mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k})-\phi (\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k+1}, \ldots ,\mathbf{W} _{l}^{k+1},\mathbf{H} _{l}^{k+1})\nonumber \\&\quad \ge \sum _{i=1}^{l}\frac{\mu _{i}^{k}-L_\mathbf{W _{i}}}{2} \Vert \mathbf{W }_{i}^{k+1}-\mathbf{W} _{i}^{k}\Vert _{F}^{2} +\frac{\nu ^{k} -L_\mathbf{H _{l}}}{2}\Vert \mathbf{H }_{l}^{k+1}-\mathbf{H} _{l}^{k}\Vert _{F}^{2}. \end{aligned}$$
(41)

Let \(a_{1}=(\mu _{i}^{k}-L_\mathbf{W _{i}})/2\) and \(a_{2}=(\nu ^{k}-L_\mathbf{H _{l}})/2\), and set \(a:=\min \{a_{1},a_{2}\}\); then \(a > 0\) since \({\mu _{i}^{k}} >{L_{Q}}\) and \({\nu ^{k}} >{L_{Q}}\). Thus, \(\{\mathbf{J }^{k}\}=\{\mathbf{W }_{1}^{k}, \mathbf{W} _{2}^{k},\ldots ,\mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k}\}\) satisfies Condition V1. \(\square \)
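To connect the inequalities above to computation, the following is a minimal sketch (an illustration, not the paper's implementation) of one forward–backward step of the form used in the update of \(\mathbf{W} _{i}\): a gradient step on a smooth term Q followed by the proximal map of a nonsmooth term F. The quadratic Q, the choice of F as the indicator of the nonnegative orthant, and the step-size rule mu = 1.1 * L are assumptions made for this example; with \(\mu >L\), each step yields the sufficient decrease established in (39) and (41).

    import numpy as np

    def forward_backward_step(W, grad_Q, prox_F, mu):
        """One forward-backward step: W_next = prox_{F/mu}( W - (1/mu) * grad_Q(W) )."""
        return prox_F(W - grad_Q(W) / mu)

    # Toy smooth term Q(W) = 0.5 * ||X - W H||_F^2 with X and H held fixed (assumed for illustration).
    rng = np.random.default_rng(1)
    X = rng.standard_normal((30, 80))
    H = np.abs(rng.standard_normal((5, 80)))
    grad_Q = lambda W: (W @ H - X) @ H.T                 # gradient of Q with respect to W
    L = np.linalg.norm(H @ H.T, 2)                       # Lipschitz constant of grad_Q
    prox_nonneg = lambda W: np.maximum(W, 0.0)           # proximal map of the indicator of {W >= 0}

    W = rng.standard_normal((30, 5))
    for _ in range(200):
        W = forward_backward_step(W, grad_Q, prox_nonneg, mu=1.1 * L)   # step size 1/mu with mu > L
    print(0.5 * np.linalg.norm(X - W @ H, 'fro') ** 2)   # value of Q after the iterations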

Proof

(Condition V2): The update of the weight matrix \(\mathbf{W} _{i}\) can be written as follows:

$$\begin{aligned} \mathbf{W} _{i}^{k+1}=\mathop {\arg \min }\limits _\mathbf{W _{i}} F(\mathbf{W} _{i})+ \hat{Q}(\mathbf{W} _{i})+\frac{\mu _{i}^{k}}{2} \Vert \mathbf{W }_{i}-\mathbf{W} _{i}^{k}\Vert _{F}^{2} \end{aligned}$$
(42)

where

$$\begin{aligned} \hat{Q}(\mathbf{W} _{i})&=Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\nonumber \\&\quad +Tr\langle \mathbf{W} _{i} -\mathbf{W} _{i}^{k}, \nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots , \mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\rangle . \end{aligned}$$
(43)

Therefore, we have

$$\begin{aligned} 0\in \partial F(\mathbf{W} _{i}^{k+1})+ \nabla _\mathbf{W _{i}} Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots ,\mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k})+\mu _{i}^{k}(\mathbf{W} _{i}^{k+1}-\mathbf{W} _{i}^{k}). \end{aligned}$$
(44)

Then,

$$\begin{aligned}&\nabla _\mathbf{W _{i}}Q(\mathbf{J} ^{k+1})-\nabla _\mathbf{W _{i}} Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k},\ldots ,\mathbf{W} _{l}^{k}, \mathbf{H} _{l}^{k})\nonumber \\&\qquad -\mu _{i}^{k}(\mathbf{W} _{i}^{k+1}-\mathbf{W} _{i}^{k}) \in \partial _\mathbf{W _{i}}\phi (\mathbf{J} ^{k+1}). \end{aligned}$$
(45)

For the feature matrix \(\mathbf{H} _{l}\), whose iteration is presented in (16), we can similarly obtain:

$$\begin{aligned}&\nabla _\mathbf{H _{l}}Q(\mathbf{J} ^{k+1})-\nabla _\mathbf{H _{l}} Q(\mathbf{W} _{1}^{k+1},\ldots ,\mathbf{W} _{i}^{k+1},\ldots ,\mathbf{W} _{l}^{k+1}, \mathbf{H} _{l}^{k})\nonumber \\&\quad -\nu ^{k}(\mathbf{H} _{l}^{k+1}-\mathbf{H} _{l}^{k}) \in \partial _\mathbf{H _{l}}\phi (\mathbf{J} ^{k+1}). \end{aligned}$$
(46)

Define \(\mathbf{N} ^{k}:=(\mathbf{N} _\mathbf{W _{i}}^{k}, \mathbf{N} _\mathbf{H _{l}}^{k})\), where

$$\begin{aligned} \mathbf{N} _\mathbf{W _{i}}^{k+1}&: = \nabla _\mathbf{W _{i}} Q(\mathbf{J} ^{k+1})-\nabla _\mathbf{W _{i}}Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k},\ldots ,\mathbf{W} _{l}^{k},\mathbf{H} _{l}^{k})\nonumber \\&\quad -\mu _{i}^{k}(\mathbf{W} _{i}^{k+1}-\mathbf{W} _{i}^{k}) \in \partial _\mathbf{W _{i}}\phi (\mathbf{J} ^{k+1}) \end{aligned}$$
(47)
$$\begin{aligned} \mathbf{N} _\mathbf{H _{l}}^{k+1}&: = \nabla _\mathbf{H _{l}}Q (\mathbf{J} ^{k+1})-\nabla _\mathbf{H _{l}}Q(\mathbf{W} _{1}^{k+1},\ldots , \mathbf{W} _{i}^{k+1},\ldots ,\mathbf{W} _{l}^{k+1},\mathbf{H} _{l}^{k})\nonumber \\&\quad -\nu ^{k}(\mathbf{H} _{l}^{k+1}-\mathbf{H} _{l}^{k}) \in \partial _\mathbf{H _{l}}\phi (\mathbf{J} ^{k+1}). \end{aligned}$$
(48)

If \(\{\mathbf{J }^{k}\}\) is bounded, then, since \(\nabla Q\) is Lipschitz continuous on any bounded set, we have \(\mathbf{N} ^{k} :=(\mathbf{N} _\mathbf{W _{i}}^{k},\mathbf{N} _\mathbf{H _{l}}^{k})\in \partial \phi (\mathbf{J} ^{k})\), and there exists a constant \( p>0\) such that the following inequality holds:

$$\begin{aligned} \Vert \mathbf{N }^{k+1}\Vert _{F}^{2}\le p\Vert \mathbf{J }^{k+1} -\mathbf{J} ^{k} \Vert _{F}^{2} \end{aligned}$$
(49)

and Condition V2 is proved. \(\square \)

Proof

(Condition V3): From V1, the sequence \(\{\phi (\mathbf{J} ^{k})\}\) is decreasing and \(\inf \phi >-\infty \). For any positive integer j, we obtain the following inequality:

$$\begin{aligned} \phi (\mathbf{J} ^{0})-\phi (\mathbf{J} ^{j})\ge a\sum _{k=0}^{j-1} \Vert \mathbf{J }^{k}-\mathbf{J} ^{k+1}\Vert _{F}^{2}. \end{aligned}$$
(50)

Therefore, \(\phi (\mathbf{J} ^{j})\) converges to some limit \(\phi (\tilde{\mathbf{J }})\) as \(j\rightarrow +\infty \). Letting \(j\rightarrow +\infty \), we have

$$\begin{aligned} \sum _{k=0}^{+\infty } \Vert \mathbf{J }^{k}-\mathbf{J} ^{k+1} \Vert _{F}^{2}\le \frac{1}{a}\left( \phi (\mathbf{J} ^{0})-\phi (\tilde{\mathbf{J }})\right) <+\infty \end{aligned}$$
(51)

which implies \(\lim _{k\rightarrow +\infty }\Vert \mathbf{J }^{k}-\mathbf{J} ^{k+1} \Vert _{F}^{2}=0\).

For the weight matrices \(\mathbf{W} _{i}\), let \(\{\mathbf{J} ^{k_{j}}\}\) be a subsequence converging to \(\tilde{\mathbf{J }}\). Taking \(k= k_{j}-1\) in (42) and comparing the minimal value at \(\mathbf{W} _{i}^{k_{j}}\) with the value at the candidate point \(\mathbf{W} _{i} =\tilde{\mathbf{W }}_{i}\), we obtain

$$\begin{aligned}&F(\mathbf{W} _{i}^{k_{j}})+ \hat{Q}(\mathbf{W} _{i}^{k_{j}}) +\frac{\mu _{i}^{k_{j}-1}}{2}\Vert \mathbf{W }_{i}^{k_{j}} -\mathbf{W} _{i}^{k_{j}-1}\Vert _{F}^{2}\nonumber \\&\quad \le F(\tilde{\mathbf{W }}_{i})+ \hat{Q}(\tilde{\mathbf{W }}_{i}) +\frac{\mu _{i}^{k_{j}-1}}{2}\Vert \tilde{\mathbf{W }}_{i} -\mathbf{W} _{i}^{k_{j}-1}\Vert _{F}^{2}. \end{aligned}$$
(52)

From Condition V1 and Lipschitz continuity of \(\nabla Q\), we can obtain

$$\begin{aligned} \limsup _{j\rightarrow +\infty } F(\mathbf{W} _{i}^{k_{j}}) \le F(\tilde{\mathbf{W }}_{i}). \end{aligned}$$
(53)

Similarly, for feature matrix \(\mathbf{H} _{l}\), we get

$$\begin{aligned} \limsup _{j\rightarrow +\infty } G(\mathbf{H} _{l}^{k_{j}}) \le G(\tilde{\mathbf{H }}_{l}). \end{aligned}$$
(54)

Since the functions F and G are lower semi-continuous, we have

$$\begin{aligned} \lim _{j\rightarrow +\infty } F(\mathbf{W} _{i}^{k_{j}}) =F(\tilde{\mathbf{W }}_{i}), \quad \lim _{j\rightarrow +\infty } G(\mathbf{H} _{l}^{k_{j}})=G(\tilde{\mathbf{H }}_{l}). \end{aligned}$$
(55)

Moreover, since Q is continuous, we thereby obtain

$$\begin{aligned} \lim _{j\rightarrow +\infty } \phi (\mathbf{J} ^{k_{j}}) =\phi (\tilde{\mathbf{J }}), \end{aligned}$$
(56)

which completes the proof of Condition V3. \(\square \)

Proof

(Condition V4): Q is a real polynomial function, so it is naturally a semi-algebraic function. The graph of the indicator function \(\delta _{S}\) can be rewritten as \(\{(u,0):u\in S\}\cup \{(v,+\infty ):v\in \bar{S}\}\), and the sets \(S_{F}\) and \(S_{G}\) are both nonempty closed semi-algebraic sets according to their definitions; therefore, the functions F and G are semi-algebraic. Since semi-algebraic functions satisfy the KL property [5], our objective function \(\phi (\mathbf{W} _{1},\ldots ,\mathbf{W} _{l},\mathbf{H} _{l}) =Q(\mathbf{W} _{1},\ldots ,\mathbf{W} _{l},\mathbf{H} _{l}) +I(\mathbf{W} _{1},\ldots ,\mathbf{W} _{l},\mathbf{H} _{l})\) defined in (9) satisfies the KL property. \(\square \)

About this article

Cite this article

Huang, H., Yang, Z., Li, Z. et al. A Converged Deep Graph Semi-NMF Algorithm for Learning Data Representation. Circuits Syst Signal Process 41, 1146–1165 (2022). https://doi.org/10.1007/s00034-021-01833-3

