A unified global convergence analysis of multiplicative update rules for nonnegative matrix factorization

Abstract

Multiplicative update rules are a well-known computational method for nonnegative matrix factorization. Depending on the error measure between two matrices, various types of multiplicative update rules have been proposed so far. However, their convergence properties are not fully understood. This paper provides a sufficient condition for a general multiplicative update rule to have the global convergence property in the sense that any sequence of solutions has at least one convergent subsequence and the limit of any convergent subsequence is a stationary point of the optimization problem. Using this condition, it is proved that many of the existing multiplicative update rules have the global convergence property if they are modified slightly so that all variables take positive values. This paper also proposes new multiplicative update rules based on Kullback–Leibler, Gamma, and Rényi divergences. It is shown that these three rules have the global convergence property if the same modification as above is made.
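To make the setting concrete, the following is a minimal sketch of a multiplicative update rule with such a positivity modification. It uses the classical Lee–Seung update for the squared Euclidean error [29] purely as an example; the updates for the divergences treated in the paper differ, and the value of \(\epsilon \), the sizes, and the iteration count below are illustrative only.

```python
import numpy as np

def modified_mu_nmf(X, r, eps=1e-12, iters=500, seed=0):
    """Lee-Seung multiplicative updates for min ||X - WH||_F^2,
    with an epsilon-floor so that all entries of W and H stay
    strictly positive (the 'slight modification' in the abstract)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        # multiply by the ratio of the negative to the positive part of
        # the gradient, then clip from below at eps to keep positivity
        H = np.maximum(H * (W.T @ X) / (W.T @ W @ H), eps)
        W = np.maximum(W * (X @ H.T) / (W @ H @ H.T), eps)
    return W, H

# usage: approximate a random nonnegative 20x30 matrix with rank-5 factors
X = np.abs(np.random.default_rng(1).random((20, 30)))
W, H = modified_mu_nmf(X, r=5)
print(np.linalg.norm(X - W @ H))
```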

Notes

  1. The error function based on Kullback–Leibler divergence in Table 1 is slightly different from the one in [45]. Instead of assuming that \(\sum _{ij} X_{ij}=1\), we replace \(X_{ij}\) with \(X_{ij}/\sum _{pq}X_{pq}\) so that the result applies to a general nonnegative matrix \(\varvec{X}\).

  2. In the case of Kullback–Leibler divergence, \(X_{ij}\) in \(f_{ik}(\varvec{W},\varvec{H})\) must be replaced with \(X_{ij}/\sum _{pq}X_{pq}\).

References

  1. Badeau, R., Bertin, N., Vincent, E.: Stability analysis of multiplicative update algorithms and application to nonnegative matrix factorization. IEEE Trans. Neural Netw. 21(12), 1869–1881 (2010)

  2. Berman, A., Plemmons, R.: Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York (1979)

  3. Berry, M.W., Browne, M.: Email surveillance using non-negative matrix factorization. Comput. Math. Organ. Theory 11, 249–264 (2005)

  4. Campbell, S.L., Poole, G.D.: Computing nonnegative rank factorizations. Linear Algebra Appl. 35, 175–182 (1981)

  5. Chen, J.C.: The nonnegative rank factorizations of nonnegative matrices. Linear Algebra Appl. 62, 207–217 (1984)

  6. Chi, E.C., Kolda, T.G.: On tensors, sparsity, and nonnegative factorizations. SIAM J. Matrix Anal. Appl. 33(4), 1272–1299 (2012)

  7. Cichocki, A., Lee, H., Kim, Y.D., Choi, S.: Non-negative matrix factorization with \(\alpha \)-divergence. Pattern Recognit. Lett. 29(9), 1433–1440 (2008)

  8. Cichocki, A., Phan, A.H.: Fast local algorithms for large scale nonnegative matrix and tensor factorization. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E92-A(3), 708–721 (2009)

  9. Cichocki, A., Zdunek, R., Amari, S.I.: Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization. In: Lecture Notes in Computer Science, vol. 4666, pp. 169–176. Springer (2007)

  10. Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.I.: Nonnegative Matrix and Tensor Factorizations. Wiley, West Sussex (2009)

  11. Dhillon, I.S., Sra, S.: Generalized nonnegative matrix approximations with Bregman divergences. In: Advances in Neural Information Processing Systems, pp. 283–290 (2005)

  12. Févotte, C., Bertin, N., Durrieu, J.L.: Nonnegative matrix factorization with the Itakura–Saito divergence: with application to music analysis. Neural Comput. 21(3), 793–830 (2009)

  13. Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence. Neural Comput. 23(9), 2421–2456 (2011)

  14. Finesso, L., Spreij, P.: Nonnegative matrix factorization and I-divergence alternating minimization. Linear Algebra Appl. 416, 270–287 (2006)

  15. Gillis, N., Glineur, F.: Nonnegative factorization and the maximum edge biclique problem. arXiv e-prints (2008)

  16. Gonzalez, E.F., Zhang, Y.: Accelerating the Lee-Seung algorithm for non-negative matrix factorization. Dept. Comput. & Appl. Math., Rice Univ., Houston, TX, Tech. Rep. TR-05-02 (2005)

  17. Guan, N., Tao, D., Luo, Z., Yuan, B.: NeNMF: an optimal gradient method for nonnegative matrix factorization. IEEE Trans. Signal Process. 60(6), 2882–2898 (2012)

  18. Guillamet, D., Vitria, J.: Non-negative matrix factorization for face recognition. In: Lecture Notes in Artificial Intelligence, pp. 336–344. Springer (2002)

  19. Hansen, S., Plantenga, T., Kolda, T.G.: Newton-based optimization for Kullback–Leibler nonnegative tensor factorizations. Optim. Methods Softw. 30(5), 1002–1029 (2015)

  20. Holzapfel, A., Stylianou, Y.: Musical genre classification using nonnegative matrix factorization-based features. IEEE Trans. Audio Speech Lang. Process. 16(2), 424–434 (2008)

  21. Hsieh, C.J., Dhillon, I.S.: Fast coordinate descent methods with variable selection for non-negative matrix factorization. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1064–1072. ACM (2011)

  22. Katayama, J., Takahashi, N., Takeuchi, J.: Boundedness of modified multiplicative updates for nonnegative matrix factorization. In: Proceedings of the Fifth International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, pp. 252–255 (2013)

  23. Kim, D., Sra, S., Dhillon, I.S.: Fast Newton-type methods for the least squares nonnegative matrix approximation problem. In: Proceedings of the Sixth SIAM International Conference on Data Mining, pp. 343–354. SIAM (2007)

  24. Kim, H., Park, H.: Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM J. Matrix Anal. Appl. 30(2), 713–730 (2008)

  25. Kim, J., He, Y., Park, H.: Algorithms for nonnegative matrix and tensor factorization: a unified view based on block coordinate descent framework. J. Global Optim. 58(2), 285–319 (2014)

  26. Kimura, T., Takahashi, N.: Global convergence of a modified HALS algorithm for nonnegative matrix factorization. In: Proceedings of 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, pp. 21–24 (2015)

  27. Kompass, R.: A generalized divergence measure for nonnegative matrix factorization. Neural Comput. 19(3), 780–791 (2007)

  28. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–792 (1999)

  29. Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: T.K. Leen, T.G. Dietterich, V. Tresp (eds.) Advances in Neural Information Processing Systems, vol. 13, pp. 556–562 (2001)

  30. Lin, C.J.: On the convergence of multiplicative update algorithms for nonnegative matrix factorization. IEEE Trans. Neural Netw. 18(6), 1589–1596 (2007)

  31. Lin, C.J.: Projected gradient methods for non-negative matrix factorization. Neural Comput. 19(10), 2756–2779 (2007)

  32. Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111–126 (1994)

  33. Panagakis, Y., Kotropoulos, C., Arce, G.R.: Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification. IEEE Trans. Audio Speech Lang. Process. 18(3), 576–588 (2010)

  34. Seki, M., Takahashi, N.: New updates based on Kullback–Leibler, gamma, and Rényi divergences for nonnegative matrix factorization. In: Proceedings of 2014 International Symposium on Nonlinear Theory and its Applications, pp. 48–51 (2014)

  35. Shahnaz, F., Berry, M.W., Pauca, V.P., Plemmons, R.J.: Document clustering using nonnegative matrix factorization. Inf. Process. Manag. 42, 373–386 (2006)

  36. Takahashi, N., Hibi, R.: Global convergence of modified multiplicative updates for nonnegative matrix factorization. Comput. Optim. Appl. 57, 417–440 (2014)

  37. Takahashi, N., Katayama, J., Takeuchi, J.: A generalized sufficient condition for global convergence of modified multiplicative updates for NMF. In: Proceedings of 2014 International Symposium on Nonlinear Theory and its Applications, pp. 44–47 (2014)

  38. Takahashi, N., Nishi, T.: Global convergence of decomposition learning methods for support vector machines. IEEE Trans. Neural Netw. 17(6), 1362–1369 (2006)

  39. Vavasis, S.A.: On the complexity of nonnegative matrix factorization. SIAM J. Optim. 20(3), 1364–1377 (2009)

  40. Wang, R.S., Zhang, S., Wang, Y., Zhang, X.S., Chen, L.: Clustering complex networks and biological networks by nonnegative matrix factorization with various similarity measures. Neurocomputing 72, 134–141 (2008)

  41. Wang, Y.X., Zhang, Y.J.: Nonnegative matrix factorization: a comprehensive review. IEEE Trans. Knowl. Data Eng. 25(6), 1336–1353 (2013)

  42. Wu, C.F.J.: On the convergence properties of the EM algorithm. Ann. Stat. 11(1), 95–103 (1983)

  43. Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 267–273. ACM (2003)

  44. Yamauchi, S., Kawakita, M., Takeuchi, J.: Botnet detection based on non-negative matrix factorization and the MDL principle. In: Proceedings of 19th International Conference on Neural Information Processing, pp. 400–409. Springer (2012)

  45. Yang, Z., Oja, E.: Unified development of multiplicative algorithm for linear and quadratic nonnegative matrix factorization. IEEE Trans. Neural Netw. 22(12), 1878–1891 (2011)

  46. Zangwill, W.: Nonlinear Programming: A Unified Approach. Prentice-Hall, Englewood Cliffs (1969)

  47. Zhao, R., Tan, V.Y.: A unified convergence analysis of the multiplicative update algorithm for nonnegative matrix factorization. arXiv preprint arXiv:1609.00951 (2016)

Acknowledgements

This work was partially supported by JSPS KAKENHI Grant Number JP15K00035.

Corresponding author

Correspondence to Norikazu Takahashi.

Additional information

Part of this paper was presented at the 2014 International Symposium on Nonlinear Theory and its Applications [34, 37].

Appendices

How to derive auxiliary functions

In the unified method of Yang and Oja [45], an auxiliary function is systematically derived from a given generalized polynomial by means of three rules, which are stated in the following lemmas. Because the mathematical expressions differ from those in [45] owing to our framework of a single auxiliary function, we provide proofs for the reader's convenience.

Lemma 8

Suppose that the error function is expressed as \(D(\varvec{W},\varvec{H})=a\big (\sum _{ij}b_{ij}(\varvec{WH})_{ij}^c\big )^d\) where a and c are nonzero constants, \(b_{ij}\) are positive constants, and d is a constant other than 0 or 1. If \(\xi (x) \triangleq ax^d\) is convex in \(\mathbb {R}_{++}\), let

$$\begin{aligned} \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) = a \Biggl (\sum _{ij} b_{ij} (\varvec{\widetilde{W}\widetilde{H}})_{ij}^c \Biggr )^{d-1} \sum _{ij} b_{ij} (\varvec{\widetilde{W}\widetilde{H}})_{ij}^c \left( \frac{(\varvec{WH})_{ij}}{(\varvec{\widetilde{W}\widetilde{H}})_{ij}}\right) ^{cd} \,. \end{aligned}$$

If \(\xi (x)\) is concave in \(\mathbb {R}_{++}\), let

$$\begin{aligned} \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&=a\Biggl (\sum _{ij} b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\Biggr )^d +ad\Biggl (\sum _{ij} b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\Biggr )^{d-1} \\&\quad \times \Biggl (\sum _{ij}b_{ij}(\varvec{WH})_{ij}^c- \sum _{ij}b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\Biggr ) \,. \end{aligned}$$

Then \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is an auxiliary function of \(D(\varvec{W},\varvec{H})\), and satisfies the conditions in Assumptions 1 and 2.

Proof

There are two cases to consider: One is that \(\xi (x) \triangleq ax^d\) is convex, and the other is that \(\xi (x)\) is concave. In either case, it is easy to see that the following statements hold true:

  1. \(\bar{D}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) =D(\hat{\varvec{W}},\hat{\varvec{H}})\) for all \((\hat{\varvec{W}},\hat{\varvec{H}}) \in \mathcal {F}_0\),

  2. \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is differentiable at any point in \(\mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\), and

  3. \(\nabla _{\varvec{W}}\bar{D}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) = \nabla _{\varvec{W}}D(\hat{\varvec{W}},\hat{\varvec{H}})\) and \(\nabla _{\varvec{H}}\bar{D}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) =\nabla _{\varvec{H}}D(\hat{\varvec{W}},\hat{\varvec{H}})\) for all \((\hat{\varvec{W}},\hat{\varvec{H}}) \in \mathrm {int}\,\mathcal {F}_0\).

Therefore, it suffices for us to show that

$$\begin{aligned} \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \ge D(\varvec{W},\varvec{H}) \end{aligned}$$

for all \((\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \in \mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\). Suppose first that \(\xi (x)=ax^d\) is convex in \(\mathbb {R}_{++}\). Then, for any numbers \(x_{11},x_{12},\ldots ,x_{mn}\) and any positive numbers \(\lambda _{11}, \lambda _{12},\ldots ,\lambda _{mn}\) such that \(\sum _{ij}\lambda _{ij}=1\), it follows from Jensen’s inequality that

$$\begin{aligned} \xi \left( \sum _{ij}x_{ij}\right) = \xi \left( \sum _{ij}\lambda _{ij} \cdot \frac{x_{ij}}{\lambda _{ij}}\right) \le \sum _{ij} \lambda _{ij} \xi \left( \frac{x_{ij}}{\lambda _{ij}}\right) \, . \end{aligned}$$

Substituting \(x_{ij}=b_{ij}(\varvec{WH})_{ij}^c\) and \(\lambda _{ij}=b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c/\sum _{pq} b_{pq}(\varvec{\widetilde{W}\widetilde{H}})_{pq}^c\) into this inequality, we have

$$\begin{aligned} D(\varvec{W},\varvec{H})&= a\left( \sum _{ij}b_{ij}(\varvec{WH})_{ij}^c\right) ^d \nonumber \\&\le a\sum _{ij} \frac{b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c}{\sum _{pq}b_{pq}(\varvec{\widetilde{W}\widetilde{H}})_{pq}^c} \left( \frac{b_{ij}(\varvec{WH})_{ij}^c}{b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c/ \sum _{pq}b_{pq}(\varvec{\widetilde{W}\widetilde{H}})_{pq}^c}\right) ^d \nonumber \\&= a \left( \sum _{ij} b_{ij} (\varvec{\widetilde{W}\widetilde{H}})_{ij}^c \right) ^{d-1} \sum _{ij} b_{ij} (\varvec{\widetilde{W}\widetilde{H}})_{ij}^c \left( \frac{(\varvec{WH})_{ij}}{(\varvec{\widetilde{W}\widetilde{H}})_{ij}}\right) ^{cd} \nonumber \\&= \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \, . \end{aligned}$$

Suppose next that \(\xi (x)=ax^d\) is concave in \(\mathbb {R}_{++}\). Then, for any positive numbers x and \(\tilde{x}\), the following inequality holds:

$$\begin{aligned} \xi (x) \le \xi (\tilde{x})+\xi '(\tilde{x})(x-\tilde{x})\,. \end{aligned}$$

Substituting \(x=\sum _{ij}b_{ij}(\varvec{WH})_{ij}^c\) and \(\tilde{x}=\sum _{ij}b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\) into this inequality, we have

$$\begin{aligned} D(\varvec{W},\varvec{H})&= a\left( \sum _{ij} b_{ij}(\varvec{WH})_{ij}^c\right) ^d \nonumber \\&\le a\left( \sum _{ij} b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\right) ^d \nonumber \\&\quad +ad \left( \sum _{ij} b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\right) ^{d-1} \left( \sum _{ij}b_{ij}(\varvec{WH})_{ij}^c -\sum _{ij}b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c \right) \nonumber \\&= \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \end{aligned}$$

which completes the proof. \(\square \)
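As a sanity check, the convex branch of Lemma 8 can be verified numerically. The sketch below takes \(a=1\), \(c=1\), \(d=2\) (so that \(\xi (x)=ax^d\) is convex in \(\mathbb {R}_{++}\)) and random positive \(b_{ij}\), \(\varvec{W}\), \(\varvec{H}\); these choices are illustrative only. It confirms both the majorization \(\bar{D} \ge D\) and the equality \(\bar{D}(\varvec{W},\varvec{H},\varvec{W},\varvec{H})=D(\varvec{W},\varvec{H})\).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 4, 5, 3
b = rng.random((m, n)) + 0.1     # positive constants b_ij
a, c, d = 1.0, 1.0, 2.0          # xi(x) = a*x**d is convex for d = 2

def D(W, H):
    """Error function of Lemma 8."""
    return a * np.sum(b * (W @ H) ** c) ** d

def D_bar(W, H, Wt, Ht):
    """Convex-case auxiliary function of Lemma 8."""
    P, Pt = W @ H, Wt @ Ht
    s = np.sum(b * Pt ** c)
    return a * s ** (d - 1) * np.sum(b * Pt ** c * (P / Pt) ** (c * d))

W, H = rng.random((m, r)) + 0.1, rng.random((r, n)) + 0.1
Wt, Ht = rng.random((m, r)) + 0.1, rng.random((r, n)) + 0.1
assert D_bar(W, H, Wt, Ht) >= D(W, H)          # majorization
assert np.isclose(D_bar(W, H, W, H), D(W, H))  # equality at (W, H)
```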

Lemma 9

Suppose that the error function is expressed as \(D(\varvec{W},\varvec{H})=\sum _{ij}a_{ij}(\varvec{WH})_{ij}^b\) where \(a_{ij}\) are nonzero constants and b is a constant other than 0 or 1. If \(\xi _{ij}(x) \triangleq a_{ij}x^b\) is convex in \(\mathbb {R}_{++}\), let

$$\begin{aligned} \bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) =a_{ij} (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{b-1} \sum _k (\widetilde{W}_{ik}\widetilde{H}_{kj})^{1-b} (W_{ik}H_{kj})^b \,. \end{aligned}$$

If \(\xi _{ij}(x)\) is concave in \(\mathbb {R}_{++}\), let

$$\begin{aligned} \bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) = a_{ij} (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{b} +a_{ij}b (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{b-1} \left( (\varvec{W}\varvec{H})_{ij} -(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij} \right) \,. \end{aligned}$$

Then \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) =\sum _{ij}\bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is an auxiliary function of \(D(\varvec{W},\varvec{H})\), and satisfies the conditions in Assumptions 1 and 2.

Proof

Let \(D_{ij}(\varvec{W},\varvec{H})=a_{ij}(\varvec{W}\varvec{H})_{ij}^b\). There are two cases to consider: One is that \(\xi _{ij}(x)\triangleq a_{ij}x^b\) is convex and the other is that \(\xi _{ij}(x)\) is concave. In either case, we easily see that the following statements hold true:

  1. \(\bar{D}_{ij}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) = D_{ij}(\hat{\varvec{W}},\hat{\varvec{H}})\) for all \((\hat{\varvec{W}},\hat{\varvec{H}}) \in \mathrm {int}\,\mathcal {F}_0\),

  2. \(\bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is differentiable at any point in \(\mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\), and

  3. \(\nabla _{\varvec{W}}\bar{D}_{ij}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) = \nabla _{\varvec{W}}D_{ij}(\hat{\varvec{W}},\hat{\varvec{H}})\) and \(\nabla _{\varvec{H}}\bar{D}_{ij}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) = \nabla _{\varvec{H}}D_{ij}(\hat{\varvec{W}},\hat{\varvec{H}})\) for all \((\hat{\varvec{W}},\hat{\varvec{H}}) \in \mathrm {int}\,\mathcal {F}_0\).

Therefore, it suffices for us to show that

$$\begin{aligned} \bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \ge D_{ij}(\varvec{W},\varvec{H}) \end{aligned}$$

for all \((\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \in \mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\). Suppose first that \(\xi _{ij}(x)=a_{ij}x^b\) is convex in \(\mathbb {R}_{++}\). Then, for any numbers \(x_1,x_2,\ldots ,x_r\) and any positive numbers \(\lambda _1,\lambda _2,\ldots ,\lambda _r\) such that \(\sum _{k} \lambda _k=1\), it follows from Jensen’s inequality that

$$\begin{aligned} \xi _{ij}\left( \sum _k x_k\right) =\xi _{ij}\left( \sum _k \lambda _k \cdot \frac{x_k}{\lambda _k}\right) \le \sum _k \lambda _k \xi _{ij}\left( \frac{x_k}{\lambda _k} \right) \, . \end{aligned}$$

Substituting \(x_k=W_{ik}H_{kj}\) and \(\lambda _k=\widetilde{W}_{ik}\widetilde{H}_{kj} /(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\) into this inequality, we have

$$\begin{aligned} D_{ij}(\varvec{W},\varvec{H})&= a_{ij} (\varvec{W}\varvec{H})_{ij}^{b} \nonumber \\&\le \sum _k \frac{\widetilde{W}_{ik}\widetilde{H}_{kj}}{(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}} a_{ij} \left( \frac{W_{ik}H_{kj}}{\widetilde{W}_{ik}\widetilde{H}_{kj} /(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}}\right) ^b \nonumber \\&= a_{ij} (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{b-1} \sum _k (\widetilde{W}_{ik}\widetilde{H}_{kj})^{1-b}(W_{ik}H_{kj})^b \nonumber \\&= \bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \,. \end{aligned}$$

Suppose next that \(\xi _{ij}(x)=a_{ij}x^b\) is concave in \(\mathbb {R}_{++}\). Then, for any positive numbers x and \(\tilde{x}\), the following inequality holds:

$$\begin{aligned} \xi _{ij}(x) \le \xi _{ij}(\tilde{x})+\xi _{ij}'(\tilde{x})(x-\tilde{x})\,. \end{aligned}$$

Substituting \(x=(\varvec{W}\varvec{H})_{ij}\) and \(\tilde{x}=(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\) into this inequality, we have

$$\begin{aligned} D_{ij}(\varvec{W},\varvec{H})&= a_{ij}(\varvec{W}\varvec{H})_{ij}^b \nonumber \\&\le a_{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^b +a_{ij}b(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{b-1} \left( (\varvec{W}\varvec{H})_{ij} -(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\right) \nonumber \\&= \bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \end{aligned}$$

which completes the proof. \(\square \)
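The same kind of numerical check works for Lemma 9. The sketch below takes \(a_{ij}>0\) and \(b=2\) (so that each \(\xi _{ij}\) is convex in \(\mathbb {R}_{++}\)) with random positive factors; again, these choices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 4, 5, 3
A = rng.random((m, n)) + 0.1   # a_ij > 0, so xi_ij(x) = a_ij * x**b ...
bexp = 2.0                     # ... is convex for b = 2

def D(W, H):
    """Error function of Lemma 9."""
    return np.sum(A * (W @ H) ** bexp)

def D_bar(W, H, Wt, Ht):
    """Convex-case auxiliary function of Lemma 9."""
    Pt = Wt @ Ht
    T = W[:, :, None] * H[None, :, :]    # T[i,k,j] = W_ik * H_kj
    Tt = Wt[:, :, None] * Ht[None, :, :]
    inner = np.sum(Tt ** (1 - bexp) * T ** bexp, axis=1)  # sum over k
    return np.sum(A * Pt ** (bexp - 1) * inner)

W, H = rng.random((m, r)) + 0.1, rng.random((r, n)) + 0.1
Wt, Ht = rng.random((m, r)) + 0.1, rng.random((r, n)) + 0.1
assert D_bar(W, H, Wt, Ht) >= D(W, H)          # majorization
assert np.isclose(D_bar(W, H, W, H), D(W, H))  # equality at (W, H)
```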

Lemma 10

Suppose that the error function is expressed as \(D(\varvec{W},\varvec{H})=\sum _{tijk} a_{tijk} (W_{ik}H_{kj})^{b_t}\) where \(a_{tijk}\) are nonzero constants, \(b_t\) are constants such that each \(a_{tijk}x^{b_t}\) is convex in \(\mathbb {R}_{++}\), and \(\{b_t\}\) contains at least two distinct nonzero numbers. Let \(b_{\max }=\max \{b_t\,|\,b_t \ne 0\}\) and \(b_{\min }=\min \{b_t\,|\,b_t \ne 0\}\). Let us define \(\bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) on \(\mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\) as follows:

  1. If \(b_t \in \{b_{\min }, b_{\max },0\}\), let

     $$\begin{aligned} \bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})=a_{tijk} (W_{ik}H_{kj})^{b_t} \,. \end{aligned}$$

  2. If \(b_t \not \in \{b_{\min }, b_{\max },0\}\), then

     (a) if \((b_t>1) \vee ((b_t=1) \wedge (a_{tijk}>0))\), let

     $$\begin{aligned} \bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{a_{tijk} b_t}{b_{\max }} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{b_t-b_{\max }} \left( W_{ik}H_{kj}\right) ^{b_{\max }} \\&\quad +a_{tijk} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{b_t} \left( 1-\frac{b_t}{b_{\max }}\right) \, ; \end{aligned}$$

     (b) if \((b_t<1) \vee ((b_t=1) \wedge (a_{tijk}<0))\), let

     $$\begin{aligned} \bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{a_{tijk} b_t}{b_{\min }} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{b_t-b_{\min }} \left( W_{ik}H_{kj}\right) ^{b_{\min }} \\&\quad +a_{tijk} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{b_t} \left( 1-\frac{b_t}{b_{\min }}\right) \, . \end{aligned}$$

Then \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) =\sum _{tijk} \bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is an auxiliary function of \(D(\varvec{W},\varvec{H})\), and strictly convex in \(\mathrm {int}\,\mathcal {F}_0\). Furthermore, \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) satisfies the conditions in Assumptions 1 and 2.

Proof

Let \(D_{tijk}(\varvec{W},\varvec{H})=a_{tijk}(W_{ik}H_{kj})^{b_t}\). There are three cases to consider depending on the values of \(a_{tijk}\) and \(b_t\). In each case, we easily see that \(\bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is differentiable at any point in \(\mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\) and that

$$\begin{aligned} \nabla _{\varvec{W}}\bar{D}_{tijk}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}})&= \nabla _{\varvec{W}}D_{tijk}(\hat{\varvec{W}},\hat{\varvec{H}})\,, \\ \nabla _{\varvec{H}}\bar{D}_{tijk}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}})&= \nabla _{\varvec{H}}D_{tijk}(\hat{\varvec{W}},\hat{\varvec{H}}) \end{aligned}$$

hold for all \((\hat{\varvec{W}},\hat{\varvec{H}}) \in \mathcal {F}_0\). Also, it has already been shown by Yang and Oja [45, Lemma 2] that \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is an auxiliary function of \(D(\varvec{W},\varvec{H})\). \(\square \)
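The term-wise bounds used in Lemma 10 can also be spot-checked numerically. The sketch below evaluates the case 2(a) and case 2(b) bounds at random positive scalars standing in for \(W_{ik}H_{kj}\) and \(\widetilde{W}_{ik}\widetilde{H}_{kj}\); the particular values of \(a_{tijk}\), \(b_t\), \(b_{\max }\), and \(b_{\min }\) are illustrative, chosen so that \(a_{tijk}x^{b_t}\) is convex as the lemma requires.

```python
import numpy as np

rng = np.random.default_rng(3)
# x stands for W_ik * H_kj, xt for its tilde counterpart
x, xt = rng.random(1000) + 0.01, rng.random(1000) + 0.01

def upper(a, bt, q):
    """Lemma 10's term-wise bound that moves the exponent bt to q
    (q = b_max in case 2(a), q = b_min in case 2(b))."""
    return (a * bt / q) * xt ** (bt - q) * x ** q + a * (1 - bt / q) * xt ** bt

# case 2(a): a > 0 and bt > 1, exponent shifted up to b_max = 2
a, bt = 1.5, 1.3
assert np.all(upper(a, bt, 2.0) >= a * x ** bt)
# case 2(b): a > 0 and bt < 0 (so a*x**bt is convex), shifted down to b_min = -1
a, bt = 0.7, -0.5
assert np.all(upper(a, bt, -1.0) >= a * x ** bt)
```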

Derivation of auxiliary function for Kullback–Leibler divergence with regularization term

We derive an auxiliary function for Kullback–Leibler divergence with the regularization term by using the unified method of Yang and Oja. First of all, we rewrite the error function by using (33) as follows:

$$\begin{aligned} D(\varvec{W},\varvec{H})&=\lim _{\mu \rightarrow 0^{+}} \frac{1}{\mu } \Bigl ( D_1(\varvec{W},\varvec{H})+D_2(\varvec{W},\varvec{H}) +D_3(\varvec{W},\varvec{H})+D_4(\varvec{W},\varvec{H}) \Bigr ) \\&\qquad +\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}} \ln \left( \frac{X_{ij}}{\sum _{pq}X_{pq}}\right) +\frac{C}{2}\Biggl (\sum _{ij}X_{ij}\Biggr )^2 \end{aligned}$$

where

$$\begin{aligned} D_1(\varvec{W},\varvec{H})&= \Biggl (\sum _{ij}(\varvec{W}\varvec{H})_{ij}\Biggr )^{\mu }, \\ D_2(\varvec{W},\varvec{H})&= -\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}}(\varvec{W}\varvec{H})_{ij}^{\mu }, \\ D_3(\varvec{W},\varvec{H})&= -\mu C \sum _{ij}X_{ij} \cdot \sum _{ij}(\varvec{W}\varvec{H})_{ij}, \\ D_4(\varvec{W},\varvec{H})&= \frac{\mu C}{2} \Biggl (\sum _{ij}(\varvec{W}\varvec{H})_{ij}\Biggr )^2. \end{aligned}$$

Let us assume that \(\mu \) is a sufficiently small positive constant. Applying Lemmas 8 and 9 to these functions, we have the following auxiliary functions:

$$\begin{aligned} \overline{D_1}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \mu \Biggl (\sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\Biggr )^{\mu -1} \sum _{ijk}W_{ik}H_{kj} +(1-\mu ) \Biggl (\sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\Biggr )^{\mu }, \\ \overline{D_2}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&=-\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{\mu -1} \sum _k (\widetilde{W}_{ik}\widetilde{H}_{kj})^{1-\mu } (W_{ik}H_{kj})^{\mu }, \\ \overline{D_3}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&=-\mu C \sum _{ij}X_{ij} \cdot \sum _{ijk}W_{ik}H_{kj}, \\ \overline{D_4}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{\mu C}{2} \sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij} \cdot \sum _{ijk}\left( \widetilde{W}_{ik}\widetilde{H}_{kj}\right) ^{-1} (W_{ik}H_{kj})^2. \end{aligned}$$

The exponents of \(W_{ik}H_{kj}\) in these auxiliary functions are 1, \(\mu \), 1, and 2, so the minimum is \(\mu \) and the maximum is 2. We therefore apply Lemma 10 to \(\overline{D_1}\) and \(\overline{D_3}\) to obtain an auxiliary function of \(D(\varvec{W},\varvec{H})\) in which the exponents of \(W_{ik}H_{kj}\) are restricted to \(\mu \) and 2. Applying Lemma 10 to \(\overline{D_1}\), we obtain another auxiliary function of \(D_1\) as follows:

$$\begin{aligned} \overline{\overline{D_1}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{\mu }{2} \Biggl (\sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\Biggr )^{\mu -1} \sum _{ijk} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{-1} (W_{ik}H_{kj})^2 \\&\quad +\left( 1-\frac{\mu }{2}\right) \Biggl (\sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\Biggr )^{\mu }. \end{aligned}$$

Applying Lemma 10 to \(\overline{D_3}\), we obtain another auxiliary function of \(D_3\) as follows:

$$\begin{aligned}&\overline{\overline{D_3}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \\&\quad = -C \sum _{ij}X_{ij} \cdot \sum _{ijk} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{1-\mu } (W_{ik}H_{kj})^{\mu } +(1-\mu )C \sum _{ij}X_{ij} \cdot \sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij} . \end{aligned}$$

As a result, we have the following auxiliary function:

$$\begin{aligned} \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \lim _{\mu \rightarrow 0^{+}} \frac{1}{\mu } \Biggl (\overline{\overline{D_1}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) +\overline{D_2}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\\&\qquad +\overline{\overline{D_3}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) +\overline{D_4}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \Biggr ) \\&\qquad +\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}} \ln \left( \frac{X_{ij}}{\sum _{pq}X_{pq}}\right) +\frac{C}{2}\Biggl (\sum _{ij}X_{ij}\Biggr )^2. \end{aligned}$$

Because

$$\begin{aligned}&\lim _{\mu \rightarrow 0^{+}} \Biggl ( \overline{\overline{D_1}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) +\overline{D_2}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \\&\quad +\, \,\overline{\overline{D_3}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) +\overline{D_4}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \Biggr )=0, \end{aligned}$$

we apply L'Hôpital's rule, differentiating the numerator and the denominator with respect to \(\mu \) and using \(\mathrm {d}x^{\mu }/\mathrm {d}\mu =x^{\mu }\ln x\). Then we have

$$\begin{aligned} \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{1}{2} \Biggl (\sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\Biggr )^{-1} \sum _{ijk} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{-1} (W_{ik}H_{kj})^2 \\&\quad -\frac{1}{2}+\ln \Biggl ( \sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij} \Biggr ) \\&\quad -\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{-1} \ln \left( (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\right) \sum _k \widetilde{W}_{ik}\widetilde{H}_{kj} \\&\quad -\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{-1} \sum _k \widetilde{W}_{ik}\widetilde{H}_{kj} \ln \left( \frac{W_{ik}H_{kj}}{\widetilde{W}_{ik}\widetilde{H}_{kj}}\right) \\&\quad -C \sum _{ij}X_{ij} \cdot \sum _{ijk}\widetilde{W}_{ik} \widetilde{H}_{kj} \ln \left( \frac{W_{ik}H_{kj}}{\widetilde{W}_{ik}\widetilde{H}_{kj}}\right) \\&\quad -C \sum _{ij}X_{ij} \cdot \sum _{ij} (\widetilde{\varvec{W}} \widetilde{\varvec{H}})_{ij} \\&\quad +\frac{C}{2} \sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij} \cdot \sum _{ijk}(\widetilde{W}_{ik}\widetilde{H}_{kj})^{-1} (W_{ik}H_{kj})^2 \\&\quad +\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}} \ln \left( \frac{X_{ij}}{\sum _{pq}X_{pq}}\right) +\frac{C}{2}\Biggl (\sum _{ij}X_{ij}\Biggr )^2 \end{aligned}$$

which can be rewritten in the form of (18) with

$$\begin{aligned} \bar{D}_{ijk}^1(W_{ik},H_{kj},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{1}{2} \Biggl (\sum _{pq} (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{pq}\Biggr )^{-1} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{-1} (W_{ik}H_{kj})^2 \\&\quad -\frac{X_{ij}}{\sum _{pq}X_{pq}}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{-1} \widetilde{W}_{ik}\widetilde{H}_{kj} \ln (W_{ik}H_{kj}) \\&\quad -C \sum _{pq}X_{pq} \cdot \widetilde{W}_{ik} \widetilde{H}_{kj} \ln (W_{ik}H_{kj}) \\&\quad +\frac{C}{2} \sum _{pq}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{pq} \cdot (\widetilde{W}_{ik}\widetilde{H}_{kj})^{-1} (W_{ik}H_{kj})^2 \,. \end{aligned}$$

Derivation of upper bound for multiplicative update rule obtained from Kullback–Leibler divergence with regularization term

For the first multiplicative update rule shown in Table 3, which is obtained from Kullback–Leibler divergence with the regularization term, we derive an upper bound for \(f_{ik}(\varvec{W},\varvec{H})\) on \(\mathcal {F}_{\epsilon }\). By simple mathematical manipulations, we have the following inequalities:

$$\begin{aligned} f_{ik}(\varvec{W},\varvec{H})&< W_{ik} \left( \frac{\sum _j \frac{X_{ij}}{\sum _{pq}X_{pq}} (\varvec{WH})_{ij}^{-1}H_{kj}+C\sum _{pq}X_{pq}\sum _j H_{kj}}{C \sum _{pq}(\varvec{WH})_{pq}\sum _jH_{kj}}\right) ^{\frac{1}{2}} \\&\le W_{ik} \left( \frac{\sum _j \frac{X_{ij}}{\sum _{pq}X_{pq}}(\varvec{WH})_{ij}^{-1}H_{kj}}{C \sum _{pq}(\varvec{WH})_{pq}\sum _jH_{kj}} +\frac{\sum _{pq}X_{pq}}{\sum _{pq}(\varvec{WH})_{pq}} \right) ^{\frac{1}{2}}. \end{aligned}$$

Here note that

$$\begin{aligned} \frac{\sum _j \frac{X_{ij}}{\sum _{pq}X_{pq}}(\varvec{WH})_{ij}^{-1}H_{kj}}{\sum _jH_{kj}}&= \sum _j \frac{X_{ij}}{\sum _{pq}X_{pq}}(\varvec{WH})_{ij}^{-1} \frac{H_{kj}}{\sum _qH_{kq}} \\&< \frac{1}{\epsilon ^2 r} \frac{\sum _j X_{ij}}{\sum _{pq}X_{pq}}\\&< \frac{1}{\epsilon ^2 r} \end{aligned}$$

and

$$\begin{aligned} \frac{1}{\sum _{pq}(\varvec{W}\varvec{H})_{pq}} < \frac{1}{\sum _{q} W_{ik}H_{kq}} = \frac{1}{W_{ik}\sum _q H_{kq}} \le \frac{1}{\epsilon n W_{ik}}. \end{aligned}$$

Therefore we have

$$\begin{aligned} f_{ik}(\varvec{W},\varvec{H}) \le W_{ik}^{\frac{1}{2}} \Biggl ( \frac{1}{\epsilon ^3nrC} +\frac{1}{\epsilon n}\sum _{pq}X_{pq} \Biggr )^{\frac{1}{2}}\,. \end{aligned}$$
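The two intermediate bounds above are easy to confirm numerically. The following sketch assumes, as for the modified update rules, that membership in \(\mathcal {F}_{\epsilon }\) means every entry of \(\varvec{W}\) and \(\varvec{H}\) is at least \(\epsilon \); the matrix sizes, the value of \(\epsilon \), and the index pair \((i,k)\) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
eps, m, n, r = 0.05, 4, 6, 3
# assume (W, H) in F_eps: every entry of W and H is at least eps
W = rng.random((m, r)) + eps
H = rng.random((r, n)) + eps
X = rng.random((m, n))          # general nonnegative data matrix
P = W @ H
i, k = 0, 0

# first bound: the H-weighted average is below 1 / (eps^2 * r),
# since (WH)_ij >= r * eps^2 and sum_j X_ij / sum_pq X_pq < 1
lhs1 = np.sum((X[i] / X.sum()) / P[i] * H[k]) / H[k].sum()
assert lhs1 < 1 / (eps ** 2 * r)

# second bound: sum_pq (WH)_pq exceeds W_ik * sum_q H_kq >= eps * n * W_ik
lhs2 = 1 / P.sum()
assert lhs2 <= 1 / (eps * n * W[i, k])
```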

Cite this article

Takahashi, N., Katayama, J., Seki, M. et al. A unified global convergence analysis of multiplicative update rules for nonnegative matrix factorization. Comput Optim Appl 71, 221–250 (2018). https://doi.org/10.1007/s10589-018-9997-y
