A unified global convergence analysis of multiplicative update rules for nonnegative matrix factorization

Abstract

Multiplicative update rules are a well-known computational method for nonnegative matrix factorization. Depending on the error measure between two matrices, various types of multiplicative update rules have been proposed so far. However, their convergence properties are not fully understood. This paper provides a sufficient condition for a general multiplicative update rule to have the global convergence property in the sense that any sequence of solutions has at least one convergent subsequence and the limit of any convergent subsequence is a stationary point of the optimization problem. Using this condition, it is proved that many of the existing multiplicative update rules have the global convergence property if they are modified slightly so that all variables take positive values. This paper also proposes new multiplicative update rules based on Kullback–Leibler, Gamma, and Rényi divergences. It is shown that these three rules have the global convergence property if the same modification as above is made.
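To make the setting concrete, the following is a minimal sketch of a multiplicative update rule with such a positivity modification. It uses the classical Lee–Seung update for the squared Euclidean error [29] purely as an example; the updates for the divergences treated in the paper differ, and the value of \(\epsilon \), the sizes, and the iteration count below are illustrative only.

```python
import numpy as np

def modified_mu_nmf(X, r, eps=1e-12, iters=500, seed=0):
    """Lee-Seung multiplicative updates for min ||X - WH||_F^2,
    with an epsilon-floor so that all entries of W and H stay
    strictly positive (the 'slight modification' in the abstract)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        # multiply by the ratio of the negative to the positive part of
        # the gradient, then clip from below at eps to keep positivity
        H = np.maximum(H * (W.T @ X) / (W.T @ W @ H), eps)
        W = np.maximum(W * (X @ H.T) / (W @ H @ H.T), eps)
    return W, H

# usage: approximate a random nonnegative 20x30 matrix with rank-5 factors
X = np.abs(np.random.default_rng(1).random((20, 30)))
W, H = modified_mu_nmf(X, r=5)
print(np.linalg.norm(X - W @ H))
```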

Notes

  1. The error function based on Kullback–Leibler divergence in Table 1 is slightly different from the one in [45]. Instead of assuming that \(\sum _{ij} X_{ij}=1\), we replace \(X_{ij}\) with \(X_{ij}/\sum _{pq}X_{pq}\) so that the result applies to a general nonnegative matrix \(\varvec{X}\).

  2. In the case of Kullback–Leibler divergence, \(X_{ij}\) in \(f_{ik}(\varvec{W},\varvec{H})\) must be replaced with \(X_{ij}/\sum _{pq}X_{pq}\).

References

  1. Badeau, R., Bertin, N., Vincent, E.: Stability analysis of multiplicative update algorithms and application to nonnegative matrix factorization. IEEE Trans. Neural Netw. 21(12), 1869–1881 (2010)

  2. Berman, A., Plemmons, R.: Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York (1979)

  3. Berry, M.W., Browne, M.: Email surveillance using non-negative matrix factorization. Comput. Math. Organ. Theory 11, 249–264 (2005)

  4. Campbell, S.L., Poole, G.D.: Computing nonnegative rank factorizations. Linear Algebra Appl. 35, 175–182 (1981)

  5. Chen, J.C.: The nonnegative rank factorizations of nonnegative matrices. Linear Algebra Appl. 62, 207–217 (1984)

  6. Chi, E.C., Kolda, T.G.: On tensors, sparsity, and nonnegative factorizations. SIAM J. Matrix Anal. Appl. 33(4), 1272–1299 (2012)

  7. Cichocki, A., Lee, H., Kim, Y.D., Choi, S.: Non-negative matrix factorization with \(\alpha \)-divergence. Pattern Recognit. Lett. 29(9), 1433–1440 (2008)

  8. Cichocki, A., Phan, A.H.: Fast local algorithms for large scale nonnegative matrix and tensor factorization. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E92-A(3), 708–721 (2009)

  9. Cichocki, A., Zdunek, R., Amari, S.I.: Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization. In: Lecture Notes in Computer Science, vol. 4666, pp. 169–176. Springer (2007)

  10. Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.I.: Nonnegative Matrix and Tensor Factorizations. Wiley, West Sussex (2009)

  11. Dhillon, I.S., Sra, S.: Generalized nonnegative matrix approximations with Bregman divergences. In: Advances in Neural Information Processing Systems, pp. 283–290 (2005)

  12. Févotte, C., Bertin, N., Durrieu, J.L.: Nonnegative matrix factorization with the Itakura–Saito divergence: with application to music analysis. Neural Comput. 21(3), 793–830 (2009)

  13. Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence. Neural Comput. 23(9), 2421–2456 (2011)

  14. Finesso, L., Spreij, P.: Nonnegative matrix factorization and I-divergence alternating minimization. Linear Algebra Appl. 416, 270–287 (2006)

  15. Gillis, N., Glineur, F.: Nonnegative factorization and the maximum edge biclique problem. arXiv e-prints (2008)

  16. Gonzalez, E.F., Zhang, Y.: Accelerating the Lee-Seung algorithm for non-negative matrix factorization. Dept. Comput. & Appl. Math., Rice Univ., Houston, TX, Tech. Rep. TR-05-02 (2005)

  17. Guan, N., Tao, D., Luo, Z., Yuan, B.: NeNMF: an optimal gradient method for nonnegative matrix factorization. IEEE Trans. Signal Process. 60(6), 2882–2898 (2012)

  18. Guillamet, D., Vitria, J.: Non-negative matrix factorization for face recognition. In: Lecture Notes in Artificial Intelligence, pp. 336–344. Springer (2002)

  19. Hansen, S., Plantenga, T., Kolda, T.G.: Newton-based optimization for Kullback–Leibler nonnegative tensor factorizations. Optim. Methods Softw. 30(5), 1002–1029 (2015)

  20. Holzapfel, A., Stylianou, Y.: Musical genre classification using nonnegative matrix factorization-based features. IEEE Trans. Audio Speech Lang. Process. 16(2), 424–434 (2008)

  21. Hsieh, C.J., Dhillon, I.S.: Fast coordinate descent methods with variable selection for non-negative matrix factorization. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1064–1072. ACM (2011)

  22. Katayama, J., Takahashi, N., Takeuchi, J.: Boundedness of modified multiplicative updates for nonnegative matrix factorization. In: Proceedings of the Fifth International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, pp. 252–255 (2013)

  23. Kim, D., Sra, S., Dhillon, I.S.: Fast Newton-type methods for the least squares nonnegative matrix approximation problem. In: Proceedings of the Sixth SIAM International Conference on Data Mining, pp. 343–354. SIAM (2007)

  24. Kim, H., Park, H.: Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM J. Matrix Anal. Appl. 30(2), 713–730 (2008)

  25. Kim, J., He, Y., Park, H.: Algorithms for nonnegative matrix and tensor factorization: a unified view based on block coordinate descent framework. J. Global Optim. 58(2), 285–319 (2014)

  26. Kimura, T., Takahashi, N.: Global convergence of a modified HALS algorithm for nonnegative matrix factorization. In: Proceedings of 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, pp. 21–24 (2015)

  27. Kompass, R.: A generalized divergence measure for nonnegative matrix factorization. Neural Comput. 19(3), 780–791 (2007)

  28. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–792 (1999)

  29. Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: T.K. Leen, T.G. Dietterich, V. Tresp (eds.) Advances in Neural Information Processing Systems, vol. 13, pp. 556–562 (2001)

  30. Lin, C.J.: On the convergence of multiplicative update algorithms for nonnegative matrix factorization. IEEE Trans. Neural Netw. 18(6), 1589–1596 (2007)

  31. Lin, C.J.: Projected gradient methods for non-negative matrix factorization. Neural Comput. 19(10), 2756–2779 (2007)

  32. Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111–126 (1994)

  33. Panagakis, Y., Kotropoulos, C., Arce, G.R.: Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification. IEEE Trans. Audio Speech Lang. Process. 18(3), 576–588 (2010)

  34. Seki, M., Takahashi, N.: New updates based on Kullback–Leibler, gamma, and Rényi divergences for nonnegative matrix factorization. In: Proceedings of 2014 International Symposium on Nonlinear Theory and its Applications, pp. 48–51 (2014)

  35. Shahnaz, F., Berry, M.W., Pauca, V.P., Plemmons, R.J.: Document clustering using nonnegative matrix factorization. Inf. Process. Manag. 42, 373–386 (2006)

  36. Takahashi, N., Hibi, R.: Global convergence of modified multiplicative updates for nonnegative matrix factorization. Comput. Optim. Appl. 57, 417–440 (2014)

  37. Takahashi, N., Katayama, J., Takeuchi, J.: A generalized sufficient condition for global convergence of modified multiplicative updates for NMF. In: Proceedings of 2014 International Symposium on Nonlinear Theory and its Applications, pp. 44–47 (2014)

  38. Takahashi, N., Nishi, T.: Global convergence of decomposition learning methods for support vector machines. IEEE Trans. Neural Netw. 17(6), 1362–1369 (2006)

  39. Vavasis, S.A.: On the complexity of nonnegative matrix factorization. SIAM J. Optim. 20(3), 1364–1377 (2009)

  40. Wang, R.S., Zhang, S., Wang, Y., Zhang, X.S., Chen, L.: Clustering complex networks and biological networks by nonnegative matrix factorization with various similarity measures. Neurocomputing 72, 134–141 (2008)

  41. Wang, Y.X., Zhang, Y.J.: Nonnegative matrix factorization: a comprehensive review. IEEE Trans. Knowl. Data Eng. 25(6), 1336–1353 (2013)

  42. Wu, C.F.J.: On the convergence properties of the EM algorithm. Ann. Stat. 11(1), 95–103 (1983)

  43. Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 267–273. ACM (2003)

  44. Yamauchi, S., Kawakita, M., Takeuchi, J.: Botnet detection based on non-negative matrix factorization and the MDL principle. In: Proceedings of 19th International Conference on Neural Information Processing, pp. 400–409. Springer (2012)

  45. Yang, Z., Oja, E.: Unified development of multiplicative algorithm for linear and quadratic nonnegative matrix factorization. IEEE Trans. Neural Netw. 22(12), 1878–1891 (2011)

  46. Zangwill, W.: Nonlinear Programming: A Unified Approach. Prentice-Hall, Englewood Cliffs (1969)

  47. Zhao, R., Tan, V.Y.: A unified convergence analysis of the multiplicative update algorithm for nonnegative matrix factorization. arXiv preprint arXiv:1609.00951 (2016)

Acknowledgements

This work was partially supported by JSPS KAKENHI Grant Number JP15K00035.

Corresponding author

Correspondence to Norikazu Takahashi.

Additional information

Part of this paper was presented at the 2014 International Symposium on Nonlinear Theory and its Applications [34, 37].

Appendices

How to derive auxiliary functions

In the unified method of Yang and Oja [45], an auxiliary function is systematically derived from a given generalized polynomial by means of three rules, which are stated in the following lemmas. Because the mathematical expressions differ from those in [45] owing to our framework of a single auxiliary function, we provide proofs for the reader's convenience.

Lemma 8

Suppose that the error function is expressed as \(D(\varvec{W},\varvec{H})=a\big (\sum _{ij}b_{ij}(\varvec{WH})_{ij}^c\big )^d\) where a and c are nonzero constants, \(b_{ij}\) are positive constants, and d is a constant other than 0 or 1. If \(\xi (x) \triangleq ax^d\) is convex in \(\mathbb {R}_{++}\), let

$$\begin{aligned} \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) = a \Biggl (\sum _{ij} b_{ij} (\varvec{\widetilde{W}\widetilde{H}})_{ij}^c \Biggr )^{d-1} \sum _{ij} b_{ij} (\varvec{\widetilde{W}\widetilde{H}})_{ij}^c \left( \frac{(\varvec{WH})_{ij}}{(\varvec{\widetilde{W}\widetilde{H}})_{ij}}\right) ^{cd} \,. \end{aligned}$$

If \(\xi (x)\) is concave in \(\mathbb {R}_{++}\), let

$$\begin{aligned} \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&=a\Biggl (\sum _{ij} b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\Biggr )^d +ad\Biggl (\sum _{ij} b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\Biggr )^{d-1} \\&\quad \times \Biggl (\sum _{ij}b_{ij}(\varvec{WH})_{ij}^c- \sum _{ij}b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\Biggr ) \,. \end{aligned}$$

Then \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is an auxiliary function of \(D(\varvec{W},\varvec{H})\), and satisfies the conditions in Assumptions 1 and 2.

Proof

There are two cases to consider: One is that \(\xi (x) \triangleq ax^d\) is convex, and the other is that \(\xi (x)\) is concave. In either case, it is easy to see that the following statements hold true:

  1. \(\bar{D}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) =D(\hat{\varvec{W}},\hat{\varvec{H}})\) for all \((\hat{\varvec{W}},\hat{\varvec{H}}) \in \mathcal {F}_0\),

  2. \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is differentiable at any point in \(\mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\), and

  3. \(\nabla _{\varvec{W}}\bar{D}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) = \nabla _{\varvec{W}}D(\hat{\varvec{W}},\hat{\varvec{H}})\) and \(\nabla _{\varvec{H}}\bar{D}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) =\nabla _{\varvec{H}}D(\hat{\varvec{W}},\hat{\varvec{H}})\) for all \((\hat{\varvec{W}},\hat{\varvec{H}}) \in \mathrm {int}\,\mathcal {F}_0\).

Therefore, it suffices for us to show that

$$\begin{aligned} \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \ge D(\varvec{W},\varvec{H}) \end{aligned}$$

for all \((\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \in \mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\). Suppose first that \(\xi (x)=ax^d\) is convex in \(\mathbb {R}_{++}\). Then, for any numbers \(x_{11},x_{12},\ldots ,x_{mn}\) and any positive numbers \(\lambda _{11}, \lambda _{12},\ldots ,\lambda _{mn}\) such that \(\sum _{ij}\lambda _{ij}=1\), it follows from Jensen’s inequality that

$$\begin{aligned} \xi \left( \sum _{ij}x_{ij}\right) = \xi \left( \sum _{ij}\lambda _{ij} \cdot \frac{x_{ij}}{\lambda _{ij}}\right) \le \sum _{ij} \lambda _{ij} \xi \left( \frac{x_{ij}}{\lambda _{ij}}\right) \, . \end{aligned}$$

Substituting \(x_{ij}=b_{ij}(\varvec{WH})_{ij}^c\) and \(\lambda _{ij}=b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c/\sum _{pq} b_{pq}(\varvec{\widetilde{W}\widetilde{H}})_{pq}^c\) into this inequality, we have

$$\begin{aligned} D(\varvec{W},\varvec{H})&= a\left( \sum _{ij}b_{ij}(\varvec{WH})_{ij}^c\right) ^d \nonumber \\&\le a\sum _{ij} \frac{b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c}{\sum _{pq}b_{pq}(\varvec{\widetilde{W}\widetilde{H}})_{pq}^c} \left( \frac{b_{ij}(\varvec{WH})_{ij}^c}{b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c/ \sum _{pq}b_{pq}(\varvec{\widetilde{W}\widetilde{H}})_{pq}^c}\right) ^d \nonumber \\&= a \left( \sum _{ij} b_{ij} (\varvec{\widetilde{W}\widetilde{H}})_{ij}^c \right) ^{d-1} \sum _{ij} b_{ij} (\varvec{\widetilde{W}\widetilde{H}})_{ij}^c \left( \frac{(\varvec{WH})_{ij}}{(\varvec{\widetilde{W}\widetilde{H}})_{ij}}\right) ^{cd} \nonumber \\&= \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \, . \end{aligned}$$

Suppose next that \(\xi (x)=ax^d\) is concave in \(\mathbb {R}_{++}\). Then, for any positive numbers x and \(\tilde{x}\), the following inequality holds:

$$\begin{aligned} \xi (x) \le \xi (\tilde{x})+\xi '(\tilde{x})(x-\tilde{x})\,. \end{aligned}$$

Substituting \(x=\sum _{ij}b_{ij}(\varvec{WH})_{ij}^c\) and \(\tilde{x}=\sum _{ij}b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\) into this inequality, we have

$$\begin{aligned} D(\varvec{W},\varvec{H})&= a\left( \sum _{ij} b_{ij}(\varvec{WH})_{ij}^c\right) ^d \nonumber \\&\le a\left( \sum _{ij} b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\right) ^d \nonumber \\&\quad +ad \left( \sum _{ij} b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c\right) ^{d-1} \left( \sum _{ij}b_{ij}(\varvec{WH})_{ij}^c -\sum _{ij}b_{ij}(\varvec{\widetilde{W}\widetilde{H}})_{ij}^c \right) \nonumber \\&= \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \end{aligned}$$

which completes the proof. \(\square \)
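As a sanity check, the convex branch of Lemma 8 can be verified numerically. The sketch below takes \(a=1\), \(c=1\), \(d=2\) (so that \(\xi (x)=ax^d\) is convex in \(\mathbb {R}_{++}\)) and random positive \(b_{ij}\), \(\varvec{W}\), \(\varvec{H}\); these choices are illustrative only. It confirms both the majorization \(\bar{D} \ge D\) and the equality \(\bar{D}(\varvec{W},\varvec{H},\varvec{W},\varvec{H})=D(\varvec{W},\varvec{H})\).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 4, 5, 3
b = rng.random((m, n)) + 0.1     # positive constants b_ij
a, c, d = 1.0, 1.0, 2.0          # xi(x) = a*x**d is convex for d = 2

def D(W, H):
    """Error function of Lemma 8."""
    return a * np.sum(b * (W @ H) ** c) ** d

def D_bar(W, H, Wt, Ht):
    """Convex-case auxiliary function of Lemma 8."""
    P, Pt = W @ H, Wt @ Ht
    s = np.sum(b * Pt ** c)
    return a * s ** (d - 1) * np.sum(b * Pt ** c * (P / Pt) ** (c * d))

W, H = rng.random((m, r)) + 0.1, rng.random((r, n)) + 0.1
Wt, Ht = rng.random((m, r)) + 0.1, rng.random((r, n)) + 0.1
assert D_bar(W, H, Wt, Ht) >= D(W, H)          # majorization
assert np.isclose(D_bar(W, H, W, H), D(W, H))  # equality at (W, H)
```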

Lemma 9

Suppose that the error function is expressed as \(D(\varvec{W},\varvec{H})=\sum _{ij}a_{ij}(\varvec{WH})_{ij}^b\) where \(a_{ij}\) are nonzero constants and b is a constant other than 0 or 1. If \(\xi _{ij}(x) \triangleq a_{ij}x^b\) is convex in \(\mathbb {R}_{++}\), let

$$\begin{aligned} \bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) =a_{ij} (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{b-1} \sum _k (\widetilde{W}_{ik}\widetilde{H}_{kj})^{1-b} (W_{ik}H_{kj})^b \,. \end{aligned}$$

If \(\xi _{ij}(x)\) is concave in \(\mathbb {R}_{++}\), let

$$\begin{aligned} \bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) = a_{ij} (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{b} +a_{ij}b (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{b-1} \left( (\varvec{W}\varvec{H})_{ij} -(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij} \right) \,. \end{aligned}$$

Then \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) =\sum _{ij}\bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is an auxiliary function of \(D(\varvec{W},\varvec{H})\), and satisfies the conditions in Assumptions 1 and 2.

Proof

Let \(D_{ij}(\varvec{W},\varvec{H})=a_{ij}(\varvec{W}\varvec{H})_{ij}^b\). There are two cases to consider: One is that \(\xi _{ij}(x)\triangleq a_{ij}x^b\) is convex and the other is that \(\xi _{ij}(x)\) is concave. In either case, we easily see that the following statements hold true:

  1. \(\bar{D}_{ij}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) = D_{ij}(\hat{\varvec{W}},\hat{\varvec{H}})\) for all \((\hat{\varvec{W}},\hat{\varvec{H}}) \in \mathrm {int}\,\mathcal {F}_0\),

  2. \(\bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is differentiable at any point in \(\mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\), and

  3. \(\nabla _{\varvec{W}}\bar{D}_{ij}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) = \nabla _{\varvec{W}}D_{ij}(\hat{\varvec{W}},\hat{\varvec{H}})\) and \(\nabla _{\varvec{H}}\bar{D}_{ij}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}}) = \nabla _{\varvec{H}}D_{ij}(\hat{\varvec{W}},\hat{\varvec{H}})\) for all \((\hat{\varvec{W}},\hat{\varvec{H}}) \in \mathrm {int}\,\mathcal {F}_0\).

Therefore, it suffices for us to show that

$$\begin{aligned} \bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \ge D_{ij}(\varvec{W},\varvec{H}) \end{aligned}$$

for all \((\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \in \mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\). Suppose first that \(\xi _{ij}(x)=a_{ij}x^b\) is convex in \(\mathbb {R}_{++}\). Then, for any numbers \(x_1,x_2,\ldots ,x_r\) and any positive numbers \(\lambda _1,\lambda _2,\ldots ,\lambda _r\) such that \(\sum _{k} \lambda _k=1\), it follows from Jensen’s inequality that

$$\begin{aligned} \xi _{ij}\left( \sum _k x_k\right) =\xi _{ij}\left( \sum _k \lambda _k \cdot \frac{x_k}{\lambda _k}\right) \le \sum _k \lambda _k \xi _{ij}\left( \frac{x_k}{\lambda _k} \right) \, . \end{aligned}$$

Substituting \(x_k=W_{ik}H_{kj}\) and \(\lambda _k=\widetilde{W}_{ik}\widetilde{H}_{kj} /(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\) into this inequality, we have

$$\begin{aligned} D_{ij}(\varvec{W},\varvec{H})&= a_{ij} (\varvec{W}\varvec{H})_{ij}^{b} \nonumber \\&\le \sum _k \frac{\widetilde{W}_{ik}\widetilde{H}_{kj}}{(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}} a_{ij} \left( \frac{W_{ik}H_{kj}}{\widetilde{W}_{ik}\widetilde{H}_{kj} /(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}}\right) ^b \nonumber \\&= a_{ij} (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{b-1} \sum _k (\widetilde{W}_{ik}\widetilde{H}_{kj})^{1-b}(W_{ik}H_{kj})^b \nonumber \\&= \bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \,. \end{aligned}$$

Suppose next that \(\xi _{ij}(x)=a_{ij}x^b\) is concave in \(\mathbb {R}_{++}\). Then, for any positive numbers x and \(\tilde{x}\), the following inequality holds:

$$\begin{aligned} \xi _{ij}(x) \le \xi _{ij}(\tilde{x})+\xi _{ij}'(\tilde{x})(x-\tilde{x})\,. \end{aligned}$$

Substituting \(x=(\varvec{W}\varvec{H})_{ij}\) and \(\tilde{x}=(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\) into this inequality, we have

$$\begin{aligned} D_{ij}(\varvec{W},\varvec{H})&= a_{ij}(\varvec{W}\varvec{H})_{ij}^b \nonumber \\&\le a_{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^b +a_{ij}b(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{b-1} \left( (\varvec{W}\varvec{H})_{ij} -(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\right) \nonumber \\&= \bar{D}_{ij}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \end{aligned}$$

which completes the proof. \(\square \)
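The same kind of numerical check works for Lemma 9. The sketch below takes \(a_{ij}>0\) and \(b=2\) (so that each \(\xi _{ij}\) is convex in \(\mathbb {R}_{++}\)) with random positive factors; again, these choices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 4, 5, 3
A = rng.random((m, n)) + 0.1   # a_ij > 0, so xi_ij(x) = a_ij * x**b ...
bexp = 2.0                     # ... is convex for b = 2

def D(W, H):
    """Error function of Lemma 9."""
    return np.sum(A * (W @ H) ** bexp)

def D_bar(W, H, Wt, Ht):
    """Convex-case auxiliary function of Lemma 9."""
    Pt = Wt @ Ht
    T = W[:, :, None] * H[None, :, :]    # T[i,k,j] = W_ik * H_kj
    Tt = Wt[:, :, None] * Ht[None, :, :]
    inner = np.sum(Tt ** (1 - bexp) * T ** bexp, axis=1)  # sum over k
    return np.sum(A * Pt ** (bexp - 1) * inner)

W, H = rng.random((m, r)) + 0.1, rng.random((r, n)) + 0.1
Wt, Ht = rng.random((m, r)) + 0.1, rng.random((r, n)) + 0.1
assert D_bar(W, H, Wt, Ht) >= D(W, H)          # majorization
assert np.isclose(D_bar(W, H, W, H), D(W, H))  # equality at (W, H)
```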

Lemma 10

Suppose that the error function is expressed as \(D(\varvec{W},\varvec{H})=\sum _{tijk} a_{tijk} (W_{ik}H_{kj})^{b_t}\) where \(a_{tijk}\) are nonzero constants, \(b_t\) are constants such that each \(a_{tijk}x^{b_t}\) is convex in \(\mathbb {R}_{++}\), and \(\{b_t\}\) contains at least two distinct nonzero numbers. Let \(b_{\max }=\max \{b_t\,|\,b_t \ne 0\}\) and \(b_{\min }=\min \{b_t\,|\,b_t \ne 0\}\). Let us define \(\bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) on \(\mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\) as follows:

  1. If \(b_t \in \{b_{\min }, b_{\max },0\}\), let

     $$\begin{aligned} \bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})=a_{tijk} (W_{ik}H_{kj})^{b_t} \,. \end{aligned}$$

  2. If \(b_t \not \in \{b_{\min }, b_{\max },0\}\), then

     (a) if \((b_t>1) \vee ((b_t=1) \wedge (a_{tijk}>0))\), let

     $$\begin{aligned} \bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{a_{tijk} b_t}{b_{\max }} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{b_t-b_{\max }} \left( W_{ik}H_{kj}\right) ^{b_{\max }} \\&\quad +a_{tijk} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{b_t} \left( 1-\frac{b_t}{b_{\max }}\right) \, ; \end{aligned}$$

     (b) if \((b_t<1) \vee ((b_t=1) \wedge (a_{tijk}<0))\), let

     $$\begin{aligned} \bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{a_{tijk} b_t}{b_{\min }} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{b_t-b_{\min }} \left( W_{ik}H_{kj}\right) ^{b_{\min }} \\&\quad +a_{tijk} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{b_t} \left( 1-\frac{b_t}{b_{\min }}\right) \, . \end{aligned}$$

Then \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) =\sum _{tijk} \bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is an auxiliary function of \(D(\varvec{W},\varvec{H})\), and strictly convex in \(\mathrm {int}\,\mathcal {F}_0\). Furthermore, \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) satisfies the conditions in Assumptions 1 and 2.

Proof

Let \(D_{tijk}(\varvec{W},\varvec{H})=a_{tijk}(W_{ik}H_{kj})^{b_t}\). There are three cases to consider depending on the values of \(a_{tijk}\) and \(b_t\). In each case, we easily see that \(\bar{D}_{tijk}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is differentiable at any point in \(\mathrm {int}\,\mathcal {F}_0 \times \mathrm {int}\,\mathcal {F}_0\) and that

$$\begin{aligned} \nabla _{\varvec{W}}\bar{D}_{tijk}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}})&= \nabla _{\varvec{W}}D_{tijk}(\hat{\varvec{W}},\hat{\varvec{H}})\,, \\ \nabla _{\varvec{H}}\bar{D}_{tijk}(\hat{\varvec{W}},\hat{\varvec{H}},\hat{\varvec{W}},\hat{\varvec{H}})&= \nabla _{\varvec{H}}D_{tijk}(\hat{\varvec{W}},\hat{\varvec{H}}) \end{aligned}$$

hold for all \((\hat{\varvec{W}},\hat{\varvec{H}}) \in \mathcal {F}_0\). Also, it has already been shown by Yang and Oja [45, Lemma 2] that \(\bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\) is an auxiliary function of \(D(\varvec{W},\varvec{H})\). \(\square \)
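The term-wise bounds used in Lemma 10 can also be spot-checked numerically. The sketch below evaluates the case 2(a) and case 2(b) bounds at random positive scalars standing in for \(W_{ik}H_{kj}\) and \(\widetilde{W}_{ik}\widetilde{H}_{kj}\); the particular values of \(a_{tijk}\), \(b_t\), \(b_{\max }\), and \(b_{\min }\) are illustrative, chosen so that \(a_{tijk}x^{b_t}\) is convex as the lemma requires.

```python
import numpy as np

rng = np.random.default_rng(3)
# x stands for W_ik * H_kj, xt for its tilde counterpart
x, xt = rng.random(1000) + 0.01, rng.random(1000) + 0.01

def upper(a, bt, q):
    """Lemma 10's term-wise bound that moves the exponent bt to q
    (q = b_max in case 2(a), q = b_min in case 2(b))."""
    return (a * bt / q) * xt ** (bt - q) * x ** q + a * (1 - bt / q) * xt ** bt

# case 2(a): a > 0 and bt > 1, exponent shifted up to b_max = 2
a, bt = 1.5, 1.3
assert np.all(upper(a, bt, 2.0) >= a * x ** bt)
# case 2(b): a > 0 and bt < 0 (so a*x**bt is convex), shifted down to b_min = -1
a, bt = 0.7, -0.5
assert np.all(upper(a, bt, -1.0) >= a * x ** bt)
```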

Derivation of auxiliary function for Kullback–Leibler divergence with regularization term

We derive an auxiliary function for Kullback–Leibler divergence with the regularization term by using the unified method of Yang and Oja. First of all, we rewrite the error function by using (33) as follows:

$$\begin{aligned} D(\varvec{W},\varvec{H})&=\lim _{\mu \rightarrow 0^{+}} \frac{1}{\mu } \Bigl ( D_1(\varvec{W},\varvec{H})+D_2(\varvec{W},\varvec{H}) +D_3(\varvec{W},\varvec{H})+D_4(\varvec{W},\varvec{H}) \Bigr ) \\&\qquad +\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}} \ln \left( \frac{X_{ij}}{\sum _{pq}X_{pq}}\right) +\frac{C}{2}\Biggl (\sum _{ij}X_{ij}\Biggr )^2 \end{aligned}$$

where

$$\begin{aligned} D_1(\varvec{W},\varvec{H})&= \Biggl (\sum _{ij}(\varvec{W}\varvec{H})_{ij}\Biggr )^{\mu }, \\ D_2(\varvec{W},\varvec{H})&= -\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}}(\varvec{W}\varvec{H})_{ij}^{\mu }, \\ D_3(\varvec{W},\varvec{H})&= -\mu C \sum _{ij}X_{ij} \cdot \sum _{ij}(\varvec{W}\varvec{H})_{ij}, \\ D_4(\varvec{W},\varvec{H})&= \frac{\mu C}{2} \Biggl (\sum _{ij}(\varvec{W}\varvec{H})_{ij}\Biggr )^2. \end{aligned}$$

Let us assume that \(\mu \) is a sufficiently small positive constant. Applying Lemmas 8 and 9 to these functions, we have the following auxiliary functions:

$$\begin{aligned} \overline{D_1}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \mu \Biggl (\sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\Biggr )^{\mu -1} \sum _{ijk}W_{ik}H_{kj} +(1-\mu ) \Biggl (\sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\Biggr )^{\mu }, \\ \overline{D_2}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&=-\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{\mu -1} \sum _k (\widetilde{W}_{ik}\widetilde{H}_{kj})^{1-\mu } (W_{ik}H_{kj})^{\mu }, \\ \overline{D_3}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&=-\mu C \sum _{ij}X_{ij} \cdot \sum _{ijk}W_{ik}H_{kj}, \\ \overline{D_4}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{\mu C}{2} \sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij} \cdot \sum _{ijk}\left( \widetilde{W}_{ik}\widetilde{H}_{kj}\right) ^{-1} (W_{ik}H_{kj})^2. \end{aligned}$$

The exponents of \(W_{ik}H_{kj}\) in these auxiliary functions are 1, \(\mu \), 1, and 2, so the minimum is \(\mu \) and the maximum is 2. We therefore apply Lemma 10 to \(\overline{D_1}\) and \(\overline{D_3}\) to obtain an auxiliary function of \(D(\varvec{W},\varvec{H})\) in which the exponents of \(W_{ik}H_{kj}\) are restricted to \(\mu \) and 2. Applying Lemma 10 to \(\overline{D_1}\), we obtain another auxiliary function of \(D_1\) as follows:

$$\begin{aligned} \overline{\overline{D_1}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{\mu }{2} \Biggl (\sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\Biggr )^{\mu -1} \sum _{ijk} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{-1} (W_{ik}H_{kj})^2 \\&\quad +\left( 1-\frac{\mu }{2}\right) \Biggl (\sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\Biggr )^{\mu }. \end{aligned}$$

Applying Lemma 10 to \(\overline{D_3}\), we obtain another auxiliary function of \(D_3\) as follows:

$$\begin{aligned}&\overline{\overline{D_3}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \\&\quad = -C \sum _{ij}X_{ij} \cdot \sum _{ijk} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{1-\mu } (W_{ik}H_{kj})^{\mu } +(1-\mu )C \sum _{ij}X_{ij} \cdot \sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij} . \end{aligned}$$

As a result, we have the following auxiliary function:

$$\begin{aligned} \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \lim _{\mu \rightarrow 0^{+}} \frac{1}{\mu } \Biggl (\overline{\overline{D_1}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) +\overline{D_2}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})\\&\qquad +\overline{\overline{D_3}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) +\overline{D_4}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \Biggr ) \\&\qquad +\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}} \ln \left( \frac{X_{ij}}{\sum _{pq}X_{pq}}\right) +\frac{C}{2}\Biggl (\sum _{ij}X_{ij}\Biggr )^2. \end{aligned}$$

Because

$$\begin{aligned}&\lim _{\mu \rightarrow 0^{+}} \Biggl ( \overline{\overline{D_1}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) +\overline{D_2}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \\&\quad +\, \,\overline{\overline{D_3}}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) +\overline{D_4}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}}) \Biggr )=0, \end{aligned}$$

we apply L'Hôpital's rule, differentiating the numerator and the denominator with respect to \(\mu \) and using \(\mathrm {d}x^{\mu }/\mathrm {d}\mu =x^{\mu }\ln x\). Then we have

$$\begin{aligned} \bar{D}(\varvec{W},\varvec{H},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{1}{2} \Biggl (\sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\Biggr )^{-1} \sum _{ijk} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{-1} (W_{ik}H_{kj})^2 \\&\quad -\frac{1}{2}+\ln \Biggl ( \sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij} \Biggr ) \\&\quad -\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{-1} \ln \left( (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}\right) \sum _k \widetilde{W}_{ik}\widetilde{H}_{kj} \\&\quad -\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{-1} \sum _k \widetilde{W}_{ik}\widetilde{H}_{kj} \ln \left( \frac{W_{ik}H_{kj}}{\widetilde{W}_{ik}\widetilde{H}_{kj}}\right) \\&\quad -C \sum _{ij}X_{ij} \cdot \sum _{ijk}\widetilde{W}_{ik} \widetilde{H}_{kj} \ln \left( \frac{W_{ik}H_{kj}}{\widetilde{W}_{ik}\widetilde{H}_{kj}}\right) \\&\quad -C \sum _{ij}X_{ij} \cdot \sum _{ij} (\widetilde{\varvec{W}} \widetilde{\varvec{H}})_{ij} \\&\quad +\frac{C}{2} \sum _{ij}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij} \cdot \sum _{ijk}(\widetilde{W}_{ik}\widetilde{H}_{kj})^{-1} (W_{ik}H_{kj})^2 \\&\quad +\sum _{ij}\frac{X_{ij}}{\sum _{pq}X_{pq}} \ln \left( \frac{X_{ij}}{\sum _{pq}X_{pq}}\right) +\frac{C}{2}\Biggl (\sum _{ij}X_{ij}\Biggr )^2 \end{aligned}$$

which can be rewritten in the form of (18) with

$$\begin{aligned} \bar{D}_{ijk}^1(W_{ik},H_{kj},\widetilde{\varvec{W}},\widetilde{\varvec{H}})&= \frac{1}{2} \Biggl (\sum _{pq} (\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{pq}\Biggr )^{-1} (\widetilde{W}_{ik}\widetilde{H}_{kj})^{-1} (W_{ik}H_{kj})^2 \\&\quad -\frac{X_{ij}}{\sum _{pq}X_{pq}}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{ij}^{-1} \widetilde{W}_{ik}\widetilde{H}_{kj} \ln (W_{ik}H_{kj}) \\&\quad -C \sum _{pq}X_{pq} \cdot \widetilde{W}_{ik} \widetilde{H}_{kj} \ln (W_{ik}H_{kj}) \\&\quad +\frac{C}{2} \sum _{pq}(\widetilde{\varvec{W}}\widetilde{\varvec{H}})_{pq} \cdot (\widetilde{W}_{ik}\widetilde{H}_{kj})^{-1} (W_{ik}H_{kj})^2 \,. \end{aligned}$$

Derivation of upper bound for multiplicative update rule obtained from Kullback–Leibler divergence with regularization term

For the first multiplicative update rule shown in Table 3, which is obtained from Kullback–Leibler divergence with the regularization term, we derive an upper bound for \(f_{ik}(\varvec{W},\varvec{H})\) on \(\mathcal {F}_{\epsilon }\). By simple mathematical manipulations, we have the following inequalities:

$$\begin{aligned} f_{ik}(\varvec{W},\varvec{H})&< W_{ik} \left( \frac{\sum _j \frac{X_{ij}}{\sum _{pq}X_{pq}} (\varvec{WH})_{ij}^{-1}H_{kj}+C\sum _{pq}X_{pq}\sum _j H_{kj}}{C \sum _{pq}(\varvec{WH})_{pq}\sum _jH_{kj}}\right) ^{\frac{1}{2}} \\&\le W_{ik} \left( \frac{\sum _j \frac{X_{ij}}{\sum _{pq}X_{pq}}(\varvec{WH})_{ij}^{-1}H_{kj}}{C \sum _{pq}(\varvec{WH})_{pq}\sum _jH_{kj}} +\frac{\sum _{pq}X_{pq}}{\sum _{pq}(\varvec{WH})_{pq}} \right) ^{\frac{1}{2}}. \end{aligned}$$

Here note that

$$\begin{aligned} \frac{\sum _j \frac{X_{ij}}{\sum _{pq}X_{pq}}(\varvec{WH})_{ij}^{-1}H_{kj}}{\sum _jH_{kj}}&= \sum _j \frac{X_{ij}}{\sum _{pq}X_{pq}}(\varvec{WH})_{ij}^{-1} \frac{H_{kj}}{\sum _qH_{kq}} \\&< \frac{1}{\epsilon ^2 r} \frac{\sum _j X_{ij}}{\sum _{pq}X_{pq}}\\&< \frac{1}{\epsilon ^2 r} \end{aligned}$$

and

$$\begin{aligned} \frac{1}{\sum _{pq}(\varvec{W}\varvec{H})_{pq}} < \frac{1}{\sum _{q} W_{ik}H_{kq}} = \frac{1}{W_{ik}\sum _q H_{kq}} \le \frac{1}{\epsilon n W_{ik}}. \end{aligned}$$

Therefore we have

$$\begin{aligned} f_{ik}(\varvec{W},\varvec{H}) \le W_{ik}^{\frac{1}{2}} \Biggl ( \frac{1}{\epsilon ^3nrC} +\frac{1}{\epsilon n}\sum _{pq}X_{pq} \Biggr )^{\frac{1}{2}}\,. \end{aligned}$$
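The two intermediate bounds above are easy to confirm numerically. The following sketch assumes, as for the modified update rules, that membership in \(\mathcal {F}_{\epsilon }\) means every entry of \(\varvec{W}\) and \(\varvec{H}\) is at least \(\epsilon \); the matrix sizes, the value of \(\epsilon \), and the index pair \((i,k)\) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
eps, m, n, r = 0.05, 4, 6, 3
# assume (W, H) in F_eps: every entry of W and H is at least eps
W = rng.random((m, r)) + eps
H = rng.random((r, n)) + eps
X = rng.random((m, n))          # general nonnegative data matrix
P = W @ H
i, k = 0, 0

# first bound: the H-weighted average is below 1 / (eps^2 * r),
# since (WH)_ij >= r * eps^2 and sum_j X_ij / sum_pq X_pq < 1
lhs1 = np.sum((X[i] / X.sum()) / P[i] * H[k]) / H[k].sum()
assert lhs1 < 1 / (eps ** 2 * r)

# second bound: sum_pq (WH)_pq exceeds W_ik * sum_q H_kq >= eps * n * W_ik
lhs2 = 1 / P.sum()
assert lhs2 <= 1 / (eps * n * W[i, k])
```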

Cite this article

Takahashi, N., Katayama, J., Seki, M. et al. A unified global convergence analysis of multiplicative update rules for nonnegative matrix factorization. Comput Optim Appl 71, 221–250 (2018). https://doi.org/10.1007/s10589-018-9997-y
