
A deep discriminative and robust nonnegative matrix factorization network method with soft label constraint

Original Article · Neural Computing and Applications

Abstract

To obtain a discriminative, compact and robust data representation, a discriminative and robust nonnegative matrix factorization method with a soft label constraint (DRNMF_SLC) is proposed; minimizing its objective function yields a data representation that has learned the soft label constraint. To acquire a more hierarchical and discriminative data representation, a deep discriminative and robust nonnegative matrix factorization network with soft label constraint (Deep DRNMFN_SLC) is constructed. To further improve the feature expression ability of a deep neural network (DNN), a DNN-based variant (Deep DRNMFN_SLC_DNN) is proposed, which obtains a more discriminative, robust and generalized feature representation while greatly reducing the dimensionality of the data features. The objective function of DRNMF_SLC is constructed by introducing both the global loss function and the central loss function of the soft label constraint matrix, and its optimization solution and convergence proof are given. When the proposed DRNMF_SLC and Deep DRNMFN_SLC_DNN methods are applied to face recognition under occlusions and illumination variations, the corresponding frameworks, Algorithm 1 and Algorithm 2, are given. Extensive experiments demonstrate the effectiveness of the proposed methods.




Acknowledgements

This work was supported partially by National Natural Science Foundation of China (Grant No. 61072110), Shaanxi Province key project of Research and Development Plan (S2018-YF-ZDGY-0187), International Cooperation Project of Shaanxi Province (S2018-YF-GHMS-0061) and International Cooperation Project of Shaanxi Province (2016KW-042).

Author information


Corresponding author

Correspondence to Ming Tong.

Ethics declarations

Conflict of interest

The authors declare that there are no potential conflicts of interest.

Informed consent

The authors declare that the manuscript contains no material requiring informed consent.

Human and animal rights

The authors declare that this work involved no research on human participants or animals.

Appendix: Proof of Theorem 1


To prove Theorem 1, the following property of an auxiliary function is used, the same device as in the expectation–maximization (EM) algorithm.

Lemma 1

If there is an auxiliary function \( G \) for \( \tilde{J}\left( x \right) \), which satisfies the conditions \( G\left( x, x^{t} \right) \ge \tilde{J}\left( x \right) \) and \( G\left( x, x \right) = \tilde{J}\left( x \right) \), then \( \tilde{J}\left( x \right) \) is non-increasing under the update

$$ x^{t + 1} = \arg \min_{x} G\left( x, x^{t} \right) $$
(42)

The equality \( \tilde{J}\left( x^{t + 1} \right) = \tilde{J}\left( x^{t} \right) \) holds only if \( x^{t} \) is a local minimum of \( G\left( x, x^{t} \right) \). By iterating the update in Eq. (42), a sequence of estimates that converges to a local minimum \( x_{\min} = \arg \min_{x} \tilde{J}\left( x \right) \) is obtained, which will be shown by defining a proper auxiliary function for the objective function in Eq. (13).
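Lemma 1 is the standard majorize–minimize (MM) argument: since \( G \) lies above \( \tilde{J} \) and touches it at \( x^{t} \), minimizing \( G \) can only decrease \( \tilde{J} \). The mechanism can be sketched on a toy one-dimensional objective (an illustrative function, not the paper's \( J_{DRNMF\_SLC} \)), using the common quadratic majorizer with a curvature bound \( L \ge \tilde{J}'' \); minimizing that majorizer reduces each MM step to a damped gradient step:

```python
# Toy MM iteration: J is illustrative, L bounds J'' on the region visited.
def J(x):
    return x**4 - 3 * x**2 + 2 * x

def J_prime(x):
    return 4 * x**3 - 6 * x + 2

L = 60.0  # curvature bound: J''(x) = 12x^2 - 6 <= 42 for |x| <= 2

def mm_step(x_t):
    # argmin_x of the majorizer J(x_t) + J'(x_t)(x - x_t) + (L/2)(x - x_t)^2
    return x_t - J_prime(x_t) / L

x = 1.5
values = [J(x)]
for _ in range(200):
    x = mm_step(x)
    values.append(J(x))

# Monotone descent, exactly as Lemma 1 guarantees
assert all(a >= b - 1e-12 for a, b in zip(values, values[1:]))
```

Here the iterates descend monotonically toward the local minimum at \( x = 1 \); the proofs below build exactly such majorizers for each factor of the objective.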

First, the convergence of the update rule in Eq. (21) is proven. For any element \( A_{j,q} \left( k \right) \) in \( {\mathbf{A}}\left( k \right) \), let \( \tilde{J}_{{A_{j,q} \left( k \right)}} \) denote the part of \( J_{DRNMF\_SLC} \) relevant to \( A_{j,q} \left( k \right) \). Since the update is essentially element-wise, it suffices to show that each \( \tilde{J}_{{A_{j,q} \left( k \right)}} \) is non-increasing under the update rule of Eq. (21), which is proven by defining the following auxiliary function with regard to \( A_{j,q} \left( k \right) \):

Lemma 2

Let \( \tilde{J}^{\prime} \) represent the first-order derivative with regard to \( {\mathbf{A}}\left( k \right) \). The function

$$ \begin{aligned} G\left( A(k), A_{j,q}^{t}(k) \right) = {} & \tilde{J}_{A_{j,q}(k)}\left( A_{j,q}^{t}(k) \right) + \tilde{J}^{\prime}_{A_{j,q}(k)}\left( A_{j,q}^{t}(k) \right)\left( A(k) - A_{j,q}^{t}(k) \right) \\ & + \frac{\left( \mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k)\mathbf{F}^{T}(k) \right)_{j,q}}{A_{j,q}^{t}(k)}\left( A(k) - A_{j,q}^{t}(k) \right)^{2} \end{aligned} $$
(43)

is an auxiliary function for \( \tilde{J}_{{A_{j,q} \left( k \right)}} \), which is the part of \( J_{DRNMF\_SLC} \) that is only related to \( A_{j,q} \left( k \right) \).

Proof

Obviously, \( G\left( A(k), A(k) \right) = \tilde{J}_{A_{j,q}(k)}\left( A(k) \right) \). Based on the definition of an auxiliary function, only \( G\left( A(k), A_{j,q}^{t}(k) \right) \ge \tilde{J}_{A_{j,q}(k)}\left( A(k) \right) \) needs to be demonstrated. To this end, \( G\left( A(k), A_{j,q}^{t}(k) \right) \) in Eq. (43) is compared with the Taylor series expansion of \( \tilde{J}_{A_{j,q}(k)}\left( A(k) \right) \):

$$ \tilde{J}_{A_{j,q}(k)}\left( A(k) \right) = \tilde{J}_{A_{j,q}(k)}\left( A_{j,q}^{t}(k) \right) + \tilde{J}^{\prime}_{A_{j,q}(k)}\left( A(k) - A_{j,q}^{t}(k) \right) + \frac{1}{2}\tilde{J}^{\prime\prime}_{A_{j,q}(k)}\left( A(k) - A_{j,q}^{t}(k) \right)^{2} $$
(44)

where \( \tilde{J}^{\prime\prime} \) is the second-order derivative regarding \( {\mathbf{A}}\left( k \right) \). It is simple to check that

$$ \tilde{J}^{\prime}_{A_{j,q}(k)} = \left( \frac{\partial J_{DRNMF\_SLC}}{\partial \mathbf{A}(k)} \right)_{j,q} = \left( -2\mathbf{Z}^{T}(k)\mathbf{B}(k)\mathbf{F}^{T}(k) + 2\mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k)\mathbf{F}^{T}(k) \right)_{j,q} $$
(45)
$$ \tilde{J}^{\prime\prime}_{A_{j,q}(k)} = 2\left( \mathbf{Z}^{T}(k)\mathbf{Z}(k) \right)_{j,j}\left( \mathbf{F}(k)\mathbf{F}^{T}(k) \right)_{q,q} $$
(46)

Substituting Eq. (46) into Eq. (44) and comparing with Eq. (43), it can be seen that proving \( G\left( A(k), A_{j,q}^{t}(k) \right) \ge \tilde{J}_{A_{j,q}(k)}\left( A(k) \right) \) is equivalent to proving

$$ \frac{\left( \mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k)\mathbf{F}^{T}(k) \right)_{j,q}}{A_{j,q}^{t}(k)} \ge \frac{1}{2}\tilde{J}^{\prime\prime}_{A_{j,q}(k)} = \left( \mathbf{Z}^{T}(k)\mathbf{Z}(k) \right)_{j,j}\left( \mathbf{F}(k)\mathbf{F}^{T}(k) \right)_{q,q} $$
(47)

To prove the above inequality, the following chain of inequalities holds:

$$ \begin{aligned} \left( \mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k)\mathbf{F}^{T}(k) \right)_{j,q} & = \sum_{l} \left( \mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k) \right)_{j,l}\left( \mathbf{F}(k)\mathbf{F}^{T}(k) \right)_{l,q} \\ & \ge \left( \mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k) \right)_{j,q}\left( \mathbf{F}(k)\mathbf{F}^{T}(k) \right)_{q,q} \\ & \ge \sum_{l} \left( \mathbf{Z}^{T}(k)\mathbf{Z}(k) \right)_{j,l} A_{l,q}^{t}(k)\left( \mathbf{F}(k)\mathbf{F}^{T}(k) \right)_{q,q} \\ & \ge A_{j,q}^{t}(k)\left( \mathbf{Z}^{T}(k)\mathbf{Z}(k) \right)_{j,j}\left( \mathbf{F}(k)\mathbf{F}^{T}(k) \right)_{q,q} \end{aligned} $$
(48)

In summary, the statement \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \ge \tilde{J}_{{A_{j,q} \left( k \right)}} \left( {A\left( k \right)} \right) \) holds, and Lemma 2 is proved.
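The endpoints of the chain in Eq. (48) can be spot-checked numerically: for nonnegative \( \mathbf{Z}(k) \), \( \mathbf{A}(k) \), \( \mathbf{F}(k) \), every entry of \( \mathbf{Z}^{T}\mathbf{Z}\mathbf{A}\mathbf{F}\mathbf{F}^{T} \) dominates the corresponding \( A_{j,q}\left( \mathbf{Z}^{T}\mathbf{Z} \right)_{j,j}\left( \mathbf{F}\mathbf{F}^{T} \right)_{q,q} \). A random-trial sketch (dimensions are arbitrary, chosen only for illustration; this is not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    Z = rng.random((8, 5))   # nonnegative factors, as in NMF
    A = rng.random((5, 4))
    F = rng.random((4, 6))
    lhs = Z.T @ Z @ A @ F @ F.T                 # (Z^T Z A F F^T)_{j,q}
    ZtZ_diag = np.diag(Z.T @ Z)                 # (Z^T Z)_{j,j}
    FFt_diag = np.diag(F @ F.T)                 # (F F^T)_{q,q}
    rhs = A * np.outer(ZtZ_diag, FFt_diag)      # entrywise right-hand side
    assert np.all(lhs >= rhs - 1e-12)           # Eq. (48) endpoints hold
```

The inequality holds entrywise precisely because every dropped term in the sums over \( l \) is nonnegative.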

Subsequently, an auxiliary function is defined for the update rule in Eq. (18). Similarly, let \( \tilde{J}_{{Z_{i,j} \left( k \right)}} \) represent the part of \( J_{DRNMF\_SLC} \) related to \( Z_{i,j} \left( k \right) \). Then, the auxiliary function relevant to \( Z_{i,j} \left( k \right) \) is defined as follows:

Lemma 3

The function

$$ \begin{aligned} G\left( Z(k), Z_{i,j}^{t}(k) \right) = {} & \tilde{J}_{Z_{i,j}(k)}\left( Z_{i,j}^{t}(k) \right) + \tilde{J}^{\prime}_{Z_{i,j}(k)}\left( Z_{i,j}^{t}(k) \right)\left( Z(k) - Z_{i,j}^{t}(k) \right) \\ & + \frac{\left( \mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k)\mathbf{F}^{T}(k)\mathbf{A}^{T}(k) \right)_{i,j}}{Z_{i,j}^{t}(k)}\left( Z(k) - Z_{i,j}^{t}(k) \right)^{2} \end{aligned} $$
(49)

is an auxiliary function for \( \tilde{J}_{{Z_{i,j} \left( k \right)}} \), which is the part of \( J_{DRNMF\_SLC} \) that is only related to \( Z_{i,j} \left( k \right) \).

The proof of Lemma 3 is essentially similar to that of Lemma 2 and is omitted for brevity.

Lemma 4

The function

$$ \begin{aligned} G\left( F(k), F_{q,a}^{t}(k) \right) = {} & \tilde{J}_{F_{q,a}(k)}\left( F_{q,a}^{t}(k) \right) + \tilde{J}^{\prime}_{F_{q,a}(k)}\left( F_{q,a}^{t}(k) \right)\left( F(k) - F_{q,a}^{t}(k) \right) \\ & + \frac{\left( \mathbf{A}^{T}(k)\mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k) \right)_{q,a} + \left( \mathbf{F}(k) \right)_{q,a} + \lambda_{1}\eta_{p,q}^{(r)}(k) + \frac{\lambda_{3}}{2}\left( \mathbf{D}(k)\mathbf{F}(k) \right)_{q,a}}{F_{q,a}^{t}(k)}\left( F(k) - F_{q,a}^{t}(k) \right)^{2} \end{aligned} $$
(50)

is an auxiliary function for \( \tilde{J}_{{F_{q,a} \left( k \right)}} \), which is the part of \( J_{DRNMF\_SLC} \) that is only related to \( F_{q,a} \left( k \right) \).

Proof

Obviously, \( G\left( F(k), F(k) \right) = \tilde{J}_{F_{q,a}(k)}\left( F(k) \right) \). Based on the definition of an auxiliary function, only \( G\left( F(k), F_{q,a}^{t}(k) \right) \ge \tilde{J}_{F_{q,a}(k)}\left( F(k) \right) \) needs to be proven. To achieve this goal, \( G\left( F(k), F_{q,a}^{t}(k) \right) \) in Eq. (50) is compared with the Taylor series expansion of \( \tilde{J}_{F_{q,a}(k)}\left( F(k) \right) \):

$$ \tilde{J}_{F_{q,a}(k)}\left( F(k) \right) = \tilde{J}_{F_{q,a}(k)}\left( F_{q,a}^{t}(k) \right) + \tilde{J}^{\prime}_{F_{q,a}(k)}\left( F(k) - F_{q,a}^{t}(k) \right) + \frac{1}{2}\tilde{J}^{\prime\prime}_{F_{q,a}(k)}\left( F(k) - F_{q,a}^{t}(k) \right)^{2} $$
(51)

where

$$ \begin{aligned} \tilde{J}^{\prime}_{F_{q,a}(k)} = {} & \left( -2\mathbf{A}^{T}(k)\mathbf{Z}^{T}(k)\mathbf{B}(k) + 2\mathbf{A}^{T}(k)\mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k) \right)_{q,a} + 2\left( F_{q,a}(k) - C_{q,a}(k) \right) \\ & + 2\lambda_{1}\left( \eta_{p,q}^{(r)}(k) - \mu_{q}^{(r)}(k) \right) - \frac{4\lambda_{2}}{N_{r}}\sum_{i \ne r}^{c}\left( \mu_{q}^{(r)}(k) - \mu_{q}^{(i)}(k) \right) + \lambda_{3}\left( \mathbf{D}(k)\mathbf{F}(k) \right)_{q,a} \end{aligned} $$
(52)
$$ \tilde{J}^{\prime\prime}_{F_{q,a}(k)} = 2\left( \mathbf{A}^{T}(k)\mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k) \right)_{q,q} + 2 + 2\lambda_{1} - \frac{2\lambda_{1}}{N_{r}} - \frac{4\lambda_{2}\left( c - 1 \right)}{N_{r}^{2}} + \lambda_{3}\left( \mathbf{D}(k) \right)_{q,q} $$
(53)

Proving \( G\left( F(k), F_{q,a}^{t}(k) \right) \ge \tilde{J}_{F_{q,a}(k)}\left( F(k) \right) \) is equivalent to proving

$$ \frac{\left( \mathbf{A}^{T}(k)\mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k) \right)_{q,a} + \left( \mathbf{F}(k) \right)_{q,a} + \lambda_{1}\eta_{p,q}^{(r)}(k) + \frac{\lambda_{3}}{2}\left( \mathbf{D}(k)\mathbf{F}(k) \right)_{q,a}}{F_{q,a}^{t}(k)} \ge \frac{1}{2}\tilde{J}^{\prime\prime}_{F_{q,a}(k)} $$
(54)

To prove the above inequality, we have

$$ \begin{aligned} \left( \mathbf{A}^{T}(k)\mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k) \right)_{q,a} & = \sum_{l}\left( \mathbf{A}^{T}(k)\mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k) \right)_{q,l}\left( \mathbf{F}(k) \right)_{l,a} \\ & \ge \left( \mathbf{A}^{T}(k)\mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k) \right)_{q,q}\left( \mathbf{F}(k) \right)_{q,a} \end{aligned} $$
(55)
$$ \begin{aligned} \left( \mathbf{D}(k)\mathbf{F}(k) \right)_{q,a} & = \sum_{l}\left( \mathbf{D}(k) \right)_{q,l}\left( \mathbf{F}(k) \right)_{l,a} \\ & \ge \left( \mathbf{D}(k) \right)_{q,q}\left( \mathbf{F}(k) \right)_{q,a} \end{aligned} $$
(56)

Thus, the following inequality holds:

$$ \frac{\left( \mathbf{A}^{T}(k)\mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k) \right)_{q,a} + \left( \mathbf{F}(k) \right)_{q,a} + \lambda_{1}\eta_{p,q}^{(r)}(k) + \frac{\lambda_{3}}{2}\left( \mathbf{D}(k)\mathbf{F}(k) \right)_{q,a}}{F_{q,a}^{t}(k)} \ge \frac{1}{2}\tilde{J}^{\prime\prime}_{F_{q,a}(k)} $$
(57)

As a result, the statement \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \ge \tilde{J}_{{F_{q,a} \left( k \right)}} \left( {F\left( k \right)} \right) \) holds, and Lemma 4 is proved.

With the above lemmas, the proof of Theorem 1 is further given.

Proof of Theorem 1

Substituting \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \) of Eq. (43) into Eq. (42), the following equation is obtained:

$$ A_{j,q}^{t + 1}(k) = \arg \min_{A(k)} G\left( A(k), A_{j,q}^{t}(k) \right) = A_{j,q}^{t}(k)\frac{\left( \mathbf{Z}^{T}(k)\mathbf{B}(k)\mathbf{F}^{T}(k) \right)_{j,q}}{\left( \mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k)\mathbf{F}^{T}(k) \right)_{j,q}} $$
(58)

Since Eq. (43) is an auxiliary function, \( \tilde{J}_{{A_{j,q} \left( k \right)}} \) is non-increasing under this iteration rule in Eq. (58), according to Lemma 2.

Then, substituting \( G\left( {Z\left( k \right),Z_{i,j}^{t} \left( k \right)} \right) \) of Eq. (49) into Eq. (42), the following equation is obtained:

$$ Z_{i,j}^{t + 1}(k) = \arg \min_{Z(k)} G\left( Z(k), Z_{i,j}^{t}(k) \right) = Z_{i,j}^{t}(k)\frac{\left( \mathbf{B}(k)\mathbf{F}^{T}(k)\mathbf{A}^{T}(k) \right)_{i,j}}{\left( \mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k)\mathbf{F}^{T}(k)\mathbf{A}^{T}(k) \right)_{i,j}} $$
(59)

Since Eq. (49) is an auxiliary function, \( \tilde{J}_{{Z_{i,j} \left( k \right)}} \) is non-increasing under this iteration rule in Eq. (59), according to Lemma 3.

Similarly, substituting \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \) of Eq. (50) into Eq. (42), the following equation is obtained:

$$ \begin{aligned} F_{q,a}^{t + 1}(k) & = \arg \min_{F(k)} G\left( F(k), F_{q,a}^{t}(k) \right) \\ & = F_{q,a}^{t}(k)\frac{\left( 2\mathbf{A}^{T}(k)\mathbf{Z}^{T}(k)\mathbf{B}(k) \right)_{q,a} + \frac{4\lambda_{2}}{N_{r}}\sum_{i \ne r}^{c}\left( \mu_{q}^{(r)}(k) - \mu_{q}^{(i)}(k) \right)}{\left( 2\mathbf{A}^{T}(k)\mathbf{Z}^{T}(k)\mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k) \right)_{q,a} + 2\left( F_{q,a}(k) - C_{q,a}(k) \right) + 2\lambda_{1}\left( \eta_{p,q}^{(r)}(k) - \mu_{q}^{(r)}(k) \right) + \lambda_{3}\left( \mathbf{D}(k)\mathbf{F}(k) \right)_{q,a}} \end{aligned} $$
(60)

Since Eq. (50) is an auxiliary function, \( \tilde{J}_{{F_{q,a} \left( k \right)}} \) is non-increasing under this iteration rule in Eq. (60) according to Lemma 4.

According to the above derivation and proofs, the update rules of \( {\mathbf{Z}}\left( k \right) \), \( {\mathbf{A}}\left( k \right) \) and \( {\mathbf{F}}\left( k \right) \) in Eqs. (18), (21) and (32) yield a non-increasing sequence of values of \( J_{DRNMF\_SLC} \), which therefore converges to a local minimum.
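As a concrete illustration, the multiplicative updates (58) and (59) can be exercised on the plain reconstruction term \( \left\| \mathbf{B}(k) - \mathbf{Z}(k)\mathbf{A}(k)\mathbf{F}(k) \right\|_{F}^{2} \) alone. The sketch below drops the soft-label, central-loss and graph terms that enter the full update (60), holds \( \mathbf{F} \) fixed, and adds a small \( \varepsilon \) to the denominators as a standard numerical safeguard; none of these simplifications are part of the paper's derivation:

```python
import numpy as np

def update_A(B, Z, A, F, eps=1e-10):
    # Eq. (58): elementwise multiplicative update of A
    num = Z.T @ B @ F.T
    den = Z.T @ Z @ A @ F @ F.T + eps
    return A * num / den

def update_Z(B, Z, A, F, eps=1e-10):
    # Eq. (59): elementwise multiplicative update of Z
    num = B @ F.T @ A.T
    den = Z @ A @ F @ F.T @ A.T + eps
    return Z * num / den

rng = np.random.default_rng(1)
B = rng.random((20, 30))   # data matrix to factorize as B ~ Z A F
Z = rng.random((20, 8))
A = rng.random((8, 6))
F = rng.random((6, 30))

errs = []
for _ in range(50):
    A = update_A(B, Z, A, F)
    Z = update_Z(B, Z, A, F)
    errs.append(np.linalg.norm(B - Z @ A @ F))

# The auxiliary-function argument guarantees monotone descent
assert all(a >= b - 1e-8 for a, b in zip(errs, errs[1:]))
```

Each update multiplies the factor entrywise by a nonnegative ratio, so nonnegativity is preserved automatically, and by Lemmas 2 and 3 the reconstruction error never increases.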


About this article


Cite this article

Tong, M., Chen, Y., Zhao, M. et al. A deep discriminative and robust nonnegative matrix factorization network method with soft label constraint. Neural Comput & Applic 31, 7447–7475 (2019). https://doi.org/10.1007/s00521-018-3554-6

