Abstract
To obtain a discriminative, compact, and robust data representation, a discriminative and robust nonnegative matrix factorization method with a soft label constraint (DRNMF_SLC) is proposed; minimizing its objective function yields the data representation learned under the soft label constraint. To acquire a more hierarchical and discriminative representation, a deep discriminative and robust nonnegative matrix factorization network with a soft label constraint (Deep DRNMFN_SLC) is then constructed. To further improve the feature-expression ability of deep neural networks (DNNs), a DNN-based deep discriminative and robust nonnegative matrix factorization network with a soft label constraint (Deep DRNMFN_SLC_DNN) is proposed, which obtains a more discriminative, robust, and generalizable feature representation while greatly reducing the dimension of the data features. The objective function of DRNMF_SLC is constructed by introducing both a global loss function and a center loss function on the soft label constraint matrix, and the optimization procedure and convergence proof of the objective function are given. When the proposed DRNMF_SLC and Deep DRNMFN_SLC_DNN methods are applied to face recognition under occlusions and illumination variations, the corresponding frameworks are summarized as Algorithm 1 and Algorithm 2. Extensive experiments demonstrate the effectiveness of the proposed methods.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. 61072110), the Shaanxi Province Key Project of the Research and Development Plan (S2018-YF-ZDGY-0187), and the International Cooperation Projects of Shaanxi Province (S2018-YF-GHMS-0061 and 2016KW-042).
Ethics declarations
Conflict of interest
The authors declare that there are no potential conflicts of interest.
Informed consent
The authors declare that this work contains no material requiring informed consent.
Human and animal rights
The authors declare that this work involves no research with human participants and/or animals.
Appendix: Proof of Theorem 1
To prove Theorem 1, the following property of an auxiliary function is used; the same device appears in the expectation–maximization (EM) algorithm.
Lemma 1
If there is an auxiliary function \( G \) for \( \tilde{J}\left( x \right) \), which satisfies the conditions \( G\left( {x,x^{t} } \right) \ge \tilde{J}\left( x \right) \) and \( G\left( {x,x} \right) = \tilde{J}\left( x \right) \), then \( \tilde{J}\left( x \right) \) is non-increasing under the update
$$ x^{t + 1} = \arg \min_{x} G\left( {x,x^{t} } \right). \quad (42) $$
The equality \( \tilde{J}\left( x^{t + 1} \right) = \tilde{J}\left( x^{t} \right) \) holds only if \( x^{t} \) is a local minimum of \( G\left( {x,x^{t} } \right) \). By iterating the update in Eq. (42), a sequence of estimates converging to a local minimum \( x_{\min } = \arg \min_{x} \tilde{J}\left( x \right) \) is obtained, which is shown below by defining a proper auxiliary function for the objective function in Eq. (13).
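For completeness, the non-increasing property in Lemma 1 follows from the standard majorization chain (a standard argument, restated here; it is not specific to this paper):

```latex
\tilde{J}\left( x^{t+1} \right)
  \;\le\; G\left( x^{t+1}, x^{t} \right)
  \;\le\; G\left( x^{t}, x^{t} \right)
  \;=\; \tilde{J}\left( x^{t} \right),
```

where the first inequality uses \( G\left( {x,x^{t} } \right) \ge \tilde{J}\left( x \right) \), the second holds because \( x^{t+1} \) minimizes \( G\left( { \cdot ,x^{t} } \right) \) by the update in Eq. (42), and the final equality is the second defining condition of an auxiliary function.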
First, the convergence of the update rule in Eq. (21) is proven. For any element \( A_{j,q} \left( k \right) \) in \( {\mathbf{A}}\left( k \right) \), let \( \tilde{J}_{{A_{j,q} \left( k \right)}} \) denote the part of \( J_{DRNMF\_SLC} \) relevant to \( A_{j,q} \left( k \right) \). Since the update is essentially element-wise, it suffices to show that each \( \tilde{J}_{{A_{j,q} \left( k \right)}} \) is non-increasing under the update rule of Eq. (21), which is proven by defining the following auxiliary function with respect to \( A_{j,q} \left( k \right) \):
Lemma 2
Let \( \tilde{J}^{\prime} \) denote the first-order derivative with respect to \( {\mathbf{A}}\left( k \right) \). The function
is an auxiliary function for \( \tilde{J}_{{A_{j,q} \left( k \right)}} \), which is the part of \( J_{DRNMF\_SLC} \) that is only related to \( A_{j,q} \left( k \right) \).
Proof
Obviously, \( G\left( {A\left( k \right),A\left( k \right)} \right) = \tilde{J}_{{A_{j,q} \left( k \right)}} \left( {A\left( k \right)} \right) \). By the definition of an auxiliary function, it remains only to show that \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \ge \tilde{J}_{{A_{j,q} \left( k \right)}} \left( {A\left( k \right)} \right) \). To this end, \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \) in Eq. (43) is compared with the Taylor series expansion of \( \tilde{J}_{{A_{j,q} \left( k \right)}} \left( {A\left( k \right)} \right) \):
where \( \tilde{J}^{\prime\prime} \) is the second-order derivative regarding \( {\mathbf{A}}\left( k \right) \). It is simple to check that
Substituting Eq. (46) into Eq. (44) and comparing with Eq. (43) shows that, instead of demonstrating \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \ge \tilde{J}_{{A_{j,q} \left( k \right)}} \left( {A\left( k \right)} \right) \) directly, it is equivalent to prove
To establish the above inequality, note that the following inequality holds:
In summary, the statement \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \ge \tilde{J}_{{A_{j,q} \left( k \right)}} \left( {A\left( k \right)} \right) \) holds, and Lemma 2 is proved.
Subsequently, an auxiliary function is defined for the update rule in Eq. (18). Similarly, let \( \tilde{J}_{{Z_{i,j} \left( k \right)}} \) represent the part of \( J_{DRNMF\_SLC} \) related to \( Z_{i,j} \left( k \right) \). Then, the auxiliary function relevant to \( Z_{i,j} \left( k \right) \) is defined as follows:
Lemma 3
The function
is an auxiliary function for \( \tilde{J}_{{Z_{i,j} \left( k \right)}} \), which is the part of \( J_{DRNMF\_SLC} \) that is only related to \( Z_{i,j} \left( k \right) \).
The proof of Lemma 3 is essentially similar to that of Lemma 2 and is omitted for brevity.
Lemma 4
The function
is an auxiliary function for \( \tilde{J}_{{F_{q,a} \left( k \right)}} \), which is the part of \( J_{DRNMF\_SLC} \) that is only related to \( F_{q,a} \left( k \right) \).
Proof
Obviously, \( G\left( {F\left( k \right),F\left( k \right)} \right) = \tilde{J}_{{F_{q,a} \left( k \right)}} \left( {F\left( k \right)} \right) \). By the definition of an auxiliary function, it remains only to show that \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \ge \tilde{J}_{{F_{q,a} \left( k \right)}} \left( {F\left( k \right)} \right) \). To achieve this goal, \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \) in Eq. (50) is compared with the Taylor series expansion of \( \tilde{J}_{{F_{q,a} \left( k \right)}} \left( {F\left( k \right)} \right) \):
where
Instead of demonstrating \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \ge \tilde{J}_{{F_{q,a} \left( k \right)}} \left( {F\left( k \right)} \right) \), it is equivalent to prove
To prove the above inequality, we have
Thus, the following inequality holds:
As a result, the statement \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \ge \tilde{J}_{{F_{q,a} \left( k \right)}} \left( {F\left( k \right)} \right) \) holds, and Lemma 4 is proved.
With the above lemmas in place, the proof of Theorem 1 can now be given.
Proof of Theorem 1
Substituting \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \) of Eq. (43) into Eq. (42), the following equation is obtained:
Since Eq. (43) is an auxiliary function, \( \tilde{J}_{{A_{j,q} \left( k \right)}} \) is non-increasing under this iteration rule in Eq. (58), according to Lemma 2.
Then, substituting \( G\left( {Z\left( k \right),Z_{i,j}^{t} \left( k \right)} \right) \) of Eq. (49) into Eq. (42), the following equation is obtained:
Since Eq. (49) is an auxiliary function, \( \tilde{J}_{{Z_{i,j} \left( k \right)}} \) is non-increasing under this iteration rule in Eq. (59), according to Lemma 3.
Similarly, substituting \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \) of Eq. (50) into Eq. (42), the following equation is obtained:
Since Eq. (50) is an auxiliary function, \( \tilde{J}_{{F_{q,a} \left( k \right)}} \) is non-increasing under this iteration rule in Eq. (60) according to Lemma 4.
According to the above derivation and proofs, the update rules of \( {\mathbf{Z}}\left( k \right) \), \( {\mathbf{A}}\left( k \right) \) and \( {\mathbf{F}}\left( k \right) \) in Eqs. (18), (21) and (32) produce a non-increasing sequence of values of \( J_{DRNMF\_SLC} \), which therefore converges to a local minimum.
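The convergence mechanism above is the same majorization–minimization device that underlies classical NMF. As an illustrative sketch only (it implements the plain Lee–Seung multiplicative updates for \( \min \left\| {{\mathbf{X}} - {\mathbf{WH}}} \right\|_{F}^{2} \), not the paper's DRNMF_SLC updates, whose exact rules depend on the soft label constraint terms), the non-increasing behavior guaranteed by an auxiliary function can be checked numerically:

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=200, eps=1e-10, seed=0):
    """Plain Lee-Seung multiplicative updates for min ||X - W H||_F^2
    with W >= 0, H >= 0.  Each update minimizes a majorizing auxiliary
    function G, so the objective value never increases."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps  # nonnegative random initialization
    H = rng.random((r, n)) + eps
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates; eps in the denominator avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        losses.append(np.linalg.norm(X - W @ H) ** 2)
    return W, H, losses

# The recorded objective values form a non-increasing sequence.
X = np.abs(np.random.default_rng(1).random((20, 30)))
W, H, losses = nmf_multiplicative(X, r=5)
```

This mirrors the structure of the proof: each factor is updated while the others are held fixed, and Lemma 1 applied per factor yields a monotonically non-increasing objective sequence, hence convergence to a local minimum.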
Cite this article
Tong, M., Chen, Y., Zhao, M. et al. A deep discriminative and robust nonnegative matrix factorization network method with soft label constraint. Neural Comput & Applic 31, 7447–7475 (2019). https://doi.org/10.1007/s00521-018-3554-6