Abstract
To obtain a discriminative, compact, and robust data representation, a discriminative and robust nonnegative matrix factorization method with a soft label constraint (DRNMF_SLC) is proposed; minimizing its objective function yields the data representation learned under the soft label constraint. To acquire a more hierarchical and discriminative representation, a deep discriminative and robust nonnegative matrix factorization network with a soft label constraint (Deep DRNMFN_SLC) is then constructed. To further improve the feature-expression ability of deep neural networks (DNNs), a DNN-based deep discriminative and robust nonnegative matrix factorization network with a soft label constraint (Deep DRNMFN_SLC_DNN) is proposed, which obtains a more discriminative, robust, and generalizable feature representation while greatly reducing the dimension of the data features. The objective function of DRNMF_SLC is constructed by introducing both a global loss function and a center loss function on the soft label constraint matrix, and the optimization procedure and convergence proof of the objective function are given. When the proposed DRNMF_SLC and Deep DRNMFN_SLC_DNN methods are applied to face recognition under occlusions and illumination variations, the corresponding frameworks are summarized as Algorithm 1 and Algorithm 2. Extensive experiments demonstrate the effectiveness of the proposed methods.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. 61072110), the Shaanxi Province Key Project of the Research and Development Plan (S2018-YF-ZDGY-0187), and the International Cooperation Projects of Shaanxi Province (S2018-YF-GHMS-0061 and 2016KW-042).
Ethics declarations
Conflict of interest
The authors declare that there are no potential conflicts of interest.
Informed consent
The authors declare that this work contains no material requiring informed consent.
Human and animal rights
The authors declare that this work involves no research with human participants and/or animals.
Appendix: Proof of Theorem 1
To prove Theorem 1, the following property of an auxiliary function is used; the same device appears in the expectation–maximization (EM) algorithm.
Lemma 1
If there is an auxiliary function \( G \) for \( \tilde{J}\left( x \right) \), which satisfies the conditions \( G\left( {x,x^{t} } \right) \ge \tilde{J}\left( x \right) \) and \( G\left( {x,x} \right) = \tilde{J}\left( x \right) \), then \( \tilde{J}\left( x \right) \) is non-increasing under the update
$$ x^{t + 1} = \arg \min_{x} G\left( {x,x^{t} } \right). \quad (42) $$
The equality \( \tilde{J}\left( x^{t + 1} \right) = \tilde{J}\left( x^{t} \right) \) holds only if \( x^{t} \) is a local minimum of \( G\left( {x,x^{t} } \right) \). By iterating the update in Eq. (42), a sequence of estimates converging to a local minimum \( x_{\min } = \arg \min_{x} \tilde{J}\left( x \right) \) is obtained, which is shown below by defining a proper auxiliary function for the objective function in Eq. (13).
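For completeness, the non-increasing property in Lemma 1 follows from the standard majorization chain (a standard argument, restated here; it is not specific to this paper):

```latex
\tilde{J}\left( x^{t+1} \right)
  \;\le\; G\left( x^{t+1}, x^{t} \right)
  \;\le\; G\left( x^{t}, x^{t} \right)
  \;=\; \tilde{J}\left( x^{t} \right),
```

where the first inequality uses \( G\left( {x,x^{t} } \right) \ge \tilde{J}\left( x \right) \), the second holds because \( x^{t+1} \) minimizes \( G\left( { \cdot ,x^{t} } \right) \) by the update in Eq. (42), and the final equality is the second defining condition of an auxiliary function.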
First, the convergence of the update rule in Eq. (21) is proven. For any element \( A_{j,q} \left( k \right) \) in \( {\mathbf{A}}\left( k \right) \), let \( \tilde{J}_{{A_{j,q} \left( k \right)}} \) denote the part of \( J_{DRNMF\_SLC} \) relevant to \( A_{j,q} \left( k \right) \). Since the update is essentially element-wise, it suffices to show that each \( \tilde{J}_{{A_{j,q} \left( k \right)}} \) is non-increasing under the update rule of Eq. (21), which is proven by defining the following auxiliary function with respect to \( A_{j,q} \left( k \right) \):
Lemma 2
Let \( \tilde{J}^{\prime} \) denote the first-order derivative with respect to \( {\mathbf{A}}\left( k \right) \). The function
is an auxiliary function for \( \tilde{J}_{{A_{j,q} \left( k \right)}} \), which is the part of \( J_{DRNMF\_SLC} \) that is only related to \( A_{j,q} \left( k \right) \).
Proof
Obviously, \( G\left( {A\left( k \right),A\left( k \right)} \right) = \tilde{J}_{{A_{j,q} \left( k \right)}} \left( {A\left( k \right)} \right) \). By the definition of an auxiliary function, it remains only to show that \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \ge \tilde{J}_{{A_{j,q} \left( k \right)}} \left( {A\left( k \right)} \right) \). To this end, \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \) in Eq. (43) is compared with the Taylor series expansion of \( \tilde{J}_{{A_{j,q} \left( k \right)}} \left( {A\left( k \right)} \right) \):
where \( \tilde{J}^{\prime\prime} \) is the second-order derivative regarding \( {\mathbf{A}}\left( k \right) \). It is simple to check that
Substituting Eq. (46) into Eq. (44) and comparing with Eq. (43) shows that, instead of demonstrating \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \ge \tilde{J}_{{A_{j,q} \left( k \right)}} \left( {A\left( k \right)} \right) \) directly, it is equivalent to prove
To establish the above inequality, note that the following inequality holds:
In summary, the statement \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \ge \tilde{J}_{{A_{j,q} \left( k \right)}} \left( {A\left( k \right)} \right) \) holds, and Lemma 2 is proved.
Subsequently, an auxiliary function is defined for the update rule in Eq. (18). Similarly, let \( \tilde{J}_{{Z_{i,j} \left( k \right)}} \) represent the part of \( J_{DRNMF\_SLC} \) related to \( Z_{i,j} \left( k \right) \). Then, the auxiliary function relevant to \( Z_{i,j} \left( k \right) \) is defined as follows:
Lemma 3
The function
is an auxiliary function for \( \tilde{J}_{{Z_{i,j} \left( k \right)}} \), which is the part of \( J_{DRNMF\_SLC} \) that is only related to \( Z_{i,j} \left( k \right) \).
The proof of Lemma 3 is essentially similar to that of Lemma 2 and is omitted for brevity.
Lemma 4
The function
is an auxiliary function for \( \tilde{J}_{{F_{q,a} \left( k \right)}} \), which is the part of \( J_{DRNMF\_SLC} \) that is only related to \( F_{q,a} \left( k \right) \).
Proof
Obviously, \( G\left( {F\left( k \right),F\left( k \right)} \right) = \tilde{J}_{{F_{q,a} \left( k \right)}} \left( {F\left( k \right)} \right) \). By the definition of an auxiliary function, it remains only to show that \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \ge \tilde{J}_{{F_{q,a} \left( k \right)}} \left( {F\left( k \right)} \right) \). To achieve this goal, \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \) in Eq. (50) is compared with the Taylor series expansion of \( \tilde{J}_{{F_{q,a} \left( k \right)}} \left( {F\left( k \right)} \right) \):
where
Instead of demonstrating \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \ge \tilde{J}_{{F_{q,a} \left( k \right)}} \left( {F\left( k \right)} \right) \), it is equivalent to prove
To prove the above inequality, we have
Thus, the following inequality holds:
As a result, the statement \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \ge \tilde{J}_{{F_{q,a} \left( k \right)}} \left( {F\left( k \right)} \right) \) holds, and Lemma 4 is proved.
With the above lemmas in place, the proof of Theorem 1 can now be given.
Proof of Theorem 1
Substituting \( G\left( {A\left( k \right),A_{j,q}^{t} \left( k \right)} \right) \) of Eq. (43) into Eq. (42), the following equation is obtained:
Since Eq. (43) is an auxiliary function, \( \tilde{J}_{{A_{j,q} \left( k \right)}} \) is non-increasing under this iteration rule in Eq. (58), according to Lemma 2.
Then, substituting \( G\left( {Z\left( k \right),Z_{i,j}^{t} \left( k \right)} \right) \) of Eq. (49) into Eq. (42), the following equation is obtained:
Since Eq. (49) is an auxiliary function, \( \tilde{J}_{{Z_{i,j} \left( k \right)}} \) is non-increasing under this iteration rule in Eq. (59), according to Lemma 3.
Similarly, substituting \( G\left( {F\left( k \right),F_{q,a}^{t} \left( k \right)} \right) \) of Eq. (50) into Eq. (42), the following equation is obtained:
Since Eq. (50) is an auxiliary function, \( \tilde{J}_{{F_{q,a} \left( k \right)}} \) is non-increasing under this iteration rule in Eq. (60) according to Lemma 4.
According to the above derivation and proofs, the update rules of \( {\mathbf{Z}}\left( k \right) \), \( {\mathbf{A}}\left( k \right) \) and \( {\mathbf{F}}\left( k \right) \) in Eqs. (18), (21) and (32) produce a non-increasing sequence of values of \( J_{DRNMF\_SLC} \), which therefore converges to a local minimum.
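The convergence mechanism above is the same majorization–minimization device that underlies classical NMF. As an illustrative sketch only (it implements the plain Lee–Seung multiplicative updates for \( \min \left\| {{\mathbf{X}} - {\mathbf{WH}}} \right\|_{F}^{2} \), not the paper's DRNMF_SLC updates, whose exact rules depend on the soft label constraint terms), the non-increasing behavior guaranteed by an auxiliary function can be checked numerically:

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=200, eps=1e-10, seed=0):
    """Plain Lee-Seung multiplicative updates for min ||X - W H||_F^2
    with W >= 0, H >= 0.  Each update minimizes a majorizing auxiliary
    function G, so the objective value never increases."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps  # nonnegative random initialization
    H = rng.random((r, n)) + eps
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates; eps in the denominator avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        losses.append(np.linalg.norm(X - W @ H) ** 2)
    return W, H, losses

# The recorded objective values form a non-increasing sequence.
X = np.abs(np.random.default_rng(1).random((20, 30)))
W, H, losses = nmf_multiplicative(X, r=5)
```

This mirrors the structure of the proof: each factor is updated while the others are held fixed, and Lemma 1 applied per factor yields a monotonically non-increasing objective sequence, hence convergence to a local minimum.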
Cite this article
Tong, M., Chen, Y., Zhao, M. et al. A deep discriminative and robust nonnegative matrix factorization network method with soft label constraint. Neural Comput & Applic 31, 7447–7475 (2019). https://doi.org/10.1007/s00521-018-3554-6