Abstract
Nonnegative learning aims to learn part-based representations of nonnegative data and has received much attention in recent years. Nonnegative matrix factorization (NMF) has been a popular way to make nonnegative learning applicable; it can also be interpreted as an optimization problem with bound constraints. To exploit the informative components hidden in nonnegative patterns, a novel nonnegative learning method, termed nonnegative class-specific entropy component analysis (NCECA), is developed in this work. In contrast to existing methods, the proposed method works with general objective functions, and the conjugate gradient technique is applied to accelerate the iterative optimization. Building on this development, a general nonnegative learning framework is presented to handle nonnegative optimization problems with general objective costs. Owing to the general objective costs and the nonnegative bound constraints, an ill-conditioned nonnegative learning problem often arises. To address this limitation, a modified line search criterion is proposed, which avoids the null trap under guaranteed conditions while keeping the feasible step descendent. In addition, a numerical stopping rule is employed in place of the popular gradient-based one to improve efficiency. Experiments on face recognition under a variety of conditions show that the proposed method outperforms the other methods considered.
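The ingredients named in the abstract (a bound-constrained objective, a backtracking line search with a sufficient-decrease test, and a numerical objective-based stopping rule) can be sketched on the plain NMF cost. This is a hypothetical illustration of the general scheme, not the paper's NCECA algorithm: the function names (`pg_nmf`, `_pg_update`) and all parameter values are assumptions, and the sufficient-decrease test follows the standard projected-gradient form of Lin (2007) rather than the modified criterion proposed here.

```python
import numpy as np

def _pg_update(V, W, H, beta=0.5, sigma=1e-4):
    """One projected-gradient step on H for fixed W, with Armijo-style
    backtracking. Minimizes 0.5*||V - W H||_F^2 subject to H >= 0."""
    grad = W.T @ (W @ H - V)
    f_old = 0.5 * np.linalg.norm(V - W @ H) ** 2
    step = 1.0
    while step > 1e-12:
        # project each trial point onto the nonnegative orthant
        H_new = np.maximum(H - step * grad, 0.0)
        f_new = 0.5 * np.linalg.norm(V - W @ H_new) ** 2
        # sufficient-decrease (Armijo) test for bound-constrained problems
        if f_new - f_old <= sigma * np.sum(grad * (H_new - H)):
            return H_new
        step *= beta  # backtrack
    return H

def pg_nmf(V, r, max_iter=100, tol=1e-6, seed=0):
    """Alternating projected-gradient NMF with a numerical stopping rule."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r))
    H = rng.random((r, V.shape[1]))
    prev = 0.5 * np.linalg.norm(V - W @ H) ** 2
    for _ in range(max_iter):
        H = _pg_update(V, W, H)
        W = _pg_update(V.T, H.T, W.T).T  # update W by symmetry
        cur = 0.5 * np.linalg.norm(V - W @ H) ** 2
        # numerical stopping rule: stop when the objective stalls,
        # instead of testing the projected-gradient norm
        if prev - cur <= tol * max(1.0, prev):
            break
        prev = cur
    return W, H
```

The backtracking loop is what prevents an infeasible or non-descent step: every trial point is projected back onto the nonnegative orthant before the decrease test is checked.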
Notes
Note that, for some specific purposes where another existing upper bound is available, it is actually unnecessary to impose such a constraint. In NCECA, the NMF approximation is still involved in order to make a universal arrangement.
Acknowledgments
The authors would like to thank the handling associate editor and the anonymous reviewers for their constructive comments, and the US Army Research Laboratory for providing the FERET database. This work was supported by a research grant from the Research Committee of the University of Macau.
Appendix A: Calculation of \( \nabla J_{E} (W) \)
The gradient \( \nabla \mathcal {H} ({W^T X | C}) \) is closely related to the gradient of the information potential \( \nabla \mathcal {V} ({W^T X | c}) \) of each class. The gradient \( \nabla \mathcal {V} ({W^T X | c}) \) is calculated as
Thus, the partial derivative \( {{\partial \mathcal {H} ({W^T X | C})} / {\partial W}} \) is given by
With the above partial derivatives, it is straightforward to obtain the gradient \( \nabla J_E (W) \).
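The gradient expressions themselves did not survive in this version of the text. As a hedged reconstruction, assuming the standard information-theoretic-learning setting (quadratic Renyi entropy \( \mathcal{H} = -\log \mathcal{V} \) estimated with a Gaussian Parzen kernel \( G \) of width \( \sigma \), which are assumptions here, not facts taken from this paper), the class-conditional information potential and its gradient take the form

\[
\mathcal{V}({W^T X | c}) = \frac{1}{N_c^{2}} \sum_{i \in c}\sum_{j \in c} G_{\sigma\sqrt{2}}\left(y_i - y_j\right), \qquad y_k = W^T x_k,
\]

\[
\nabla \mathcal{V}({W^T X | c}) = -\frac{1}{2\sigma^{2} N_c^{2}} \sum_{i \in c}\sum_{j \in c} G_{\sigma\sqrt{2}}\left(y_i - y_j\right)\,(x_i - x_j)\,(y_i - y_j)^T,
\]

and, since \( \mathcal{H}({W^T X | c}) = -\log \mathcal{V}({W^T X | c}) \),

\[
\frac{\partial \mathcal{H}({W^T X | c})}{\partial W} = -\,\frac{\nabla \mathcal{V}({W^T X | c})}{\mathcal{V}({W^T X | c})}.
\]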
Cite this article
Cheng, M., Pun, CM. & Tang, Y.Y. Nonnegative class-specific entropy component analysis with adaptive step search criterion. Pattern Anal Applic 17, 113–127 (2014). https://doi.org/10.1007/s10044-011-0258-2