Abstract
For the tasks of pattern analysis and recognition, nonnegative matrix factorization (NMF) and concept factorization (CF) have attracted much attention due to their effectiveness in finding meaningful low-dimensional representations of data. However, they neglect the geometric information embedded in the local neighborhoods of the data and fail to exploit prior knowledge. In this paper, a novel semi-supervised learning algorithm named hyper-graph regularized discriminative concept factorization (HDCF) is proposed. To explore the intrinsic geometrical structure of the data and make use of label information, HDCF incorporates a hyper-graph regularizer into the CF framework and uses the label information to train a classifier for the classification task. HDCF learns a concept factorization that respects the intrinsic manifold structure of the data and is simultaneously adapted to the classification task, together with a classifier built on the low-dimensional representations. Moreover, an iterative updating scheme is developed to optimize the objective function of the proposed HDCF, and a convergence proof of the scheme is provided. Experimental results on the ORL, Yale and USPS image databases demonstrate the effectiveness of the proposed algorithm.
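To give a concrete picture of the kind of factorization the abstract refers to, the following is a minimal sketch of plain concept factorization with the multiplicative updates of Xu and Gong (2004). It is not the HDCF algorithm itself: the hyper-graph regularizer on the representation and the label-driven classifier term of Eqs. (20)–(23) are omitted, and the function name and toy data below are illustrative assumptions only.

import numpy as np

def concept_factorization(X, r, n_iter=200, eps=1e-9, seed=0):
    # Plain CF: approximate X (features x samples) by X W V^T with W, V >= 0.
    # HDCF extends this objective with a hyper-graph Laplacian term on V and a
    # label-based classifier term; those extra terms are not reproduced here.
    rng = np.random.default_rng(seed)
    n = X.shape[1]                     # number of samples (columns of X)
    K = X.T @ X                        # kernel matrix K = X^T X
    W = rng.random((n, r))             # concept weights
    V = rng.random((n, r))             # low-dimensional representation
    for _ in range(n_iter):
        # Multiplicative updates of Xu and Gong (2004); for nonnegative data
        # they preserve nonnegativity and do not increase ||X - X W V^T||_F^2.
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        V *= (K @ W) / (V @ (W.T @ K @ W) + eps)
    return W, V

# Toy usage: 50 nonnegative 20-dimensional samples factorized into 5 concepts.
X = np.abs(np.random.default_rng(1).normal(size=(20, 50)))
W, V = concept_factorization(X, r=5)
reconstruction_error = np.linalg.norm(X - X @ W @ V.T)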
References
Agarwal S, Branson K, Belongie S (2006) Higher order learning with graphs. In: Proceedings of the 23rd international conference on machine learning. Pittsburgh, PA, pp 17–24
Agarwal S, Lim J, Zelnik-Manor L, Perona P, Kriegman D, Belongie S (2005) Beyond pairwise clustering. Proceedings of the international conference on computer vision and pattern recognition. San Diego, CA, pp 838–845
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from examples. J Mach Learn Res 7:2399–2434
Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. In: 15th Annual Neural Information Processing Systems Conference, NIPS 2001, vol 14. MIT Press, Cambridge, pp 585–591
Cai D, He X, Han J, Huang T (2011) Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 33:1548–1560
Cai D, He X, Han J (2011) Locally consistent concept factorization for document clustering. IEEE Trans Knowl Data Eng 23(6):902–913
Chapelle O, Scholkopf B, Zien A et al (2006) Semi-supervised learning, vol 2. MIT Press, Cambridge
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39(1):1–38
Diaz-Valenzuela I, Loia V, Martin-Bautista MJ, Senatore S, Vila MA (2016) Automatic constraints generation for semisupervised clustering: experiences with documents classification. Soft Comput 20:2329–2339
Grira N, Crucianu M, Boujemaa N (2005) Semi-supervised fuzzy clustering with pairwise-constrained competitive agglomeration. In: The 14th IEEE international conference on fuzzy systems, FUZZ’05. IEEE, pp 867–872
He W, Chen JX, Zhang WH (2017) Low-rank representation with graph regularization for subspace clustering. Soft Comput 21(6):1–13
He R, Zheng W, Hu B, Kong X (2011) Nonnegative sparse coding for discriminative semi-supervised learning. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 2849–2856
Hong C, Yu J, Li J, Chen X (2013) Multi-view hypergraph learning by patch alignment framework. Neurocomputing 118:79–86
Hua W, He X (2011) Discriminative concept factorization for data representation. Neurocomputing 74:3800–3807
Huang Y, Liu Q, Lv F, Gong Y, Metaxas D (2011) Unsupervised image categorization by hypergraph partition. IEEE Trans Pattern Anal Mach Intell 33(6):1266–1273
Huang Y, Liu Q, Metaxas D (2009) Video object segmentation by hypergraph cut. In: Proceedings of the international conference on computer vision and pattern recognition. Miami, FL, pp 1738–1745
Huang L, Su CY (2006) Facial expression synthesis using manifold learning and belief propagation. Soft Comput 10:1193–1200
Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791
Li X, Zhao CX, Shu ZQ, Guo JH (2015) Hyper-graph regularized concept factorization algorithm and its application to data representation. China Acad Control Decis 30(8):1399–1404
Mairal J, Bach F, Ponce J, Sapiro G, Zisserman A (2009) Supervised dictionary learning. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems, vol 21. Hyatt Regency, Vancouver, pp 1033–1040
Sha F, Lin Y, Saul LK, Lee DD (2007) Multiplicative updates for nonnegative quadratic programming. Neural Comput 19(8):2004–2031
Shahnaz F, Berry MW, Pauca V, Plemmons RJ (2006) Document clustering using nonnegative matrix factorization. Inf Process Manag 42(2):373–386
Shashua A, Hazan T (2005) Nonnegative tensor factorization with applications to statistics and computer vision. In: Proceedings of the 22nd international conference on machine learning, pp 792–799
Sun L, Ji S, Ye J (2008) Hypergraph spectral learning for multi-label classification. Proceedings of the international conference on knowledge discovery and data mining. Las Vegas, NV, pp 668–676
Tenenbaum J, de Silva V, Langford J (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Tian Z, Hwang T, Kuang R (2009) A hypergraph-based learning algorithm for classifying gene expression and array CGH data with prior knowledge. Bioinformatics 25(21):2831–2838
Wang C, Yu J, Tao D (2013) High-level attributes modeling for indoor scenes classification. Neurocomputing 121:337–343
Wang Y, Jia Y, Hu C, Turk M (2005) Nonnegative matrix factorization framework for face recognition. Int J Pattern Recognit Artif Intell 19(4):495–511
Xu W, Gong Y (2004) Document clustering by concept factorization. In: Proceedings of 2004 international conference on research and development in information retrieval (SIGIR’04), Sheffield, UK, July 2004, pp 202–209
He Y, Lu H, Huang L, Xie S (2014) Pairwise constrained concept factorization for data representation. Neural Netw 52:1–17
Yu J, Tao D, Wang M (2012) Adaptive hypergraph learning and its application in image classification. IEEE Trans Image Process 21(7):3262–3272
Zass R, Shashua A (2008) Probabilistic graph and hypergraph matching. In: Proceedings of the international conference on computer vision and pattern recognition, Anchorage, AK, pp 1–8
Zeng K, Yu J, Li CH, You J, Jin TS (2014) Image clustering by hyper-graph regularized non-negative matrix factorization. Neurocomputing 138:209–217
Zhang Y, Yeung D (2008) Semi-supervised discriminant analysis using robust path-based similarity. In: IEEE conference on computer vision and pattern recognition, p 18
Zhou D, Huang J, Scholkopf B (2006) Learning with hypergraphs: clustering, classification, and embedding. In: 20th Annual Conference on Neural Information Processing Systems, NIPS 2006. MIT Press, Cambridge, pp 1601–1608
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61373063, 61233011, 61125305, 61375007 and 61220301, and by the National Basic Research Program of China under Grant No. 2014CB349303. This work is also supported in part by the Natural Science Foundation of Jiangsu Province (BK20150867), the Natural Science Research Foundation for Jiangsu Universities (13KJB510022) and the Natural Science Foundation of Nanjing University of Posts and Telecommunications (NY215125).
Ethics declarations
Conflict of interest
Jun Ye and Zhong Jin declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors. All the data used in the experiments were obtained from public datasets.
Additional information
Communicated by V. Loia.
Appendix A (Proof of Theorem 1)
To prove Theorem 1, we need to show that the objective function \({\varvec{J}}_\mathbf{HDCF } \) in Eq. (12) is nonincreasing under the updating rules stated in Eqs. (20), (21) and (23). Now, we make use of an auxiliary function similar to that used in the EM algorithm (Dempster et al. 1977) to prove the convergence of Theorem 1. We begin with the definition of the auxiliary function.
Definition 1
The function \(G( {x,{x}'} )\) is an auxiliary function for \(F(x)\) if the conditions \(G( {x,{x}'} )\ge F( x )\) and \(G( {x,x} )=F( x )\) are satisfied.
The auxiliary function is very useful because of the following lemma.
Lemma 1
If \(G\) is an auxiliary function of \(F\), then \(F\) is nonincreasing under the update
\[
x^{(t+1)}=\arg \min _x G( {x,x^{(t)}} ). \qquad (28)
\]
Proof
Since the updating rule of \({\varvec{W}}\) is exactly the same as in the original CF, the convergence proof for Eq. (20) can be found in Xu and Gong (2004). Here we only need to prove the convergence of the updating rules for \({\varvec{V}}\) and \({\varvec{A}}\) in Eqs. (21) and (23). Next we will show that the updating rule for \({\varvec{V}}\) in Eq. (21) is exactly the update in Eq. (28) with a proper auxiliary function.
Considering any element \(v_{ab} \) in \({\varvec{V}}\), we use \(F_{v_{ab} } \) to denote the part of \({\varvec{J}}_\mathbf{HDCF } \) which is only relevant to \(v_{ab} \). It is easy to check that
Since our update is essentially elementwise, it is sufficient to show that each \(F_{v_{ab} } \) is nonincreasing under the update step of Eq. (21). \(\square \)
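For completeness, Lemma 1 itself follows from a one-line chain of inequalities under the update in Eq. (28), using the two defining properties of the auxiliary function:
\[
F\big(x^{(t+1)}\big)\le G\big(x^{(t+1)},x^{(t)}\big)\le G\big(x^{(t)},x^{(t)}\big)=F\big(x^{(t)}\big),
\]
where the first inequality uses \(G( {x,{x}'} )\ge F( x )\) and the second uses the fact that \(x^{(t+1)}\) minimizes \(G( {x,x^{(t)}} )\).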
Lemma 2
The function in Eq. (29) is an auxiliary function for \(F_{v_{ab} } \).
Proof
Since \(G(v,v)=F_{v_{ab} } (v)\) is obvious, we only need to show that \(G(v,v_{ab}^{(t)} )\ge F_{v_{ab} } (v)\). To do this, we compare the Taylor series expansion of \(F_{v_{ab} } (v)\)
with Eq. (29) to find that \(G(v,v_{ab}^{(t)} )\ge F_{v_{ab} } (v)\) is equivalent to
We have
And
Thus, Eq. (30) holds and \(G(v,v_{ab}^{(t)} )\ge F_{v_{ab} } (v)\). \(\square \)
Next we define an auxiliary function for the update rule in Eq. (23). Similarly, consider any element \(a_{ab} \) in A; we use \(F_{a_{ab} } \) to denote the part of \({\varvec{J}}({\varvec{A}})\) which is only relevant to \(a_{ab} \). It is easy to check that
Similarly, it is sufficient to show that each \(F_{a_{ab} } \) is nonincreasing under the update step of Eq. (23). Then the auxiliary function regarding \(a_{ab} \) is defined as follows:
Lemma 3
The function in Eq. (31) is an auxiliary function for \(F_{a_{ab}}\).
Proof
Since \(G(a,a)=F_{a_{ab} } (a)\) is obvious, we only need to show that \(G(a,a_{ab}^{(t)} )\ge F_{a_{ab} } (a)\). To do this, we compare the Taylor series expansion of \(F_{a_{ab} } (a)\)
with Eq. (31) to find that \(G(a,a_{ab}^{(t)} )\ge F_{a_{ab} } (a)\) is equivalent to
We have
\[
({\varvec{AV}}^{T}{\varvec{C}}^{T}{\varvec{CV}})_{ab} =\sum \limits _{q=1}^r {a_{aq}^{(t)} ({\varvec{V}}^{T}{\varvec{C}}^{T}{\varvec{CV}})_{qb} } \ge a_{ab}^{(t)} ({\varvec{V}}^{T}{\varvec{C}}^{T}{\varvec{CV}})_{bb},
\]
since all terms of the sum are nonnegative, so the sum is bounded below by its \(q=b\) term.
Thus, Eq. (32) holds and \(G(a,a_{ab}^{(t)} )\ge F_{a_{ab} } (a)\). \(\square \)
Now we can demonstrate the convergence of Theorem 1:
Proof of Theorem 1
Replacing \(G(v,v_{ab}^{(t)} )\) in Eq. (28) by Eq. (29), we get exactly the updating rule for \({\varvec{V}}\) in Eq. (21).
Since Eq. (29) is an auxiliary function for \(F_{v_{ab} } \), \(F_{v_{ab} } \) is nonincreasing under this updating rule.
Similarly, replacing \(G(a,a_{ab}^{(t)} )\) in Eq. (28) by Eq. (31), we get exactly the updating rule for \({\varvec{A}}\) in Eq. (23).
Since Eq. (31) is an auxiliary function for \(F_{a_{ab} } \), \(F_{a_{ab} } \) is nonincreasing under this updating rule. \(\square \)
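Putting the pieces together (a sketch, assuming the updates of Eqs. (20), (21) and (23) are applied alternately in the order \({\varvec{W}}\), \({\varvec{V}}\), \({\varvec{A}}\) with the other factors held fixed): each update does not increase the part of the objective it touches, so over one full iteration
\[
{\varvec{J}}_\mathbf{HDCF}\big({\varvec{W}}^{(t+1)},{\varvec{V}}^{(t+1)},{\varvec{A}}^{(t+1)}\big)\le {\varvec{J}}_\mathbf{HDCF}\big({\varvec{W}}^{(t+1)},{\varvec{V}}^{(t+1)},{\varvec{A}}^{(t)}\big)\le {\varvec{J}}_\mathbf{HDCF}\big({\varvec{W}}^{(t+1)},{\varvec{V}}^{(t)},{\varvec{A}}^{(t)}\big)\le {\varvec{J}}_\mathbf{HDCF}\big({\varvec{W}}^{(t)},{\varvec{V}}^{(t)},{\varvec{A}}^{(t)}\big).
\]
Assuming, as is standard for such objectives, that \({\varvec{J}}_\mathbf{HDCF}\) is bounded below, the nonincreasing sequence of objective values converges.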