
Hyper-graph regularized discriminative concept factorization for data representation

  • Methodologies and Application (Soft Computing)

Abstract

For pattern analysis and recognition tasks, nonnegative matrix factorization and concept factorization (CF) have attracted much attention owing to their effectiveness in finding meaningful low-dimensional representations of data. However, both methods neglect the geometric information embedded in the local neighborhoods of the data and fail to exploit prior knowledge. In this paper, a novel semi-supervised learning algorithm named hyper-graph regularized discriminative concept factorization (HDCF) is proposed. To explore the intrinsic geometric structure of the data and make use of label information, HDCF incorporates a hyper-graph regularizer into the CF framework and uses the label information to train a classifier for the classification task. HDCF thus learns a concept factorization that respects the intrinsic manifold structure of the data while being simultaneously adapted to the classification task, with a classifier built on the low-dimensional representations. Moreover, an iterative updating scheme is developed to optimize the objective function of HDCF, and a convergence proof for this scheme is provided. Experimental results on the ORL, Yale and USPS image databases demonstrate the effectiveness of the proposed algorithm.


References

  • Agarwal S, Branson K, Belongie S (2006) Higher order learning with graphs. In: Proceedings of the 23rd international conference on machine learning. Pittsburgh, PA, pp 17–24

  • Agarwal S, Lim J, Zelnik Manor L, Perona P, Kriegman D, Belongie S (2005) Beyond pairwise clustering. Proceedings of the international conference on computer vision and pattern recognition. San Diego, CA, pp 838–845


  • Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from examples. J Mach Learn Res 7:2399–2434


  • Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. In: 15th Annual Neural Information Processing Systems Conference, NIPS 2001, vol 14. MIT Press, Cambridge, pp 585–591

  • Cai D, He X, Han J, Huang T (2011) Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 33:1548–1560


  • Cai D, He X, Han J (2011) Locally consistent concept factorization for document clustering. IEEE Trans Knowl Data Eng 23(6):902–913


  • Chapelle O, Scholkopf B, Zien A et al (2006) Semi-supervised learning, vol 2. MIT Press, Cambridge


  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc Ser B (Methodological) 39(1):1–38


  • Diaz-Valenzuela I, Loia V, Martin-Bautista MJ, Senatore S, Vila MA (2016) Automatic constraints generation for semisupervised clustering: experiences with documents classification. Soft Comput 20:2329–2339


  • Grira N, Crucianu M, Boujemaa N (2005) Semi-supervised fuzzy clustering with pairwise-constrained competitive agglomeration. In: The 14th IEEE international conference on fuzzy systems, FUZZ’05. IEEE, pp 867–872

  • He W, Chen Jim X, Zhang WH (2017) Low-rank representation with graph regularization for subspace clustering. Soft Comput 21(6):1–13


  • He R, Zheng W, Hu B, Kong X (2011) Nonnegative sparse coding for discriminative semi-supervised learning. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 2849–2856

  • Hong C, Yu J, Li J, Chen X (2013) Multi-view hypergraph learning by patch alignment framework. Neurocomputing 118:79–86


  • Hua W, He X (2011) Discriminative concept factorization for data representation. Neurocomputing 74:3800–3807


  • Huang Y, Liu Q, Lv F, Gong Y, Metaxas D (2011) Unsupervised image categorization by hypergraph partition. IEEE Trans Pattern Anal Mach Intell 33(6):1266–1273


  • Huang Y, Liu Q, Metaxas D (2009) Video object segmentation by hypergraph cut. In: Proceedings of the international conference on computer vision and pattern recognition. Miami, FL, pp 1738–1745

  • Huang L, Su CY (2006) Facial expression synthesis using manifold learning and belief propagation. Soft Comput 10:1193–1200


  • Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791


  • Li X, Zhao CX, Shu ZQ, Guo JH (2015) Hyper-graph regularized concept factorization algorithm and its application to data representation. China Acad Control Decis 30(8):1399–1404

  • Mairal J, Bach F, Ponce J, Sapiro G, Zisserman A (2009) Supervised dictionary learning. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems, vol 21. Hyatt Regency, Vancouver, pp 1033–1040


  • Sha F, Lin Y, Saul LK, Lee DD (2007) Multiplicative updates for nonnegative quadratic programming. Neural Comput 19(8):2004–2031


  • Shahnaz F, Berry MW, Pauca V, Plemmons RJ (2006) Document clustering using nonnegative matrix factorization. Inf Process Manag 42(2):373–386


  • Shashua A, Hazan T (2005) Nonnegative tensor factorization with applications to statistics and computer vision. In: Proceedings of the 22nd international conference on machine learning, pp 792–799

  • Sun L, Ji S, Ye J (2008) Hypergraph spectral learning for multi-label classification. Proceedings of the international conference on knowledge discovery and data mining. Las Vegas, NV, pp 668–676


  • Tenenbaum J, de Silva V, Langford J (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323


  • Tian Z, Hwang T, Kuang R (2009) A hypergraph-based learning algorithm for classifying gene expression and array CGH data with prior knowledge. Bioinformatics 25(21):2831–2838


  • Wang C, Yu J, Tao D (2013) High-level attributes modeling for indoor scenes classification. Neurocomputing 121:337–343


  • Wang Y, Jia Y, Hu C, Turk M (2005) Nonnegative matrix factorization framework for face recognition. Int J Pattern Recognit Artif Intell 19(4):495–511


  • Xu W, Gong Y (2004) Document clustering by concept factorization. In: Proceedings of 2004 international conference on research and development in information retrieval (SIGIR’04), Sheffield, UK, July 2004, pp 202–209

  • He Y, Lu H, Huang L, Xie S (2014) Pairwise constrained concept factorization for data representation. Neural Netw 52:1–17


  • Yu J, Tao D, Wang M (2012) Adaptive hypergraph learning and its application in image classification. IEEE Trans Image Process 21(7):3262–3272


  • Zass R, Shashua A (2008) Probabilistic graph and hyper graph matching. In: Proceedings of the international conference on computer vision and pattern recognition in Anchorage, AK, pp 1–8

  • Zeng K, Yu J, Li CH, You J, Jin TS (2014) Image clustering by hyper-graph regularized non-negative matrix factorization. Neurocomputing 138:209–217


  • Zhang Y, Yeung D (2008) Semi-supervised discriminant analysis using robust path-based similarity. In: IEEE conference on computer vision and pattern recognition, p 18

  • Zhou D, Huang J, Scholkopf B (2006) Learning with hypergraphs: clustering, classification, and embedding. In: 20th Annual Conference on Neural Information Processing Systems, NIPS 2006. MIT Press, Cambridge, pp 1601–1608


Acknowledgements

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61373063, 61233011, 61125305, 61375007, 61220301 and by National Basic Research Program of China under Grant No. 2014CB349303. Also this work is supported in part by the Natural Science Foundation of Jiangsu Province (BK20150867), the Natural Science Research Foundation for Jiangsu Universities (13KJB510022) and the Natural Science Foundation of Nanjing University of Posts and Telecommunications (NY215125).

Corresponding author

Correspondence to Jun Ye.

Ethics declarations

Conflict of interest

Jun Ye and Zhong Jin declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors. All the data used in the experiments were obtained from public datasets.

Additional information

Communicated by V. Loia.

Appendix A (Proof of Theorem 1)

To prove Theorem 1, we need to show that the objective function \({\varvec{J}}_\mathbf{HDCF } \) in Eq. (12) is nonincreasing under the updating rules stated in Eqs. (20), (21) and (23). To this end, we make use of an auxiliary function similar to that used in the EM algorithm (Dempster et al. 1977). We begin with the definition of the auxiliary function.

Definition 1

The function \(G( {x,{x}'} )\) is an auxiliary function for \(F( x )\) if \(G( {x,{x}'} )\ge F( x )\) and \(G( {x,x} )=F( x )\) are satisfied.

The auxiliary function is very useful because of the following lemma.

Lemma 1

If G is an auxiliary function of F, then F is nonincreasing under the update

$$\begin{aligned} x^{(t+1)}=\mathop {\arg \min }\limits _{x} G({x,x^{(t)}} ) \end{aligned}$$
(28)

Proof

$$\begin{aligned} F( {x^{(t+1)}} )\le G( {x^{(t+1)},x^{(t)}} )\le G( {x^{(t)},x^{(t)}} )=F( {x^{(t)}} ). \end{aligned}$$
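The descent mechanism of Lemma 1 can be illustrated with a small numerical sketch: for any smooth \(F\) whose second derivative is bounded by \(L\), the quadratic upper bound \(G(x,{x}')=F({x}')+{F}'({x}')(x-{x}')+\frac{L}{2}(x-{x}')^{2}\) is a valid auxiliary function, and repeatedly minimizing it drives \(F\) monotonically downward. The function below is a hypothetical toy example, not the paper's objective.

```python
import numpy as np

# Toy majorize-minimize iteration illustrating Lemma 1.  Since
# F''(x) = 2 - 9 sin(3x) satisfies |F''| <= 11, the quadratic
# G(x, x') = F(x') + F'(x')(x - x') + (L/2)(x - x')**2 with L = 11
# satisfies G(x, x') >= F(x) and G(x', x') = F(x'), i.e. it is an
# auxiliary function in the sense of Definition 1.
F = lambda x: x**2 + np.sin(3 * x)
dF = lambda x: 2 * x + 3 * np.cos(3 * x)
L = 11.0                                   # upper bound on |F''|

x = 2.0
values = [F(x)]
for _ in range(50):
    # argmin_x G(x, x^{(t)}) = x^{(t)} - F'(x^{(t)}) / L, as in Eq. (28)
    x = x - dF(x) / L
    values.append(F(x))

# F is nonincreasing along the iterates, exactly as Lemma 1 guarantees.
assert all(v1 >= v2 - 1e-12 for v1, v2 in zip(values, values[1:]))
```

The closed-form minimizer of the quadratic majorizer is a damped gradient step; the chain of inequalities in the proof above is what guarantees the monotone decrease observed here.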

Since the updating rule for \({\varvec{W}}\) is exactly the same as in the original CF, the convergence proof for Eq. (20) can be found in Xu and Gong (2004). Here we only need to prove the convergence of the updating rules for \({\varvec{V}}\) and \({\varvec{A}}\) in Eqs. (21) and (23). Next we show that the updating rule for \({\varvec{V}}\) in Eq. (21) is exactly the update in Eq. (28) with a proper auxiliary function.

Considering any element \(v_{ab} \) in \({\varvec{V}}\), we use \(F_{v_{ab} } \) to denote the part of \({\varvec{J}}_\mathbf{HDCF } \) which is only relevant to \(v_{ab} \). It is easy to check that

$$\begin{aligned} {F'}_{v_{ab} }= & {} \left( {\frac{\partial {\varvec{J}}_\mathbf{HDCF } }{\partial {\varvec{V}}}} \right) _{ab} \\= & {} \left( {-2{\varvec{KW}}+2{\varvec{VW}}^{T} {\varvec{KW}}+2\alpha {\varvec{L}}^\mathrm{hyper}{\varvec{V}}} \right) _{ab}\nonumber \\&+\,2\beta \left( {-{\varvec{C}}^{T}{\varvec{Y}}^{T} {\varvec{A}}+{\varvec{C}}^{T}{\varvec{CVA}}^{T}{\varvec{A}}} \right) _{ab}, \\ {F}_{v_{ab}}^{\prime \prime }= & {} 2({{\varvec{W}}^{T}{\varvec{KW}}} )_{bb} +2\alpha \left( {{\varvec{L}}^{\mathrm{hyper}}} \right) _{aa} \nonumber \\&+\,2\beta ({\varvec{C}}^{T}{\varvec{C}})_{aa} ( {\varvec{A}}^{T}{\varvec{A}})_{bb} \end{aligned}$$

Since our update is essentially elementwise, it is sufficient to show that each \(F_{v_{ab} } \) is nonincreasing under the update step of Eq. (21). \(\square \)

Lemma 2

The function in Eq. (29) is an auxiliary function for \(F_{v_{ab} } \).

$$\begin{aligned} G(v,v_{ab}^{(t)} )= & {} F_{v_{ab} } (v_{ab}^{(t)} )+{F}'_{v_{ab} } (v_{ab}^{(t)} )(v-v_{ab}^{(t)} )\nonumber \\&+\frac{({\varvec{VW}}^{T}{\varvec{KW}})_{ab} +\alpha ({\varvec{D}}_{v} {\varvec{V}})_{ab} +\beta ({\varvec{C}}^{T}{\varvec{CVA}}^{T}{\varvec{A}})_{ab} }{v_{ab}^{(t)} }\nonumber \\&(v-v_{ab}^{(t)} )^{2} \end{aligned}$$
(29)

Proof

Since \(G(v,v)=F_{v_{ab} } (v)\) is obvious, we only need to show that \(G(v,v_{ab}^{(t)} )\ge F_{v_{ab} } (v)\). To do this, we compare the Taylor series expansion of \(F_{v_{ab} } (v)\)

$$\begin{aligned} F_{v_{ab} } (v)= & {} F_{v_{ab} } (v_{ab}^{(t)} )+{F}'_{v_{ab} } (v_{ab}^{(t)} )(v-v_{ab}^{(t)} )\\&+\,[({\varvec{W}}^{T}{\varvec{KW}})_{bb} +\alpha ({\varvec{L}}^{\mathrm{hyper}})_{aa} \\&+\,\beta ({\varvec{C}}^{T}{\varvec{C}})_{aa} ({\varvec{A}}^{T}{\varvec{A}})_{bb} ](v-v_{ab}^{(t)} )^{2} \end{aligned}$$

with Eq. (29) to find that \(G(v,v_{ab}^{(t)} )\ge F_{v_{ab} } (v)\) is equivalent to

$$\begin{aligned}&\frac{({\varvec{VW}}^{T}{\varvec{KW}})_{ab} +\alpha ({\varvec{D}}_{v} {\varvec{V}})_{ab} +\beta ({\varvec{C}}^{T}{\varvec{CVA}}^{T}{\varvec{A}})_{ab} }{v_{ab}^{(t)} }\nonumber \\&\quad \ge ({\varvec{W}}^{T}{\varvec{KW}})_{bb} +\alpha ({\varvec{L}}^{\mathrm{hyper}})_{aa} +\beta ({\varvec{C}}^{T}{\varvec{C}})_{aa} ({\varvec{A}}^{T}{\varvec{A}})_{bb}\nonumber \\ \end{aligned}$$
(30)

We have

$$\begin{aligned} ({{\varvec{VW}}^{T}{\varvec{KW}}})_{ab}= & {} \sum \limits _{q=1}^r {v_{aq}^{(t)} \left( {{\varvec{W}}^{T}{\varvec{KW}}} \right) _{qb} } \ge v_{ab}^{(t)} \left( {{\varvec{W}}^{T}{\varvec{KW}}} \right) _{bb} ;\\ \alpha ({\varvec{D}}_{v} {\varvec{V}})_{ab}= & {} \alpha \sum \limits _{j=1}^n {({\varvec{D}}_{v} )_{aj} v_{jb}^{(t)} } \ge \alpha ({\varvec{D}}_{v} )_{aa} v_{ab}^{(t)}\\\ge & {} \alpha ({\varvec{D}}_{v} -{\varvec{S}})_{aa} v_{ab}^{(t)} =\alpha ({\varvec{L}}^\mathrm{hyper})_{aa} v_{ab}^{(t)} \end{aligned}$$

And

$$\begin{aligned} \beta ({\varvec{C}}^{T}{\varvec{CVA}}^{T}{\varvec{A}})_{ab}= & {} \beta \sum \limits _{q=1}^r {\left( {{\varvec{C}}^{T}{\varvec{CV}}} \right) _{aq} \left( {{\varvec{A}}^{T}{\varvec{A}}} \right) _{qb} } \\\ge & {} \beta \left( {{\varvec{C}}^{T}{\varvec{CV}}} \right) _{ab} \left( {{\varvec{A}}^{T}{\varvec{A}}} \right) _{bb} \\= & {} \beta \sum \limits _{j=1}^n {\left( {{\varvec{C}}^{T}{\varvec{C}}} \right) _{aj} v_{jb}^{(t)} \left( {{\varvec{A}}^{T}{\varvec{A}}} \right) _{bb} } \\\ge & {} \beta v_{ab}^{\left( t \right) } \left( {{\varvec{C}}^{T}{\varvec{C}}} \right) _{aa} \left( {{\varvec{A}}^{T}{\varvec{A}}} \right) _{bb} \end{aligned}$$

Thus, Eq. (30) holds and \(G(v,v_{ab}^{(t)} )\ge F_{v_{ab} } (v)\). \(\square \)
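Every termwise bound in the proof above rests on the same elementary fact: for entrywise nonnegative matrices, dropping all but the \(q=b\) term of a sum can only decrease it, so \(({\varvec{VM}})_{ab}\ge v_{ab}M_{bb}\). A quick numerical check of this fact, with arbitrary illustrative shapes (in the paper \({\varvec{M}}\) plays the role of \({\varvec{W}}^{T}{\varvec{KW}}\)):

```python
import numpy as np

# For entrywise nonnegative V (n x r) and M (r x r),
# (V M)_{ab} = sum_q V_{aq} M_{qb} >= V_{ab} M_{bb},
# because removing the q != b terms removes only nonnegative quantities.
rng = np.random.default_rng(0)
V = rng.random((6, 4))
M = rng.random((4, 4))

VM = V @ M
lower = V * np.diag(M)          # elementwise V_{ab} * M_{bb}
assert np.all(VM >= lower - 1e-12)
```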

Next we define an auxiliary function for the update rule in Eq. (23). Similarly, consider any element \(a_{ab} \) in A; we use \(F_{a_{ab} } \) to denote the part of \({\varvec{J}}({\varvec{A}})\) which is only relevant to \(a_{ab} \). It is easy to check that

$$\begin{aligned} {F}'_{a_{ab} }= & {} \left( {\frac{\partial {\varvec{J}}({\varvec{A}})}{\partial {\varvec{A}}}} \right) _{ab} \\= & {} \left( {-2{\varvec{YCV}}+2{\varvec{AV}}^{T}{\varvec{C}}^{T} {\varvec{CV}}+2\gamma {\varvec{A}}} \right) _{ab} , \\ {F}''_{a_{ab} }= & {} 2({{\varvec{V}}^{T}{\varvec{C}}^{T}{\varvec{CV}}})_{bb} +2\gamma \end{aligned}$$

Similarly, it is sufficient to show that each \(F_{a_{ab} } \) is nonincreasing under the update step of Eq. (23). Then the auxiliary function regarding \(a_{ab} \) is defined as follows:

Lemma 3

The function in Eq. (31) is an auxiliary function for \(F_{a_{ab}}\).

$$\begin{aligned} G(a,a_{ab}^{(t)} )= & {} F_{a_{ab} } (a_{ab}^{(t)} )+{F}'_{a_{ab} } (a_{ab}^{(t)} )(a-a_{ab}^{(t)} )\nonumber \\&+\frac{\left( {{\varvec{AV}}^{T}{\varvec{C}}^{T}{\varvec{CV}}+\gamma {\varvec{A}}} \right) _{ab} }{a_{ab}^{(t)} }(a-a_{ab}^{(t)} )^{2} \end{aligned}$$
(31)

Proof

Since \(G(a,a)=F_{a_{ab} } (a)\) is obvious, we only need to show that \(G(a,a_{ab}^{(t)} )\ge F_{a_{ab} } (a)\). To do this, we compare the Taylor series expansion of \(F_{a_{ab} } (a)\)

$$\begin{aligned} F_{a_{ab} } (a)= & {} F_{a_{ab} } (a_{ab}^{(t)} )+{F}'_{a_{ab} } (a_{ab}^{(t)} )\left( a-a_{ab}^{(t)} \right) \\&+\,[( {{\varvec{V}}^{T}{\varvec{C}}^{T}{\varvec{CV}}} )_{bb} +\gamma ]\left( a-a_{ab}^{(t)} \right) ^{2} \end{aligned}$$

with Eq. (31) to find that \(G(a,a_{ab}^{(t)} )\ge F_{a_{ab} } (a)\) is equivalent to

$$\begin{aligned} \frac{\left( {{\varvec{AV}}^{T}{\varvec{C}}^{T}{\varvec{CV}}+\gamma {\varvec{A}}} \right) _{ab} }{a_{ab}^{(t)} }\ge [( {{\varvec{V}}^{T}{\varvec{C}}^{T}{\varvec{CV}}})_{bb} +\gamma ] \end{aligned}$$
(32)

We have

$$\begin{aligned} ({\varvec{AV}}^{T}{\varvec{C}}^{T}{\varvec{CV}})_{ab} =\sum \limits _{q=1}^r {a_{aq}^{(t)} ({\varvec{V}}^{T}{\varvec{C}}^{T}{\varvec{CV}})_{qb} } \ge a_{ab}^{(t)} ({\varvec{V}}^{T}{\varvec{C}}^{T}{\varvec{CV}})_{bb} ; \end{aligned}$$

$$\begin{aligned} \gamma ({\varvec{A}})_{ab}= & {} \gamma ( {{\varvec{AI}}})_{ab} \\= & {} \gamma \sum \nolimits _{q=1}^r {a_{aq}^{(t)} ({\varvec{I}})_{qb} } \ge \gamma a_{ab}^{(t)} ({\varvec{I}})_{bb}\\= & {} \gamma a_{ab}^{(t)} \end{aligned}$$

Thus, Eq. (32) holds and \(G(a,a_{ab}^{(t)} )\ge F_{a_{ab} } (a)\). \(\square \)

Now we can demonstrate the convergence of Theorem 1:

Proof of Theorem 1

Replacing \(G(v,v_{ab}^{(t)} )\) in Eq. (28) by Eq. (29), we get

$$\begin{aligned} v_{ab}^{(t+1)}= & {} v_{ab}^{(t)} -v_{ab}^{(t)} \frac{{F}'_{v_{ab} } (v_{ab}^{(t)} )}{2({\varvec{VW}}^{T}{\varvec{KW}})_{ab} +2\alpha ({\varvec{D}}_{v} {\varvec{V}})_{ab} +2\beta ({\varvec{C}}^{T}{\varvec{CVA}}^{T}{\varvec{A}})_{ab} }\\= & {} v_{ab}^{(t)} \frac{\left( {{\varvec{KW}}+\alpha {\varvec{SV}}+\beta {\varvec{C}}^{T}{\varvec{Y}}^{T}{\varvec{A}}} \right) _{ab} }{\left( {{\varvec{VW}}^{T}{\varvec{KW}}+\alpha {\varvec{D}}_{v} {\varvec{V}}+\beta {\varvec{C}}^{T}{\varvec{CVA}}^{T}{\varvec{A}}} \right) _{ab} } \end{aligned}$$

Since Eq. (29) is an auxiliary function, \(F_{v_{ab} } \) is nonincreasing under this updating rule.
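The resulting multiplicative update for \({\varvec{V}}\) can be sketched in NumPy. All shapes and variable names below are illustrative assumptions, not the authors' code: \({\varvec{K}}\) an \(n\times n\) nonnegative kernel, \({\varvec{W}},{\varvec{V}}\) of size \(n\times r\), \({\varvec{S}}\) and \({\varvec{D}}_{v}\) the \(n\times n\) hypergraph weight and vertex-degree matrices, \({\varvec{C}}\) an \(l\times n\) selector of the labeled samples, \({\varvec{Y}}\) a \(c\times l\) label indicator and \({\varvec{A}}\) the \(c\times r\) classifier weights.

```python
import numpy as np

# One multiplicative update of V, following the formula derived above.
# Shapes and data are illustrative assumptions only.
rng = np.random.default_rng(1)
n, r, l, c = 8, 3, 4, 2
alpha, beta = 0.1, 0.1

K = rng.random((n, n)); K = K @ K.T            # nonnegative PSD kernel
W, V = rng.random((n, r)), rng.random((n, r))
S = rng.random((n, n)); S = (S + S.T) / 2      # hypergraph similarity
D_v = np.diag(S.sum(axis=1))                   # vertex degree matrix
C = np.eye(l, n)                               # first l samples labeled
Y = rng.integers(0, 2, (c, l)).astype(float)   # label indicator
A = rng.random((c, r))

num = K @ W + alpha * S @ V + beta * C.T @ Y.T @ A
den = V @ W.T @ K @ W + alpha * D_v @ V + beta * C.T @ C @ V @ A.T @ A
V_new = V * num / (den + 1e-12)                # elementwise; stays >= 0

assert V_new.shape == (n, r) and np.all(V_new >= 0)
```

Because numerator and denominator are entrywise nonnegative, the update preserves the nonnegativity of \({\varvec{V}}\), which is what makes the multiplicative scheme well defined.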

Similarly, replacing \(G(a,a_{ab}^{(t)} )\) in Eq. (28) by Eq. (31), we get

$$\begin{aligned} a_{ab}^{(t+1)}= & {} a_{ab}^{(t)} -a_{ab}^{(t)} \frac{{F}'_{a_{ab} } (a_{ab}^{(t)} )}{2({\varvec{AV}}^{T}{\varvec{C}}^{T}{\varvec{CV}}+\gamma {\varvec{A}})_{ab} }\\= & {} a_{ab}^{(t)} \frac{\left( {{\varvec{YCV}}} \right) _{ab} }{\left( {{\varvec{AV}}^{T}{\varvec{C}}^{T}{\varvec{CV}}+\gamma {\varvec{A}}} \right) _{ab} } \end{aligned}$$

Since Eq. (31) is an auxiliary function, \(F_{a_{ab} } \) is nonincreasing under this updating rule. \(\square \)
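The update for \({\varvec{A}}\) admits the same kind of sketch, under the same illustrative shape conventions as before (\({\varvec{Y}}\) of size \(c\times l\), \({\varvec{C}}\) of size \(l\times n\), \({\varvec{V}}\) of size \(n\times r\), \({\varvec{A}}\) of size \(c\times r\)); this is an assumed transcription of the formula above, not the authors' implementation.

```python
import numpy as np

# One multiplicative update of A, following the formula derived above.
rng = np.random.default_rng(2)
n, r, l, c = 8, 3, 4, 2
gamma = 0.1

V = rng.random((n, r))
C = np.eye(l, n)                               # first l samples labeled
Y = rng.integers(0, 2, (c, l)).astype(float)   # label indicator
A = rng.random((c, r))

num = Y @ C @ V
den = A @ V.T @ C.T @ C @ V + gamma * A
A_new = A * num / (den + 1e-12)                # elementwise; stays >= 0

assert A_new.shape == (c, r) and np.all(A_new >= 0)
```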


Cite this article

Ye, J., Jin, Z. Hyper-graph regularized discriminative concept factorization for data representation. Soft Comput 22, 4417–4429 (2018). https://doi.org/10.1007/s00500-017-2636-1
