Non-negative Matrix Factorization with Pairwise Constraints and Graph Laplacian

Abstract

Non-negative matrix factorization (NMF) is a highly effective method for high-dimensional data analysis that has been widely used in information retrieval, computer vision, and pattern recognition. NMF aims to find two non-negative matrices whose product approximates the original matrix well, and its parts-based representation can capture the underlying structure of the data in a low-dimensional space. However, NMF is an unsupervised method and makes no use of prior information about the data. In this paper, we propose a novel pairwise constrained non-negative matrix factorization with graph Laplacian (PCGNMF) method, which not only exploits the local structure of the data through the graph Laplacian but also incorporates pairwise constraints generated among all labeled data into the NMF framework. More specifically, we expect data points with the same class label to have representations in the low-dimensional space that are as similar as possible, and data points with different class labels to have representations that are as dissimilar as possible. Consequently, all data points are represented with more discriminating power in the low-dimensional space. We compare our approach with other typical methods, and experimental results on image clustering show that the proposed algorithm achieves state-of-the-art performance.
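To fix ideas, the sketch below shows plain (unsupervised) NMF with the multiplicative updates of Lee and Seung [11], minimizing \(\Vert \mathbf {X}-\mathbf {U}\mathbf {V}^T\Vert _F^2\) over non-negative \(\mathbf {U}\) and \(\mathbf {V}\). It is a minimal baseline, not the proposed PCGNMF; the rank k, iteration count, and small eps guard are illustrative choices.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Standard multiplicative-update NMF: X (m x n) ~= U (m x k) @ V.T."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        # Multiplicative updates keep the factors non-negative by construction.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

# Usage: factor a random non-negative matrix and measure the fit.
X = np.random.default_rng(1).random((50, 40))
U, V = nmf(X, k=5)
err = np.linalg.norm(X - U @ V.T, "fro")
```

The proposed PCGNMF augments this objective with a graph Laplacian term and a pairwise-constraint penalty, which changes only the update for \(\mathbf {V}\) (see the Appendix).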

Notes

  1. http://www.face-rec.org/databases/.

  2. http://www.face-rec.org/databases/.

  3. http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html.

References

  1. Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396

  2. Cai D, He X, Han J, Huang T (2011) Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 33(8):1548–1560

  3. Chapelle O, Schölkopf B, Zien A et al (2006) Semi-supervised learning, vol 2. MIT Press, Cambridge

  4. Das Gupta M, Xiao J (2011) Non-negative matrix factorization as a feature selection tool for maximum margin classifiers. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, pp 2841–2848

  5. Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39:1–38

  6. Grira N, Crucianu M, Boujemaa N (2005) Semi-supervised fuzzy clustering with pairwise-constrained competitive agglomeration. In: The 14th IEEE International Conference on Fuzzy Systems, 2005. FUZZ’05. IEEE, pp 867–872

  7. Guillamet D, Vitria J (2002) Classifying faces with nonnegative matrix factorization. In: Proceedings of 5th Catalan Conference for Artificial Intelligence

  8. He R, Zheng W, Hu B, Kong X (2011) Nonnegative sparse coding for discriminative semi-supervised learning. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, pp 2849–2856

  9. He Y, Lu H, Xie S (2013) Semi-supervised non-negative matrix factorization for image clustering with graph Laplacian. Multimed Tools Appl. doi:10.1007/s11042-013-1465-1

  10. Kim J, Park H (2008) Sparse nonnegative matrix factorization for clustering. CSE Technical Reports, Georgia Institute of Technology, pp 1–16

  11. Lee D, Seung H (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst 13:556–562

  12. Lee D, Seung H et al (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791

  13. Li T, Ding C (2006) The relationships among various nonnegative matrix factorization methods for clustering. In: 6th International Conference on Data Mining, 2006. ICDM’06. IEEE, pp 362–371

  14. Liu H, Wu Z (2010) Non-negative matrix factorization with constraints. In: 24th AAAI Conference on Artificial Intelligence.

  15. Liu H, Wu Z, Cai D, Huang T (2011) Constrained non-negative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 99:1–1

  16. Liu H, Wu Z, Li X, Cai D, Huang TS (2012) Constrained nonnegative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 34(7):1299–1311

  17. Lovász L, Plummer M (1986) Matching theory. Elsevier Science Ltd., Amsterdam, p 121

  18. He X, Niyogi P (2004) Locality preserving projections. In: Advances in neural information processing systems 16: proceedings of the 2003 conference. MIT Press, Cambridge, p 153

  19. Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: IEEE Conference on Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE, pp 1–8

  20. Roweis S, Saul L (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323

  21. Saul L, Pereira F (1997) Aggregate and mixed-order markov models for statistical language processing. In: Proceedings of the second conference on empirical methods in natural language processing. Association for Computational Linguistics, Somerset, New Jersey, pp 81–89

  22. Shashua A, Hazan T (2005) Non-negative tensor factorization with applications to statistics and computer vision. In: Proceedings of the 22nd international conference on Machine learning. ACM, pp 792–799

  23. Wang F, Li T, Zhang C (2008) Semi-supervised clustering via matrix factorization. In: Proceedings of The 8th SIAM Conference on Data Mining

  24. Welling M (2005) Fisher linear discriminant analysis. Technical report, vol 3. Department of Computer Science, University of Toronto

  25. Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1–3):37–52

  26. Wright J, Yang A, Ganesh A, Sastry S, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227

  27. Wu M, Schölkopf B (2007) A local learning approach for clustering. Adv Neural Inf Process Syst 19:1529

  28. Xu W, Gong Y (2004) Document clustering by concept factorization. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 202–209

  29. Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 267–273

  30. Yang L, Jin R, Sukthankar R (2008) Semi-supervised learning with weakly-related unlabeled data: Towards better text categorization. In: 22nd annual conference on neural information processing systems, Citeseer.

  31. Yang Y, Hu B (2007) Pairwise constraints-guided non-negative matrix factorization for document clustering. In: IEEE/WIC/ACM international conference on web intelligence. IEEE, pp 250–256

  32. Yang Y, Shen HT, Nie F, Ji R, Zhou X (2011) Nonnegative spectral clustering with discriminative regularization. In: AAAI

  33. Ye J, Zhao Z, Wu M (2007) Discriminative k-means for clustering. Adv Neural Inf Process Syst 20:1649–1656

  34. Zhang L, Yang M, Feng X (2011) Sparse representation or collaborative representation: Which helps face recognition? In: 2011 IEEE international conference on Computer Vision (ICCV). IEEE, pp 471–478

  35. Zhang Y, Yeung D (2008) Semi-supervised discriminant analysis using robust path-based similarity. In: IEEE conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE, pp 1–8

  36. Zhang Z, Wang J, Zha H (2005) Adaptive manifold learning. IEEE Trans Pattern Anal Mach Intell 99:1–1

  37. Zhang Z, Zha H, Zhang M (2008) Spectral methods for semi-supervised manifold learning. In: IEEE conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE, pp 1–6

Acknowledgments

This work was supported in part by NSFC (No. 61272247), the National Basic Research Program of China (973 Program) under Grant 2009CB320901, the National High Technology Research and Development Program of China (863 Program) under Grant 2008AA02Z310, the National Natural Science Foundation of China under Grant 60873133, and the Arts and Science Cross Special Fund of Shanghai Jiao Tong University under Grant 13JCY14.

Author information

Correspondence to Yang-Cheng He.

Appendix

In this section, we prove the convergence of PCGNMF. We begin with the following theorem regarding the iterative updating rules in Eqs. (15) and (16).

Theorem 1

The objective function \({J}\) is nonincreasing under the iterative updating rules in Eqs. (15) and (16). The objective function is invariant under these updates if and only if \(\mathbf {U}\) and \(\mathbf {V}\) are at a stationary point.

Theorem 1 guarantees that the iterative updating rules for \(\mathbf {U}\) and \(\mathbf {V}\) in Eqs. (15) and (16) converge to a stationary point, and hence the final solution is a local optimum. To prove Theorem 1, we have to show that \(J\) is nonincreasing under the iterative updating rules in Eqs. (15) and (16). Since the second and third terms of \(J\) involve only \(\mathbf {V}\), the iterative updating rule (16) is exactly the same as the update formula for \(\mathbf {U}\) in standard NMF, and the convergence proof of NMF has already shown that \(J\) is nonincreasing under the updating rule in Eq. (16) [11]. So we only need to prove that \(J\) is nonincreasing under the updating rule in Eq. (15). First, we make use of an auxiliary function similar to the one used in the Expectation-Maximization algorithm [5, 21].

Definition

\(G(v, v')\) is an auxiliary function for \(F(v)\) if the conditions \(G(v, v') \ge F(v)\) and \(G(v, v) = F(v)\) are satisfied.

The auxiliary function is useful because of the following lemma, which will help us prove the convergence of the objective function.

Lemma 1

If G is an auxiliary function of F, then F is nonincreasing under the update

$$\begin{aligned} v^{(t+1)} = \arg \min \limits _{v} \textit{G}\big (v, v^{(t)}\big ) \end{aligned}$$
(20)

Proof

\(F(v^{(t+1)}) \le G(v^{(t+1)},v^{(t)}) \le G(v^{(t)},v^{(t)}) = F(v^{(t)})\) \(\square \)
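As a toy numerical illustration of Lemma 1 (with a hand-picked scalar \(F\) and majorizer, not the paper's objective \(J\)), the sketch below iterates the update in Eq. (20) and checks that \(F\) never increases:

```python
# Toy objective F(v) = (v - 2)^2 and the deliberately loose majorizer
# G(v, vt) = F(vt) + F'(vt) (v - vt) + c (v - vt)^2 with c = 2 >= F''/2 = 1,
# so G(v, vt) >= F(v) for all v and G(v, v) = F(v): a valid auxiliary function.
F = lambda v: (v - 2.0) ** 2
Fp = lambda v: 2.0 * (v - 2.0)
c = 2.0

v = 10.0
values = [F(v)]
for _ in range(10):
    v = v - Fp(v) / (2.0 * c)  # argmin_v G(v, vt), cf. Eq. (20)
    values.append(F(v))

# Lemma 1: F(v_{t+1}) <= F(v_t) along the whole trajectory.
assert all(a >= b for a, b in zip(values, values[1:]))
```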

Now we prove that the iterative updating rule for \(\mathbf {V}\) in Eq. (15) is exactly the update rule in Eq. (20) with an appropriate auxiliary function. For any entry \(v_{ab}\) in \(\mathbf {V}\), we use \(F_{v_{ab}}\) to denote the part of \(J\) relevant only to \(v_{ab}\). It is easy to check that

$$\begin{aligned} \textit{F}'_{v_{ab}}&= (\frac{\partial J }{\partial {\mathbf {V}}})_{ab} =-2(\mathbf {X}^T\mathbf {U})_{ab}+2(\mathbf {V}\mathbf {U}^T\mathbf {U})_{ab} + 2\alpha (\mathbf {D}\mathbf {V})_{ab} \nonumber \\&-2\alpha (\mathbf {W}\mathbf {V})_{ab} + \beta (\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})_{ab}\end{aligned}$$
(21)
$$\begin{aligned} \textit{F}''_{v_{ab}}&= 2(\mathbf {U}^T\mathbf {U})_{bb} + 2\alpha \mathbf {D}_{aa} - 2\alpha \mathbf {W}_{aa} + \beta \mathbf {C}_{aa} \end{aligned}$$
(22)

where \(\textit{F}'_{v_{ab}}\) and \(\textit{F}''_{v_{ab}}\) are the first- and second-order derivatives of \(F_{v_{ab}}\) with respect to \(v_{ab}\), respectively.

Lemma 2

The function

$$\begin{aligned} \textit{G}(v,v^{(t)}_{ab})&= \textit{F}_{v_{ab}}(v^{(t)}_{ab})+\textit{F}'_{v_{ab}}(v^{(t)}_{ab})(v-v^{(t)}_{ab}) \nonumber \\&+\frac{(\mathbf {V}\mathbf {U}^T\mathbf {U})_{ab} + \alpha (\mathbf {D}\mathbf {V})_{ab} + \frac{\beta }{2}(\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})_{ab}}{v^{(t)}_{ab}} (v-v^{(t)}_{ab})^2 \end{aligned}$$
(23)

is an auxiliary function for \(F_{v_{ab}}\), the part of \(J\) relevant to \(v_{ab}\).

Proof

Since \(\textit{G}(v,v)=F_{v_{ab}}(v)\) holds by construction, we only need to show that \(\textit{G}(v,v^{(t)}_{ab})\ge F_{v_{ab}}(v)\). To this end, we compare the Taylor series expansion of \(F_{v_{ab}}(v)\) with the auxiliary function \(\textit{G}(v,v^{(t)}_{ab})\):

$$\begin{aligned} \textit{F}_{v_{ab}}(v)&= \textit{F}_{v_{ab}}(v^{(t)}_{ab})+\textit{F}'_{v_{ab}}(v^{(t)}_{ab})(v-v^{(t)}_{ab}) \\&+\left[ (\mathbf {U}^T\mathbf {U})_{bb} + \alpha \mathbf {D}_{aa} - \alpha \mathbf {W}_{aa} + \frac{\beta }{2} \mathbf {C}_{aa}\right] (v-v^{(t)}_{ab})^2 \nonumber \end{aligned}$$
(24)

Clearly, showing \(\textit{G}(v,v^{(t)}_{ab})\ge F_{v_{ab}}(v)\) is equivalent to proving that

$$\begin{aligned} \frac{(\mathbf {V}\mathbf {U}^T\mathbf {U})_{ab} + \alpha (\mathbf {D}\mathbf {V})_{ab} + \frac{\beta }{2} (\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})_{ab}}{v^{(t)}_{ab}}&\ge (\mathbf {U}^T\mathbf {U})_{bb} + \alpha \mathbf {D}_{aa}\\&- \alpha \mathbf {W}_{aa} + \frac{\beta }{2} \mathbf {C}_{aa} \nonumber \end{aligned}$$
(25)

To prove that the above inequality holds, we observe that

$$\begin{aligned} (\mathbf {V}\mathbf {U}^T\mathbf {U})_{ab}=\sum _{c=1}^k v_{ac}^{(t)}(\mathbf {U}^T\mathbf {U})_{cb} \ge v_{ab}^{(t)}(\mathbf {U}^T\mathbf {U})_{bb} \end{aligned}$$
(26)

and

$$\begin{aligned} \alpha (\mathbf {D}\mathbf {V})_{ab}&= \alpha \sum _{j=1}^n \mathbf {D}_{aj}v_{jb}^{(t)} \ge \alpha \mathbf {D}_{aa} v_{ab}^{(t)}\end{aligned}$$
(27)
$$\begin{aligned} \frac{\beta }{2} (\mathbf {C}\mathbf {V})_{ab}&= \frac{\beta }{2} \sum _{j=1}^n \mathbf {C}_{aj}v_{jb}^{(t)} \ge \frac{\beta }{2} v_{ab}^{(t)}\mathbf {C}_{aa} \end{aligned}$$
(28)

Since \(\alpha \mathbf {W}_{aa}\) is subtracted on the right-hand side and the remaining term \(\frac{\beta }{2}(\mathbf {M}\mathbf {V}\mathbf {A})_{ab}\) on the left-hand side is non-negative, the inequalities (26)–(28) together imply Eq. (25). Therefore, the inequality \(\textit{G}(v,v^{(t)}_{ab})\ge F_{v_{ab}}(v)\) holds.\(\square \)

Now we can prove Theorem 1 for \(\mathbf {V}\):

Proof of Theorem 1

We replace \(\textit{G}(v,v^{(t)}_{ab})\) in Eq. (20) with Eq. (23); minimizing it yields an update rule that is exactly the iterative updating rule for \(\mathbf {V}\):

$$\begin{aligned} v_{ab}^{(t+1)}&= v_{ab}^{(t)}\frac{[2\mathbf {V}\mathbf {U}^T\mathbf {U}+ 2\alpha \mathbf {DV} + \beta (\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})]_{ab}-\textit{F}'_{v_{ab}}(v^{(t)}_{ab})}{[2\mathbf {V}\mathbf {U}^T\mathbf {U}+ 2\alpha \mathbf {DV} + \beta (\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})]_{ab}}\\&= v_{ab}^{(t)}\frac{2(\mathbf {X}^T\mathbf {U})_{ab} + 2\alpha (\mathbf {W}\mathbf {V})_{ab} }{2(\mathbf {V}\mathbf {U}^T\mathbf {U})_{ab} + 2\alpha (\mathbf {DV})_{ab} + \beta (\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})_{ab}}\nonumber \end{aligned}$$
(29)

Since Eq. (23) is an auxiliary function for \(F_{v_{ab}}\) by Lemma 2, \(F_{v_{ab}}\) is nonincreasing under this updating rule by Lemma 1.\(\square \)
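To make the iteration concrete, the sketch below implements one round of the updates analyzed above: Eq. (29) for \(\mathbf {V}\), and for \(\mathbf {U}\) the standard NMF multiplicative rule that the text identifies with Eq. (16) [11]. The shapes assume \(\mathbf {X}\in \mathbb {R}^{m\times n}\), \(\mathbf {U}\in \mathbb {R}^{m\times k}\), and \(\mathbf {V}\in \mathbb {R}^{n\times k}\); the matrices \(\mathbf {W}\), \(\mathbf {D}\), \(\mathbf {M}\), \(\mathbf {A}\), \(\mathbf {C}\) must be built from the graph and the pairwise constraints as defined in the main text (not reproduced in this section), so they appear here only as inputs, and the eps guard is an implementation choice.

```python
import numpy as np

def pcgnmf_step(X, U, V, W, D, M, A, C, alpha, beta, eps=1e-9):
    """One round of the PCGNMF multiplicative updates (Appendix sketch).

    X: (m, n) data; U: (m, k) basis; V: (n, k) representation.
    W, D, M, C: (n, n); A: (k, k); all assumed built as in the main text.
    """
    # V update, Eq. (29):
    num = 2.0 * (X.T @ U) + 2.0 * alpha * (W @ V)
    den = (2.0 * (V @ (U.T @ U)) + 2.0 * alpha * (D @ V)
           + beta * (M @ V @ A + C @ V) + eps)
    V = V * (num / den)
    # U update: identical to the standard NMF rule [11].
    U = U * ((X @ V) / (U @ (V.T @ V) + eps))
    return U, V
```

By Theorem 1, repeating this step drives the objective \(J\) monotonically downward until \(\mathbf {U}\) and \(\mathbf {V}\) reach a stationary point.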


Cite this article

He, YC., Lu, HT., Huang, L. et al. Non-negative Matrix Factorization with Pairwise Constraints and Graph Laplacian. Neural Process Lett 42, 167–185 (2015). https://doi.org/10.1007/s11063-014-9350-0
