Non-negative Matrix Factorization with Pairwise Constraints and Graph Laplacian

Abstract

Non-negative matrix factorization (NMF) is a highly effective method for high-dimensional data analysis that has been widely used in information retrieval, computer vision, and pattern recognition. NMF aims to find two non-negative matrices whose product approximates the original matrix well, and its parts-based representation can capture the underlying structure of the data in a low-dimensional space. However, NMF is an unsupervised method and makes no use of prior information about the data. In this paper, we propose a novel pairwise constrained non-negative matrix factorization with graph Laplacian (PCGNMF) method, which not only exploits the local structure of the data through the graph Laplacian but also incorporates pairwise constraints generated among all labeled data into the NMF framework. More specifically, we expect data points with the same class label to have representations in the low-dimensional space that are as similar as possible, and data points with different class labels to have representations that are as dissimilar as possible. Consequently, all data points are represented with more discriminating power in the low-dimensional space. We compare our approach with other typical methods, and experimental results on image clustering show that the proposed algorithm achieves state-of-the-art performance.
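To fix ideas, the sketch below shows plain (unsupervised) NMF with the multiplicative updates of Lee and Seung [11], minimizing \(\Vert \mathbf {X}-\mathbf {U}\mathbf {V}^T\Vert _F^2\) over non-negative \(\mathbf {U}\) and \(\mathbf {V}\). It is a minimal baseline, not the proposed PCGNMF; the rank k, iteration count, and small eps guard are illustrative choices.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Standard multiplicative-update NMF: X (m x n) ~= U (m x k) @ V.T."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        # Multiplicative updates keep the factors non-negative by construction.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

# Usage: factor a random non-negative matrix and measure the fit.
X = np.random.default_rng(1).random((50, 40))
U, V = nmf(X, k=5)
err = np.linalg.norm(X - U @ V.T, "fro")
```

The proposed PCGNMF augments this objective with a graph Laplacian term and a pairwise-constraint penalty, which changes only the update for \(\mathbf {V}\) (see the Appendix).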

Notes

  1. http://www.face-rec.org/databases/.

  2. http://www.face-rec.org/databases/.

  3. http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html.

References

  1. Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396

  2. Cai D, He X, Han J, Huang T (2011) Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 33(8):1548–1560

  3. Chapelle O, Schölkopf B, Zien A et al (2006) Semi-supervised learning, vol 2. MIT Press, Cambridge

  4. Das Gupta M, Xiao J (2011) Non-negative matrix factorization as a feature selection tool for maximum margin classifiers. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, pp 2841–2848

  5. Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39:1–38

  6. Grira N, Crucianu M, Boujemaa N (2005) Semi-supervised fuzzy clustering with pairwise-constrained competitive agglomeration. In: The 14th IEEE International Conference on Fuzzy Systems, 2005. FUZZ’05. IEEE, pp 867–872

  7. Guillamet D, Vitria J (2002) Classifying faces with nonnegative matrix factorization. In: Proceedings of 5th Catalan Conference for Artificial Intelligence

  8. He R, Zheng W, Hu B, Kong X (2011) Nonnegative sparse coding for discriminative semi-supervised learning. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, pp 2849–2856

  9. He Y, Lu H, Xie S (2013) Semi-supervised non-negative matrix factorization for image clustering with graph Laplacian. Multimed Tools Appl. doi:10.1007/s11042-013-1465-1

  10. Kim J, Park H (2008) Sparse nonnegative matrix factorization for clustering. CSE Technical Reports, Georgia Institute of Technology, pp 1–16

  11. Lee D, Seung H (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst 13:556–562

  12. Lee D, Seung H et al (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791

  13. Li T, Ding C (2006) The relationships among various nonnegative matrix factorization methods for clustering. In: 6th International Conference on Data Mining, 2006. ICDM’06. IEEE, pp 362–371

  14. Liu H, Wu Z (2010) Non-negative matrix factorization with constraints. In: 24th AAAI Conference on Artificial Intelligence.

  15. Liu H, Wu Z, Cai D, Huang T (2011) Constrained non-negative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 99:1–1

  16. Liu H, Wu Z, Li X, Cai D, Huang TS (2012) Constrained nonnegative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 34(7):1299–1311

  17. Lovász L, Plummer M (1986) Matching theory. Elsevier Science Ltd., Amsterdam, p 121

  18. He X, Niyogi P (2004) Locality preserving projections. In: Advances in neural information processing systems 16: proceedings of the 2003 conference. MIT Press, Cambridge, p 153

  19. Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: IEEE Conference on Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE, pp 1–8

  20. Roweis S, Saul L (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323

  21. Saul L, Pereira F (1997) Aggregate and mixed-order markov models for statistical language processing. In: Proceedings of the second conference on empirical methods in natural language processing. Association for Computational Linguistics, Somerset, New Jersey, pp 81–89

  22. Shashua A, Hazan T (2005) Non-negative tensor factorization with applications to statistics and computer vision. In: Proceedings of the 22nd international conference on Machine learning. ACM, pp 792–799

  23. Wang F, Li T, Zhang C (2008) Semi-supervised clustering via matrix factorization. In: Proceedings of The 8th SIAM Conference on Data Mining

  24. Welling M (2005) Fisher linear discriminant analysis. Technical report, vol 3. Department of Computer Science, University of Toronto

  25. Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1–3):37–52

  26. Wright J, Yang A, Ganesh A, Sastry S, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227

  27. Wu M, Schölkopf B (2007) A local learning approach for clustering. Adv Neural Inf Process Syst 19:1529

  28. Xu W, Gong Y (2004) Document clustering by concept factorization. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 202–209

  29. Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 267–273

  30. Yang L, Jin R, Sukthankar R (2008) Semi-supervised learning with weakly-related unlabeled data: Towards better text categorization. In: 22nd annual conference on neural information processing systems, Citeseer.

  31. Yang Y, Hu B (2007) Pairwise constraints-guided non-negative matrix factorization for document clustering. In: IEEE/WIC/ACM international conference on web intelligence. IEEE, pp 250–256

  32. Yang Y, Shen HT, Nie F, Ji R, Zhou X (2011) Nonnegative spectral clustering with discriminative regularization. In: AAAI

  33. Ye J, Zhao Z, Wu M (2007) Discriminative k-means for clustering. Adv Neural Inf Process Syst 20:1649–1656

  34. Zhang L, Yang M, Feng X (2011) Sparse representation or collaborative representation: Which helps face recognition? In: 2011 IEEE international conference on Computer Vision (ICCV). IEEE, pp 471–478

  35. Zhang Y, Yeung D (2008) Semi-supervised discriminant analysis using robust path-based similarity. In: IEEE conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE, pp 1–8

  36. Zhang Z, Wang J, Zha H (2005) Adaptive manifold learning. IEEE Trans Pattern Anal Mach Intell 99:1–1

  37. Zhang Z, Zha H, Zhang M (2008) Spectral methods for semi-supervised manifold learning. In: IEEE conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE, pp 1–6

Acknowledgments

This work was supported in part by NSFC (No. 61272247), the National Basic Research Program of China (973 Program) under Grant 2009CB320901, the National High Technology Research and Development Program of China (863 Program) under Grant 2008AA02Z310, the National Natural Science Foundation of China under Grant 60873133, and the Arts and Science Cross Special Fund of Shanghai Jiao Tong University under Grant 13JCY14.

Author information

Correspondence to Yang-Cheng He.

Appendix

In this section, we prove the convergence of PCGNMF. We begin with the following theorem regarding the iterative updating rules in Eqs. (15) and (16).

Theorem 1

The objective function \({J}\) is nonincreasing under the iterative updating rules in Eqs. (15) and (16). The objective function is invariant under these updates if and only if \(\mathbf {U}\) and \(\mathbf {V}\) are at a stationary point.

Theorem 1 guarantees that the iterative updating rules for \(\mathbf {U}\) and \(\mathbf {V}\) in Eqs. (15) and (16) converge to a stationary point, and hence the final solution is a local optimum. To prove Theorem 1, we have to show that \(J\) is nonincreasing under the iterative updating rules in Eqs. (15) and (16). Since the second and third terms of \(J\) involve only \(\mathbf {V}\), the iterative updating rule (16) is exactly the same as the update formula for \(\mathbf {U}\) in standard NMF, and the convergence proof of NMF has already shown that \(J\) is nonincreasing under the updating rule in Eq. (16) [11]. So we only need to prove that \(J\) is nonincreasing under the updating rule in Eq. (15). First, we make use of an auxiliary function similar to the one used in the Expectation-Maximization algorithm [5, 21].

Definition

\(G(v, v')\) is an auxiliary function for \(F(v)\) if the conditions \(G(v, v') \ge F(v)\) and \(G(v, v) = F(v)\) are satisfied.

The auxiliary function is useful because of the following lemma, which will help us prove the convergence of the objective function.

Lemma 1

If G is an auxiliary function of F, then F is nonincreasing under the update

$$\begin{aligned} v^{(t+1)} = \arg \min \limits _{v} \textit{G}\big (v, v^{(t)}\big ) \end{aligned}$$
(20)

Proof

\(F(v^{(t+1)}) \le G(v^{(t+1)},v^{(t)}) \le G(v^{(t)},v^{(t)}) = F(v^{(t)})\) \(\square \)
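As a toy numerical illustration of Lemma 1 (with a hand-picked scalar \(F\) and majorizer, not the paper's objective \(J\)), the sketch below iterates the update in Eq. (20) and checks that \(F\) never increases:

```python
# Toy objective F(v) = (v - 2)^2 and the deliberately loose majorizer
# G(v, vt) = F(vt) + F'(vt) (v - vt) + c (v - vt)^2 with c = 2 >= F''/2 = 1,
# so G(v, vt) >= F(v) for all v and G(v, v) = F(v): a valid auxiliary function.
F = lambda v: (v - 2.0) ** 2
Fp = lambda v: 2.0 * (v - 2.0)
c = 2.0

v = 10.0
values = [F(v)]
for _ in range(10):
    v = v - Fp(v) / (2.0 * c)  # argmin_v G(v, vt), cf. Eq. (20)
    values.append(F(v))

# Lemma 1: F(v_{t+1}) <= F(v_t) along the whole trajectory.
assert all(a >= b for a, b in zip(values, values[1:]))
```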

Now we prove that the iterative updating rule for \(\mathbf {V}\) in Eq. (15) is exactly the update rule in Eq. (20) with an appropriate auxiliary function. For any entry \(v_{ab}\) in \(\mathbf {V}\), we use \(F_{v_{ab}}\) to denote the part of \(J\) relevant only to \(v_{ab}\). It is easy to check that

$$\begin{aligned} \textit{F}'_{v_{ab}}&= (\frac{\partial J }{\partial {\mathbf {V}}})_{ab} =-2(\mathbf {X}^T\mathbf {U})_{ab}+2(\mathbf {V}\mathbf {U}^T\mathbf {U})_{ab} + 2\alpha (\mathbf {D}\mathbf {V})_{ab} \nonumber \\&-2\alpha (\mathbf {W}\mathbf {V})_{ab} + \beta (\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})_{ab}\end{aligned}$$
(21)
$$\begin{aligned} \textit{F}''_{v_{ab}}&= 2(\mathbf {U}^T\mathbf {U})_{bb} + 2\alpha \mathbf {D}_{aa} - 2\alpha \mathbf {W}_{aa} + \beta \mathbf {C}_{aa} \end{aligned}$$
(22)

where \(\textit{F}'_{v_{ab}}\) and \(\textit{F}''_{v_{ab}}\) are the first- and second-order derivatives of \(F_{v_{ab}}\) with respect to \(v_{ab}\), respectively.

Lemma 2

The function

$$\begin{aligned} \textit{G}(v,v^{(t)}_{ab})&= \textit{F}_{v_{ab}}(v^{(t)}_{ab})+\textit{F}'_{v_{ab}}(v^{(t)}_{ab})(v-v^{(t)}_{ab}) \nonumber \\&+\frac{(\mathbf {V}\mathbf {U}^T\mathbf {U})_{ab} + \alpha (\mathbf {D}\mathbf {V})_{ab} + \frac{\beta }{2}(\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})_{ab}}{v^{(t)}_{ab}} (v-v^{(t)}_{ab})^2 \end{aligned}$$
(23)

is an auxiliary function for \(F_{v_{ab}}\), the part of \(J\) relevant to \(v_{ab}\).

Proof

Since \(\textit{G}(v,v)=F_{v_{ab}}(v)\) holds by construction, we only need to show that \(\textit{G}(v,v^{(t)}_{ab})\ge F_{v_{ab}}(v)\). To this end, we compare the Taylor series expansion of \(F_{v_{ab}}(v)\) with the auxiliary function \(\textit{G}(v,v^{(t)}_{ab})\):

$$\begin{aligned} \textit{F}_{v_{ab}}(v)&= \textit{F}_{v_{ab}}(v^{(t)}_{ab})+\textit{F}'_{v_{ab}}(v^{(t)}_{ab})(v-v^{(t)}_{ab}) \\&+\left[ (\mathbf {U}^T\mathbf {U})_{bb} + \alpha \mathbf {D}_{aa} - \alpha \mathbf {W}_{aa} + \frac{\beta }{2} \mathbf {C}_{aa}\right] (v-v^{(t)}_{ab})^2 \nonumber \end{aligned}$$
(24)

Clearly, showing \(\textit{G}(v,v^{(t)}_{ab})\ge F_{v_{ab}}(v)\) is equivalent to proving that

$$\begin{aligned} \frac{(\mathbf {V}\mathbf {U}^T\mathbf {U})_{ab} + \alpha (\mathbf {D}\mathbf {V})_{ab} + \frac{\beta }{2} (\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})_{ab}}{v^{(t)}_{ab}}&\ge (\mathbf {U}^T\mathbf {U})_{bb} + \alpha \mathbf {D}_{aa}\\&- \alpha \mathbf {W}_{aa} + \frac{\beta }{2} \mathbf {C}_{aa} \nonumber \end{aligned}$$
(25)

To prove that the above inequality holds, we observe that

$$\begin{aligned} (\mathbf {V}\mathbf {U}^T\mathbf {U})_{ab}=\sum _{c=1}^k v_{ac}^{(t)}(\mathbf {U}^T\mathbf {U})_{cb} \ge v_{ab}^{(t)}(\mathbf {U}^T\mathbf {U})_{bb} \end{aligned}$$
(26)

and

$$\begin{aligned} \alpha (\mathbf {D}\mathbf {V})_{ab}&= \alpha \sum _{j=1}^n \mathbf {D}_{aj}v_{jb}^{(t)} \ge \alpha \mathbf {D}_{aa} v_{ab}^{(t)}\end{aligned}$$
(27)
$$\begin{aligned} \frac{\beta }{2} (\mathbf {C}\mathbf {V})_{ab}&= \frac{\beta }{2} \sum _{j=1}^n \mathbf {C}_{aj}v_{jb}^{(t)} \ge \frac{\beta }{2} v_{ab}^{(t)}\mathbf {C}_{aa} \end{aligned}$$
(28)

Since \(\alpha \mathbf {W}_{aa}\) is subtracted on the right-hand side and the remaining term \(\frac{\beta }{2}(\mathbf {M}\mathbf {V}\mathbf {A})_{ab}\) on the left-hand side is non-negative, the inequalities (26)–(28) together imply Eq. (25). Therefore, the inequality \(\textit{G}(v,v^{(t)}_{ab})\ge F_{v_{ab}}(v)\) holds.\(\square \)

Now we can prove Theorem 1 for \(\mathbf {V}\):

Proof of Theorem 1

We replace \(\textit{G}(v,v^{(t)}_{ab})\) in Eq. (20) with Eq. (23); minimizing it yields an update rule that is exactly the iterative updating rule for \(\mathbf {V}\):

$$\begin{aligned} v_{ab}^{(t+1)}&= v_{ab}^{(t)}\frac{[2\mathbf {V}\mathbf {U}^T\mathbf {U}+ 2\alpha \mathbf {DV} + \beta (\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})]_{ab}-\textit{F}'_{v_{ab}}(v^{(t)}_{ab})}{[2\mathbf {V}\mathbf {U}^T\mathbf {U}+ 2\alpha \mathbf {DV} + \beta (\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})]_{ab}}\\&= v_{ab}^{(t)}\frac{2(\mathbf {X}^T\mathbf {U})_{ab} + 2\alpha (\mathbf {W}\mathbf {V})_{ab} }{2(\mathbf {V}\mathbf {U}^T\mathbf {U})_{ab} + 2\alpha (\mathbf {DV})_{ab} + \beta (\mathbf {M}\mathbf {V}\mathbf {A}+\mathbf {C}\mathbf {V})_{ab}}\nonumber \end{aligned}$$
(29)

Since Eq. (23) is an auxiliary function for \(F_{v_{ab}}\) by Lemma 2, \(F_{v_{ab}}\) is nonincreasing under this updating rule by Lemma 1.\(\square \)
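To make the iteration concrete, the sketch below implements one round of the updates analyzed above: Eq. (29) for \(\mathbf {V}\), and for \(\mathbf {U}\) the standard NMF multiplicative rule that the text identifies with Eq. (16) [11]. The shapes assume \(\mathbf {X}\in \mathbb {R}^{m\times n}\), \(\mathbf {U}\in \mathbb {R}^{m\times k}\), and \(\mathbf {V}\in \mathbb {R}^{n\times k}\); the matrices \(\mathbf {W}\), \(\mathbf {D}\), \(\mathbf {M}\), \(\mathbf {A}\), \(\mathbf {C}\) must be built from the graph and the pairwise constraints as defined in the main text (not reproduced in this section), so they appear here only as inputs, and the eps guard is an implementation choice.

```python
import numpy as np

def pcgnmf_step(X, U, V, W, D, M, A, C, alpha, beta, eps=1e-9):
    """One round of the PCGNMF multiplicative updates (Appendix sketch).

    X: (m, n) data; U: (m, k) basis; V: (n, k) representation.
    W, D, M, C: (n, n); A: (k, k); all assumed built as in the main text.
    """
    # V update, Eq. (29):
    num = 2.0 * (X.T @ U) + 2.0 * alpha * (W @ V)
    den = (2.0 * (V @ (U.T @ U)) + 2.0 * alpha * (D @ V)
           + beta * (M @ V @ A + C @ V) + eps)
    V = V * (num / den)
    # U update: identical to the standard NMF rule [11].
    U = U * ((X @ V) / (U @ (V.T @ V) + eps))
    return U, V
```

By Theorem 1, repeating this step drives the objective \(J\) monotonically downward until \(\mathbf {U}\) and \(\mathbf {V}\) reach a stationary point.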


Cite this article

He, YC., Lu, HT., Huang, L. et al. Non-negative Matrix Factorization with Pairwise Constraints and Graph Laplacian. Neural Process Lett 42, 167–185 (2015). https://doi.org/10.1007/s11063-014-9350-0
