Abstract
Non-negative matrix factorization (NMF) is a very effective method for high-dimensional data analysis, which has been widely used in information retrieval, computer vision, and pattern recognition. NMF aims to find two non-negative matrices whose product approximates the original matrix well. It can capture the underlying structure of data in the low-dimensional data space using its parts-based representations. However, NMF is essentially an unsupervised method that makes no use of prior information about the data. In this paper, we propose a novel pairwise constrained non-negative matrix factorization with graph Laplacian method, which not only exploits the local structure of the data via a graph Laplacian, but also incorporates pairwise constraints generated from the labeled data into the NMF framework. More specifically, we expect data points sharing the same class label to have representations in the low-dimensional space that are as similar as possible, while data points with different class labels have representations that are as dissimilar as possible. Consequently, all data points are represented with more discriminating power in the low-dimensional space. We compare our approach with other typical methods, and experimental results on image clustering show that the proposed algorithm achieves state-of-the-art performance.
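To make the factorization concrete, the sketch below runs the classical Lee–Seung multiplicative updates for plain (unsupervised) NMF — the baseline the proposed method builds on. This is a minimal illustration only; the pairwise-constraint and graph-Laplacian terms of the proposed objective are not reproduced here, and `nmf` is a hypothetical helper name.

```python
# Minimal sketch of plain NMF with Lee-Seung multiplicative updates.
# Illustrates only the unsupervised baseline; the pairwise-constraint
# and graph-Laplacian terms of PCGNMF are not shown here.
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Factor non-negative X (m x n) as U @ V with U (m x k), V (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k)) + eps
    V = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep U and V non-negative throughout.
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V

X = np.abs(np.random.default_rng(1).random((20, 30)))
U, V = nmf(X, k=5)
err = np.linalg.norm(X - U @ V) / np.linalg.norm(X)
```

The multiplicative form of the updates is what guarantees non-negativity without any explicit projection step.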
References
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396
Cai D, He X, Han J, Huang T (2011) Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 33(8):1548–1560
Chapelle O, Schölkopf B, Zien A et al (2006) Semi-supervised learning, vol 2. MIT Press, Cambridge
Das Gupta M, Xiao J (2011) Non-negative matrix factorization as a feature selection tool for maximum margin classifiers. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, pp 2841–2848
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc Ser B (Methodological) 39:1–38
Grira N, Crucianu M, Boujemaa N (2005) Semi-supervised fuzzy clustering with pairwise-constrained competitive agglomeration. In: The 14th IEEE International Conference on Fuzzy Systems, 2005. FUZZ’05. IEEE, pp 867–872
Guillamet D, Vitria J (2002) Classifying faces with nonnegative matrix factorization. In: Proceedings of 5th Catalan Conference for Artificial Intelligence
He R, Zheng W, Hu B, Kong X (2011) Nonnegative sparse coding for discriminative semi-supervised learning. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, pp 2849–2856
He Y, Lu H, Xie S (2013) Semi-supervised non-negative matrix factorization for image clustering with graph Laplacian. Multimed Tools Appl. doi:10.1007/s11042-013-1465-1
Kim J, Park H (2008) Sparse nonnegative matrix factorization for clustering. CSE Technical Reports, Georgia Institute of Technology, pp 1–16
Lee D, Seung H (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst 13:556–562
Lee D, Seung H et al (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791
Li T, Ding C (2006) The relationships among various nonnegative matrix factorization methods for clustering. In: 6th International Conference on Data Mining, 2006. ICDM’06. IEEE, pp 362–371
Liu H, Wu Z (2010) Non-negative matrix factorization with constraints. In: 24th AAAI Conference on Artificial Intelligence.
Liu H, Wu Z, Cai D, Huang T (2011) Constrained non-negative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 99:1–1
Liu H, Wu Z, Li X, Cai D, Huang TS (2012) Constrained nonnegative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 34(7):1299–1311
Lovász L, Plummer M (1986) Matching theory. Elsevier Science Ltd., Amsterdam, p 121
He X, Niyogi P (2004) Locality preserving projections. In: Advances in neural information processing systems 16: proceedings of the 2003 conference, vol 16. The MIT Press, p 153
Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: IEEE Conference on Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE, pp 1–8
Roweis S, Saul L (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323
Saul L, Pereira F (1997) Aggregate and mixed-order Markov models for statistical language processing. In: Proceedings of the second conference on empirical methods in natural language processing. Association for Computational Linguistics, Somerset, New Jersey, pp 81–89
Shashua A, Hazan T (2005) Non-negative tensor factorization with applications to statistics and computer vision. In: Proceedings of the 22nd international conference on Machine learning. ACM, pp 792–799
Wang F, Li T, Zhang C (2008) Semi-supervised clustering via matrix factorization. In: Proceedings of The 8th SIAM Conference on Data Mining
Welling M (2005) Fisher linear discriminant analysis. Technical report, vol 3. Department of Computer Science, University of Toronto
Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1–3):37–52
Wright J, Yang A, Ganesh A, Sastry S, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227
Wu M, Scholkopf B (2007) A local learning approach for clustering. Adv Neural Inf Process Syst 19:1529
Xu W, Gong Y (2004) Document clustering by concept factorization. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 202–209
Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 267–273
Yang L, Jin R, Sukthankar R (2008) Semi-supervised learning with weakly-related unlabeled data: Towards better text categorization. In: 22nd annual conference on neural information processing systems, Citeseer.
Yang Y, Hu B (2007) Pairwise constraints-guided non-negative matrix factorization for document clustering. In: IEEE/WIC/ACM international conference on web intelligence. IEEE, pp 250–256
Yang Y, Shen HT, Nie F, Ji R, Zhou X (2011) Nonnegative spectral clustering with discriminative regularization. In: AAAI
Ye J, Zhao Z, Wu M (2007) Discriminative k-means for clustering. Adv Neural Inf Process Syst 20:1649–1656
Zhang L, Yang M, Feng X (2011) Sparse representation or collaborative representation: Which helps face recognition? In: 2011 IEEE international conference on Computer Vision (ICCV). IEEE, pp 471–478
Zhang Y, Yeung D (2008) Semi-supervised discriminant analysis using robust path-based similarity. In: IEEE conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE, pp 1–8
Zhang Z, Wang J, Zha H (2005) Adaptive manifold learning. IEEE Trans Pattern Anal Mach Intell 99:1–1
Zhang Z, Zha H, Zhang M (2008) Spectral methods for semi-supervised manifold learning. In: IEEE conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE, pp 1–6
Acknowledgments
This work was supported in part by NSFC (no. 61272247), the National Basic Research Program of China (973 program) under Grant 2009CB320901, the National High Technology Research and Development Program of China (863 program) under Grant 2008AA02Z310, the National Natural Science Foundation of China under Grant 60873133, and the Arts and Science Cross Special Fund of Shanghai Jiao Tong University under Grant 13JCY14.
Appendix
In this section, we prove the convergence of PCGNMF. We begin with the following theorem regarding the iterative updating rules in Eqs. (15) and (16).
Theorem 1
The objective function \({J}\) is nonincreasing under the iterative updating rules in Eqs. (15) and (16). The objective function is invariant under these updates if and only if \(\mathbf {U}\) and \(\mathbf {V}\) are at a stationary point.
Theorem 1 guarantees that the iterative updating rules for \(\mathbf {U}\) and \(\mathbf {V}\) in Eqs. (15) and (16) converge to a stationary point, and hence the final solution will be a local optimum. To prove Theorem 1, we must show that \(J\) is nonincreasing under the iterative updating rules in Eqs. (15) and (16). Since the second and third terms of \(J\) depend only on \(\mathbf {V}\), the iterative updating rule (16) is exactly the same as the update formula for \(\mathbf {U}\) in standard NMF, and the convergence proof of NMF has already shown that \(J\) is nonincreasing under the rule in Eq. (16) [11]. Therefore, we only need to prove that \(J\) is nonincreasing under the iterative updating rule in Eq. (15). First, we make use of an auxiliary function similar to the one used in the Expectation-Maximization algorithm [5, 21].
Definition
\(G(v, v')\) is an auxiliary function for \(F(v)\) if the conditions \(G(v, v') \ge F(v)\) and \(G(v, v) = F(v)\) are satisfied.
We have the following lemma regarding this auxiliary function, which will be useful in proving the convergence of the objective function.
Lemma 1
If \(G\) is an auxiliary function of \(F\), then \(F\) is nonincreasing under the update \(v^{(t+1)} = \arg \min _{v} G(v, v^{(t)})\).
Proof
\(F(v^{(t+1)}) \le G(v^{(t+1)}, v^{(t)}) \le G(v^{(t)}, v^{(t)}) = F(v^{(t)})\)
Now, we will prove that the iterative updating rule for \(\mathbf {V}\) in Eq. (15) is exactly the update rule in Eq. (20) with an appropriate auxiliary function. For any entry \(v_{ab}\) of \(\mathbf {V}\), we use \(F_{v_{ab}}\) to denote the part of \(J\) relevant only to \(v_{ab}\). It is easy to check that
where \(F'\) and \(F''\) are the first- and second-order derivatives of \(F_{v_{ab}}\) with respect to \(v_{ab}\), respectively. \(\square \)
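The expansion referred to here is, in the standard auxiliary-function argument, the second-order Taylor series of \(F_{v_{ab}}\) around \(v^{(t)}_{ab}\). The generic form is reconstructed below; the concrete expressions for \(F'\) and \(F''\) depend on the PCGNMF objective, which is not reproduced in this appendix:

```latex
F_{v_{ab}}(v) = F_{v_{ab}}\bigl(v^{(t)}_{ab}\bigr)
  + F'\bigl(v^{(t)}_{ab}\bigr)\bigl(v - v^{(t)}_{ab}\bigr)
  + \tfrac{1}{2} F''\bigl(v^{(t)}_{ab}\bigr)\bigl(v - v^{(t)}_{ab}\bigr)^{2}
```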
Lemma 2
The function
is an auxiliary function for \(F_{v_{ab}}\), the part of \(J\) related to \(v_{ab}\).
Proof
Since \(G(v,v)=F_{v_{ab}}(v)\) holds trivially, we only need to show that \(G(v,v^{(t)}_{ab})\ge F_{v_{ab}}(v)\). To do so, we compare the Taylor series expansion of \(F_{v_{ab}}(v)\) with the auxiliary function \(G(v,v^{(t)}_{ab})\).
Clearly, showing \(G(v,v^{(t)}_{ab})\ge F_{v_{ab}}(v)\) is equivalent to proving that
To prove that the above inequality holds, note that
and
Therefore, the inequality \(G(v,v^{(t)}_{ab})\ge F_{v_{ab}}(v)\) holds. \(\square \)
Now, we can show the convergence of Theorem 1 for \(\mathbf {V}\):
Proof of Theorem 1
We can replace \(G(v,v^{(t)}_{ab})\) in Eq. (20) by Eq. (23) to obtain an update rule that is exactly the iterative updating rule for \(\mathbf {V}\) in Eq. (15).
Since Eq. (23) is an auxiliary function, \(F_{v_{ab}}\) is nonincreasing under this updating rule by Lemma 2. \(\square \)
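As a sanity check on the monotonicity that Theorem 1 establishes, the sketch below runs the classical Lee–Seung multiplicative updates on the plain NMF objective \(\Vert \mathbf{X} - \mathbf{UV}\Vert _F^2\) and verifies that the objective never increases. This checks only the baseline objective, since the PCGNMF-specific constraint and Laplacian terms of Eqs. (15) and (16) are not reproduced in this appendix.

```python
# Empirical check of the monotonicity property of Theorem 1, applied to
# the standard NMF objective ||X - UV||_F^2 under the classical
# Lee-Seung multiplicative updates (PCGNMF terms omitted).
import numpy as np

rng = np.random.default_rng(0)
X = np.abs(rng.random((15, 25)))
k, eps = 4, 1e-10
U = rng.random((15, k)) + eps
V = rng.random((k, 25)) + eps

objectives = []
for _ in range(100):
    objectives.append(np.linalg.norm(X - U @ V) ** 2)
    V *= (U.T @ X) / (U.T @ U @ V + eps)
    U *= (X @ V.T) / (U @ V @ V.T + eps)
objectives.append(np.linalg.norm(X - U @ V) ** 2)

# The objective should never increase between consecutive updates.
monotone = all(a >= b - 1e-6 for a, b in zip(objectives, objectives[1:]))
```

Such a numerical trace is of course no substitute for the proof above, but it is a quick way to catch implementation bugs in the update rules.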
Cite this article
He, YC., Lu, HT., Huang, L. et al. Non-negative Matrix Factorization with Pairwise Constraints and Graph Laplacian. Neural Process Lett 42, 167–185 (2015). https://doi.org/10.1007/s11063-014-9350-0