
1 Introduction

Data representation methods have been widely applied in various fields of pattern recognition and machine learning [1,2,3]. Over the past few decades, matrix factorization methods have become one of the most popular data representation techniques due to their efficiency and effectiveness. Many classical matrix factorization techniques, such as Singular Value Decomposition (SVD) [4], Principal Component Analysis (PCA) [5], Nonnegative Matrix Factorization (NMF) [6] and Concept Factorization (CF) [7], have shown encouraging performance in image classification, object tracking, document clustering, etc. [8, 9].

NMF has attracted increasing interest due to its physical and theoretical interpretations. NMF naturally leads to a parts-based representation of data by imposing nonnegativity constraints on both the coefficient and basis matrices. The basic idea behind NMF is to seek two nonnegative matrices whose product approximates the original data matrix. However, NMF cannot deal with data matrices that contain negative elements caused by noise or outliers. Therefore, Xu et al. [7] proposed a variation of NMF, called Concept Factorization (CF), for document clustering. Different from NMF, CF can deal with data matrices containing mixed-sign elements. In order to discover the local geometric structure of data, Cai et al. [10] proposed a Locally Consistent Concept Factorization (LCCF) method for data representation, which models the manifold structure of data using a graph regularizer. Shu et al. [11] proposed a Local Learning Concept Factorization (LLCF) method to learn the discriminant structure and the local geometric structure simultaneously by adding a local learning regularization term to the CF model. Motivated by deep learning, Li et al. [12] proposed a multilayer concept factorization method that discovers the structure information hidden in data using a multilayer framework. Pei et al. [13] developed a CF with adaptive neighbors (ANs) method for clustering, whose idea is to integrate an ANs regularizer into the CF decomposition. However, the aforementioned methods cannot dynamically update the graph model used to explore the intrinsic geometric manifold structure of data during matrix decomposition.

To solve this issue, we propose a novel method named Concept Factorization with Optimal Graph Learning (CF_OGL) in this paper. Specifically, we impose a rank constraint on the Laplacian matrix of the initially given graph, and then iteratively update the graph. The learned graph therefore has exactly c connected components, a structure that is beneficial for clustering applications. The learned graph regularizer is then used to constrain the concept factorization model, so that the geometric structure of the data is better preserved in the low-dimensional feature space. Extensive experimental results on three datasets demonstrate that our proposed CF_OGL method outperforms other state-of-the-art methods in clustering.

This paper is organized as follows: We briefly describe both CF and LCCF algorithms in Sect. 2. In Sect. 3, we introduce our proposed CF_OGL algorithm and then derive its updating rules. In Sect. 4, we carry out some experiments to investigate the proposed CF_OGL algorithm. Finally, conclusions are drawn in Sect. 5.

2 Related Work

In this section, the models of both CF and LCCF are briefly presented.

2.1 CF

Concept factorization is a popular matrix factorization technique for dealing with high-dimensional data. Given a data matrix \(X=[x_1, x_2, ..., x_n] \in {R^{m \times n}}\), each \(x_i\) denotes an m-dimensional sample vector. In CF, each underlying concept is represented as a linear combination of the entire set of data points, and each data point is in turn approximated by a linear combination of all the concepts. Therefore, the objective of CF can be written as

$$\begin{aligned} X \approx XU{V^T} \end{aligned}$$
(1)

where \( U \in {R^{n \times k}}\) and \( V \in {R^{n \times k}}\). Using the Euclidean distance to measure the reconstruction error, the corresponding minimization problem can be given as follows:

$$\begin{aligned} \begin{array}{c} \mathop {\min }\limits _{U, V} \left\| {X - XU{V^T}} \right\| _F^2 \\ s.t. U \ge 0, V \ge 0 \\ \end{array} \end{aligned}$$
(2)

where \({\left\| \cdot \right\| _F}\) denotes the matrix Frobenius norm. Using the multiplicative updating algorithm, the updating rules for problem (2) are derived as follows:

$$\begin{aligned} \begin{array}{c} u_{ij}^{t + 1} \leftarrow u_{ij}^t\frac{(KV)_{ij}}{(KU{V^T}V)_{ij}}\\ v_{ij}^{t + 1} \leftarrow v_{ij}^t\frac{(KU)_{ij}}{(V{U^T}KU)_{ij}} \\ \end{array} \end{aligned}$$
(3)

where \({{K}} = {{{X}}^T}{{X}}\). Since the data enter the updates only through the inner-product matrix K, CF is easily kernelized via the kernel trick to deal with nonlinear data.
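As an illustration, the multiplicative updates in Eq. (3) can be sketched in NumPy as follows. The function name, iteration count, and the small constant added to the denominators (to avoid division by zero) are illustrative choices, not part of the original algorithm.

```python
import numpy as np

def cf_updates(X, k, n_iter=200, eps=1e-9, seed=0):
    """Sketch of the CF multiplicative updates in Eq. (3).

    X: (m, n) data matrix; k: number of concepts.
    Returns nonnegative U (n, k) and V (n, k) with X ~ X U V^T.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X                  # K = X^T X: only inner products of samples are needed
    U = rng.random((n, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        U *= (K @ V) / (K @ U @ (V.T @ V) + eps)   # u_ij <- u_ij (KV)_ij / (KUV^TV)_ij
        V *= (K @ U) / (V @ (U.T @ K @ U) + eps)   # v_ij <- v_ij (KU)_ij / (VU^TKU)_ij
    return U, V
```

Because every factor in the update ratios is nonnegative, U and V remain nonnegative throughout the iterations.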

2.2 LCCF

The traditional CF method fails to consider the manifold structure information of data. To solve this issue, Cai et al. [10] proposed the LCCF method, which models the manifold structure embedded in the data using a fixed graph model. The objective function of LCCF is given as follows:

$$\begin{aligned} \begin{array}{c} \mathop {\min }\limits _{U, V} \left\| {X -XU{V^T}} \right\| _F^2 + \lambda tr({V^T}LV)\\ s.t. U \ge 0, V \ge 0 \\ \end{array} \end{aligned}$$
(4)

where \(\lambda \) stands for a balance parameter, \(tr(\cdot)\) denotes the trace of a matrix, and W is the weight matrix of a pre-constructed nearest-neighbor graph. D is a diagonal matrix with \({D_{ii}} = \sum \nolimits _j {{W_{ij}}}\), and \(L = D - W\) is the graph Laplacian. Similarly, the updating rules for problem (4) are derived as follows:

$$\begin{aligned} \begin{array}{c} u_{ij}^{t + 1} \leftarrow u_{ij}^t\frac{(KV)_{ij}}{(KU{V^T}V)_{ij}}\\ v_{ij}^{t + 1} \leftarrow v_{ij}^t\frac{(KU+\lambda WV)_{ij}}{(V{U^T}KU+\lambda DV)_{ij}}\\ \end{array} \end{aligned}$$
(5)

According to the rules (5), we can achieve a local minimum of Eq. (4).
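A minimal NumPy sketch of the LCCF updates in Eq. (5), assuming a symmetric nonnegative weight matrix W has already been constructed; the function name and defaults are illustrative.

```python
import numpy as np

def lccf_updates(X, W, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Sketch of the LCCF multiplicative updates in Eq. (5).

    W: (n, n) symmetric nonnegative graph weight matrix (assumed given).
    lam: the balance parameter lambda in Eq. (4).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X
    D = np.diag(W.sum(axis=1))   # D_ii = sum_j W_ij, so the Laplacian is L = D - W
    U = rng.random((n, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        U *= (K @ V) / (K @ U @ (V.T @ V) + eps)
        # graph terms: lam*W@V pulls neighboring representations together,
        # lam*D@V is the degree-weighted normalization from L = D - W
        V *= (K @ U + lam * (W @ V)) / (V @ (U.T @ K @ U) + lam * (D @ V) + eps)
    return U, V
```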

3 The Proposed Method

3.1 Motivation

Traditional CF methods rely on a fixed graph model and thus cannot effectively explore the intrinsic geometric manifold structure embedded in high-dimensional data. By adding a rank constraint on the Laplacian matrix of the initially given graph, we learn an optimal graph model with exactly c connected components. In CF_OGL, a regularizer built from the learned graph is then imposed on the CF model. Therefore, our proposed method effectively explores the semantic information hidden in high-dimensional data.

3.2 Constrained Laplacian Rank (CLR)

A graph learning method, called Constrained Laplacian Rank (CLR), was proposed to explore the intrinsic geometric structure of data; its goal is to learn an optimal graph model [14]. CLR is formulated as the following optimization problem:

$$\begin{aligned} J_{CLR}=\mathop {\min }\limits _{\sum \nolimits _j {q_{ij}} = 1,\, {q_{ij}} \ge 0,\, rank({L_Q}) = n - k} \left\| {Q - W} \right\| _F^2 \end{aligned}$$
(6)

where \(L_Q\) stands for the Laplacian matrix of the matrix Q. Denote by \({\sigma _i}({L_Q})\) the i-th smallest eigenvalue of \(L_Q\). Note that \({\sigma _i}({L_Q}) \ge 0\) since \(L_Q\) is positive semidefinite, so the constraint \(rank({L_Q}) = n - k\) holds if and only if the k smallest eigenvalues are all zero. Therefore, for a large enough value of \(\lambda \), Eq. (6) can be reformulated as the following problem:

$$\begin{aligned} {J_{CLR}} = \mathop {\min }\limits _{\sum \nolimits _j {{q_{ij}} = 1}, {q_{ij}} \ge 0} \left\| {Q - W} \right\| _F^2 + 2\lambda \sum \limits _{i = 1}^k {{\sigma _i}({L_Q})} \end{aligned}$$
(7)

According to Ky Fan's theorem, we have the following equivalence:

$$\begin{aligned} \sum \limits _{i = 1}^k {{\sigma _i}({L_Q})} = \mathop {\min }\limits _{F \in {R^{n \times k}},\,{F^T}F = I} Tr({F^T}{L_Q}F) \end{aligned}$$
(8)

Therefore, we can further rewrite the problem (7) as follows:

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{Q,F} \left\| {Q - W} \right\| _F^2 + \lambda Tr({F^T}{L_Q}F) \\ s.t.\,{F^T}F = I, Q1 = 1, Q \ge 0, Q \in {R^{n \times n}}\\ \end{array} \end{aligned}$$
(9)

3.3 Our Proposed Method

By integrating the learned graph regularization term into the model of CF, the objective function of our proposed CF_OGL method can be given as follows:

$$\begin{aligned} \begin{array}{c} \mathop {\min }\limits _{Q, F, U, V} (\left\| {Q - W} \right\| _F^2 + \beta Tr({F^T}{L_Q}F) + \lambda Tr({V^T}{L_Q}V)\\ +\,\mu \left\| {X - XU{V^T}} \right\|_F^2 ) \\ s.t.\,{F^T}F = I, V \ge 0, U \ge 0, Q1 = 1, Q \ge 0, Q \in {R^{n \times n}}\\ \end{array} \end{aligned}$$
(10)

It is impractical to find the global optimal solution of problem (10) because it is not jointly convex in Q, F, U and V. Fortunately, we can achieve a local solution by optimizing the variables alternately. Therefore, the optimization scheme of our proposed CF_OGL method mainly consists of two parts:

Fixing Q and F, Update U and V. By fixing the variables Q and F, problem (10) can be rewritten as the following problem:

$$\begin{aligned} \begin{array}{c} \mathop {\min }\limits _{U, V} (\lambda Tr({V^T}{L_Q}V) + \mu \left\| {X - XU{V^T}} \right\|_F^2 ) \\ s.t.\, U \ge 0, V \ge 0 \\ \end{array} \end{aligned}$$
(11)

Similarly, it is easy to derive the updating rules of problem (11) as follows:

$$\begin{aligned} \begin{array}{c} u_{ij}^{t + 1} \leftarrow u_{ij}^t\frac{(KV)_{ij}}{(KU{V^T}V)_{ij}}\\ \end{array} \end{aligned}$$
(12)
$$\begin{aligned} \begin{array}{c} v_{ij}^{t + 1} \leftarrow v_{ij}^t\frac{(\mu KU+\lambda QV)_{ij}}{(\mu V{U^T}KU+\lambda {D_Q}V)_{ij}} \\ \end{array} \end{aligned}$$
(13)

where \(D_Q\) is the diagonal matrix with \({(D_Q)_{ii}} = \sum \nolimits _j {{q_{ij}}}\), so that \(L_Q = D_Q - Q\).

Fixing U and V, Update Q and F. By fixing U and V, we can rewrite problem (10) as the following optimization problem:

$$\begin{aligned} \begin{array}{c} \mathop {\min }\limits _{Q,F} (\left\| {Q - W} \right\| _F^2 + \beta Tr({F^T}{L_Q}F)) \\ s.t.\, {F^T}F = I, Q1 = 1,Q \ge 0,Q \in {R^{n \times n}}\\ \end{array} \end{aligned}$$
(14)

(A) When Q is fixed, problem (14) becomes

$$\begin{aligned} \begin{array}{c} \mathop {\min }\limits _{{F^T}F = I} \beta Tr({F^T}{L_Q}F) \\ \end{array} \end{aligned}$$
(15)

It is easy to see that the optimal F consists of the eigenvectors corresponding to the k smallest eigenvalues of \(L_Q\).
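This eigenvector step can be sketched as follows; np.linalg.eigh returns eigenvalues in ascending order for a symmetric matrix, so the first k eigenvectors are exactly the ones needed.

```python
import numpy as np

def update_F(L_Q, k):
    """Solve min_{F^T F = I} Tr(F^T L_Q F) for symmetric Laplacian L_Q:
    stack the eigenvectors of the k smallest eigenvalues as columns of F."""
    vals, vecs = np.linalg.eigh(L_Q)   # eigenvalues ascending, eigenvectors orthonormal
    return vecs[:, :k]
```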

(B) When F is fixed, the problem (14) becomes the following optimization problem:

$$\begin{aligned} \begin{array}{c} \mathop {\min }\limits _{Q} \sum \limits _{i,j = 1}^n {{{\left( {{q_{ij}} - {w_{ij}}} \right) }^2}} + \frac{\beta }{2}\sum \limits _{i,j = 1}^n {\left\| {{f_i} - {f_j}} \right\| _2^2{q_{ij}}} \\ s.t.\,\sum \nolimits _j {{q_{ij}}} = 1,{q_{ij}} \ge 0 \\ \end{array} \end{aligned}$$
(16)

Since problem (16) is separable across rows, for each row \({q_i}\) we have the vector form

$$\begin{aligned} \begin{array}{c} \mathop {\min }\limits _{{q_i} \ge 0,\,{q_i}1 = 1} \left\| {{q_i} - \left( {{w_i}}-\frac{\beta }{4}{d_i} \right) } \right\| _2^2 \end{array} \end{aligned}$$
(17)

where \(d_i\) denotes the row vector with entries \(d_{ij}={\left\| {{f_i} - {f_j}} \right\| _2^2}\). The problem (17) can be solved by the optimization algorithm proposed in [14].
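Problem (17) is a Euclidean projection onto the probability simplex, which a standard sort-and-threshold routine solves exactly. The sketch below is one way to implement the row update; the function names are illustrative, and the \(\beta/4\) shift follows from completing the square in (16).

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {q : q >= 0, sum(q) = 1}
    via the standard sort-and-threshold routine."""
    u = np.sort(v)[::-1]                      # sort in descending order
    css = np.cumsum(u)
    # largest index rho with u_rho > (cumsum_rho - 1) / (rho + 1)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)      # threshold
    return np.maximum(v - theta, 0.0)

def update_row(w_i, d_i, beta):
    """One row update for problem (17): project w_i - (beta/4) d_i onto the simplex."""
    return project_simplex(w_i - 0.25 * beta * d_i)
```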


4 Experimental Results

In this section, we carry out experiments to investigate the proposed CF_OGL method on the Yale, ORL and FERET datasets. To demonstrate its effectiveness, the proposed CF_OGL method is compared with several state-of-the-art methods: K-means, PCA, NMF, CF and LCCF. Two widely accepted metrics, accuracy (AC) and normalized mutual information (NMI), are used to quantify the performance of data representation in clustering.
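For reference, AC can be computed by finding the best one-to-one match between predicted and true cluster labels with the Hungarian algorithm; the sketch below uses SciPy's linear_sum_assignment, and the helper name is illustrative. NMI is available, for example, as normalized_mutual_info_score in scikit-learn.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """AC: fraction of samples correctly labeled under the best one-to-one
    mapping between predicted and true labels (Hungarian matching)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    # confusion counts: rows are predicted labels, columns are true labels
    cost = np.zeros((classes.size, classes.size), dtype=int)
    for i, c_pred in enumerate(classes):
        for j, c_true in enumerate(classes):
            cost[i, j] = np.sum((y_pred == c_pred) & (y_true == c_true))
    row, col = linear_sum_assignment(-cost)   # negate to maximize matched count
    return cost[row, col].sum() / y_true.size
```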

4.1 Yale Face Dataset

The Yale face dataset includes a total of 165 face images of 15 individuals. In each experiment, images from P categories were randomly sampled from the Yale dataset to evaluate the performance of all methods. We ran all methods ten times for each value of P and recorded their average results. The results of all methods on the Yale dataset are shown in Table 1. We can clearly see that our proposed CF_OGL method outperforms the other state-of-the-art methods regardless of the choice of P. Specifically, the average AC and NMI of the proposed CF_OGL method are 3.7% and 5.1% higher than those of LCCF, respectively. The main reason is that our proposed CF_OGL method can learn an optimal graph structure, which significantly improves the clustering performance compared with LCCF.

Fig. 1. Some samples from the Yale dataset

Table 1. The clustering performances on the Yale face dataset

4.2 ORL Face Dataset

The ORL face dataset contains 400 face images of 40 distinct subjects. For some subjects, the face images were taken at different times, with varying lighting and facial expressions. In this experiment, we adopted an experimental scheme similar to the above to investigate the effectiveness of our proposed CF_OGL method. Table 2 provides the clustering results of the six methods on the ORL face dataset. It can be observed that CF_OGL achieves the best performance among all the compared methods. The main reason is that our proposed CF_OGL method can learn the optimal graph and thus effectively preserve the intrinsic geometric structure of data. Therefore, it outperforms the other state-of-the-art methods on this dataset (Figs. 1 and 2).

Fig. 2. Some samples from the ORL dataset

Table 2. The clustering performances on the ORL face dataset
Fig. 3. Some samples from the FERET dataset

Fig. 4. Performances of all methods versus different values of the parameter \(\lambda \)

Fig. 5. Performances of all methods versus different values of the parameter \(\beta \)

4.3 FERET Face Dataset

The FERET face database contains 200 different individuals with about 7 face samples per individual. Here, we randomly chose samples from P categories of the FERET dataset and mixed them as the experimental subset for clustering. All methods were run ten times, and their average performances were recorded as the final results. The clustering performances of all methods on the FERET dataset are summarized in Table 3. It is easy to see that the average performance of our proposed CF_OGL method has a clear advantage over the other methods in clustering (Fig. 3).

Table 3. The clustering performances on the FERET face dataset

4.4 The Analysis of the Parameters

In our proposed CF_OGL, the parameters \(\beta \), \(\lambda \) and \(\mu \) affect the clustering performance. Specifically, we randomly chose samples from 10, 20 and 80 categories as the datasets to carry out these experiments. However, parameter selection for the proposed CF_OGL method is still an open problem. Therefore, we determined the parameters by grid search at first and then varied them within certain ranges. Here, we only investigate the parameters \(\beta \) and \(\lambda \). Figures 4 and 5 show the performances of all methods with different values of \(\beta \) and \(\lambda \) on the three datasets, respectively. It is clear that our proposed method achieves relatively stable performance over a large range of parameter values.

5 Conclusion

In this paper, a novel matrix factorization technique, called Concept Factorization with Optimal Graph Learning (CF_OGL), is proposed for data representation. In order to learn an optimal graph, we impose a rank constraint on the Laplacian matrix of the initially given graph. The learned graph regularizer is then integrated into the model of CF. Therefore, our proposed CF_OGL method effectively exploits the geometric manifold structure embedded in high-dimensional data. Experimental results have shown that the proposed CF_OGL algorithm achieves better performance in comparison with other state-of-the-art algorithms.