Abstract
Graphs are at the heart of many dimensionality reduction (DR) methods. Despite their importance, how to construct a high-quality graph remains an open problem. Recently, a DR algorithm called graph-optimized locality preserving projections (GoLPP) was proposed to perform graph construction and DR simultaneously in a unified objective function, yielding an automatically optimized graph rather than the pre-specified one used in typical LPP. However, GoLPP is unsupervised and cannot naturally incorporate supervised information because of a strong sum-to-one constraint on the graph weights in its model. To address this problem, we propose an improved GoLPP model that relaxes this constraint, and then develop a semi-supervised GoLPP (S-GoLPP) algorithm by incorporating pairwise constraint information into the model. Interestingly, we obtain a semi-supervised closed-form graph-updating formula with a natural possibilistic interpretation. The feasibility and effectiveness of the proposed method are verified on several publicly available UCI and face data sets.
Notes
Here, “possibilistic” is used, in contrast to “probabilistic”, to indicate that the row sums are not necessarily 1.
In fact, the solution obtained in this way is not exact; the reason relates to the trace ratio and ratio trace problems, which are beyond our main focus. See [22] for more details.
References
Yan SC, Xu D, Zhang BY, Zhang HJ, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):40–51
He XF, Niyogi P (2004) Locality preserving projections. Neural Inf Process Syst (NIPS) 16:153–160
Wang H, Zheng W (2014) Robust sparsity-preserved learning with application to image visualization. Knowl Inf Syst 39(2):287–304
Dehmer M, Emmert-Streib F (2007) Comparing large graphs efficiently by margins of feature vectors. Appl Math Comput 188(2):1699–1710
Wan M, Lai Z, Jin Z (2011) Feature extraction using two-dimensional local graph embedding based on maximum margin criterion. Appl Math Comput 217(23):9659–9668
Kim YG, Song YJ, Chang UD, Kim DW, Yun TS, Ahn JH (2008) Face recognition using a fusion method based on bidirectional 2DPCA. Appl Math Comput 205(2):601–607
Musa AB (2014) A comparison of ℓ1-regularizion, PCA, KPCA and ICA for dimensionality reduction in logistic regression. Int J Mach Learn Cybern 5(6):861–873
Hasan BAS, Gan JQ, Tsui CSL (2014) A filter-dominating hybrid sequential forward floating search method for feature subset selection in high-dimensional space. Int J Mach Learn Cybern 5(3):413–423
Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Fang Y, Wang R, Dai B (2012) Graph-oriented learning via automatic group sparsity for data analysis. In: IEEE 12th international conference on data mining (ICDM), pp 251–259
Zhu X (2008) Semi-supervised learning literature survey. Technical report, University of Wisconsin, Madison
Liu W, Chang S-F (2009) Robust multi-class transductive learning with graphs. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 381–388
Maier M, Luxburg U (2008) Influence of graph construction on graph-based clustering measures. Neural Inf Process Syst (NIPS)
Dornaika F, Assoum A (2013) Enhanced and parameterless locality preserving projections for face recognition. Neurocomputing 99:448–457
Zhao HT, Wong WK (2012) Supervised optimal locality preserving projection. Pattern Recognit 45:186–197
Yang B, Chen S (2010) Sample-dependent graph construction with application to dimensionality reduction. Neurocomputing 74(1–3):301–314
Zhang L, Qiao L, Chen S (2010) Graph-optimized locality preserving projections. Pattern Recognit 43(6):1993–2002
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
Cai D, He XF, Han JW (2007) Semi-supervised discriminant analysis. In: IEEE 11th international conference on computer vision (ICCV), pp 1–7
Mizutani K, Miyamoto S (2005) Possibilistic approach to kernel-based fuzzy c-means clustering with entropy regularization. In: Torra V, Narukawa Y, Miyamoto S (eds) Modeling decisions for artificial intelligence. Springer, Berlin, Heidelberg
Pal NR, Pal K, Keller JM, Bezdek JC (2005) A possibilistic fuzzy c-means clustering algorithm. IEEE Trans Fuzzy Syst 13(4):517–530
Wang H, Yan SC, Xu D, Tang XO, Huang T (2007) Trace ratio vs. ratio trace for dimensionality reduction. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1–8
Lee KC, Ho J, Kriegman DJ (2005) Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans Pattern Anal Mach Intell 27(5):684–698
Qiao L, Zhang L, Chen S (2013) Dimensionality reduction with adaptive graph. Front Comput Sci 7(5):745–753
Acknowledgments
This work was partly supported by National Natural Science Foundations of China and Shandong under Grant Nos: 61300154, 11326182, 61402215 and ZR2012FQ005.
Appendix: How to solve the weight matrix \(S = (S_{ij})_{n \times n}\) in problem (5)
As seen from Step 2 in Sect. 3.3, given \(W\), computing \(S = (S_{ij})_{n \times n}\) in model (4) is equivalent to solving the following optimization problem:
Obviously, from the constraints of (5) we can get that,
So we only need to solve for the weights \(S_{ij}\) of the sample pairs without constraints. That is, we consider only the following problem:
where \((x_{i}, x_{j}) \notin M\) and \((x_{i}, x_{j}) \notin C\). To optimize \(S_{ij}\), we form the Lagrangian function as follows:
By the KKT condition,
so the Lagrangian function simplifies to
Let \(\frac{\partial L}{\partial S_{ij}} = ||W^{T} x_{i} - W^{T} x_{j}||^{2} + \eta (\ln (S_{ij}/\alpha) + 1) = 0\); then we have
\[S_{ij} = \frac{\alpha}{e}\exp\left(-\frac{||W^{T} x_{i} - W^{T} x_{j}||^{2}}{\eta}\right),\]
where \((x_{i}, x_{j}) \notin M\), \((x_{i}, x_{j}) \notin C\), and \(\alpha\) is a positive parameter. Intuitively, one expects the weight \(S_{ij}\) to approach 1 as the distance between two samples tends to 0. With this intuition, we set \(\alpha = e\) and obtain
\[S_{ij} = \exp\left(-\frac{||W^{T} x_{i} - W^{T} x_{j}||^{2}}{\eta}\right).\]
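As a minimal illustration, the graph update for unconstrained pairs can be sketched in NumPy. Solving the stationarity condition above with \(\alpha = e\) gives \(S_{ij} = \exp(-||W^{T} x_{i} - W^{T} x_{j}||^{2}/\eta)\), which is what the code evaluates. The function name `update_graph`, the column-wise data layout, and the fixed values used for constrained pairs (1 for pairs in \(M\), 0 for pairs in \(C\)) are assumptions made for this sketch and are not taken from the text.

```python
import numpy as np


def update_graph(X, W, eta, must_link=(), cannot_link=()):
    """Sketch of the closed-form graph update for a fixed projection W.

    X   : (d, n) array, one sample per column (assumed layout).
    W   : (d, m) projection matrix.
    eta : positive entropy-regularization parameter.
    must_link, cannot_link : iterables of index pairs (i, j); hypothetical
        encodings of the pairwise constraint sets M and C.
    """
    Y = W.T @ X  # projected samples, shape (m, n)
    sq_norms = np.sum(Y * Y, axis=0)
    # Pairwise squared distances ||W^T x_i - W^T x_j||^2.
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y.T @ Y)
    D = np.maximum(D, 0.0)  # guard against tiny negative round-off

    # Unconstrained pairs: S_ij = exp(-D_ij / eta), i.e. alpha = e.
    S = np.exp(-D / eta)

    # ASSUMPTION: constrained pairs are pinned to 1 (M) and 0 (C).
    for i, j in must_link:
        S[i, j] = S[j, i] = 1.0
    for i, j in cannot_link:
        S[i, j] = S[j, i] = 0.0
    return S


# Toy usage: three 2-D samples, identity projection, one pair in C.
X = np.array([[0.0, 1.0, 3.0],
              [0.0, 0.0, 4.0]])
W = np.eye(2)
S = update_graph(X, W, eta=1.0, cannot_link=[(0, 2)])
```

Because no sum-to-one normalization is applied, every entry lies in [0, 1] and the row sums are generally not 1, which matches the possibilistic reading of the weights.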
Finally, we summarize the solution of problem (5) as follows:
Cite this article
Zhang, L., Qiao, L. A graph optimization method for dimensionality reduction with pairwise constraints. Int. J. Mach. Learn. & Cyber. 8, 275–281 (2017). https://doi.org/10.1007/s13042-014-0321-6
DOI: https://doi.org/10.1007/s13042-014-0321-6