Fisher's linear discriminant embedded metric learning
Introduction
The performance of many classification and clustering algorithms (e.g., k-NN and k-means) depends critically on the distance metric adopted. Popular distance measures (e.g., the Euclidean and Manhattan distances), however, are often not an optimal choice, because they may not fit the data or the application. In particular, different purposes may call for different distance metrics: the distance metric between face images for identity authentication, for instance, should differ from the metric used for age estimation. A data-driven distance metric learning algorithm is therefore preferable.
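The dependence of k-NN on the metric can be seen in a toy example (illustrative only, not from the paper): with one informative feature and one large-scale noisy feature, the nearest neighbour of a query flips when the metric re-weights the axes.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    d = x - y
    return float(d @ M @ d)

q  = np.array([0.0, 0.0])   # query point
x1 = np.array([1.0, 0.0])   # close along the informative first feature
x2 = np.array([0.0, 5.0])   # far along a noisy, large-scale second feature

M_euc = np.eye(2)             # plain squared Euclidean distance
M_dw  = np.diag([1.0, 0.01])  # metric that down-weights the noisy feature

# Under Euclidean distance x1 is the nearer neighbour of q;
# under the re-weighted metric, x2 becomes the nearer neighbour.
assert mahalanobis_sq(q, x1, M_euc) < mahalanobis_sq(q, x2, M_euc)
assert mahalanobis_sq(q, x2, M_dw)  < mahalanobis_sq(q, x1, M_dw)
```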
In fact, the importance of metric learning has attracted increasing attention since it was first proposed in [1]. The ultimate goals of the various distance metric learning algorithms are similar: to make the metric between similar samples small and the metric between dissimilar samples large. While some desired metrics have been derived from information and probability theories [2], [3], [4], [5], [6], [7], it appears more popular to directly optimize a distance metric between sample pairs [1], [8], [9], [10], [11], [12], [13], [14]. The distance metrics to be optimized include the Mahalanobis distance [1], [8], [9], [10], [11], [13], cosine similarity [12] and Hamming distance [14]. Despite the encouraging progress, many of these metric learning methods neglect the distributions of sample pairs and may be suboptimal when training samples are scarce.
Therefore, this paper aims to develop a more reliable metric learning algorithm that explicitly exploits the distributions of sample pairs. Specifically, we propose a new optimization model with Fisher's linear discriminant (FLD) embedded into the classical maximum margin criterion. This model seeks a distance metric that makes the gap between similar sample pairs and dissimilar sample pairs as wide as possible while maintaining a large mean squared distance ratio.
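The mean squared distance ratio kept large by the model can be illustrated with a hedged sketch (the function name and notation below are ours, not the paper's exact formulation): under a Mahalanobis metric M, compare the mean squared distance over dissimilar pairs against that over similar pairs.

```python
import numpy as np

# Illustrative sketch (not the paper's exact objective): the ratio of the
# mean squared distance over dissimilar pairs to that over similar pairs,
# under a Mahalanobis metric M. A good learned M should make this ratio large.
def mean_sq_ratio(X, pos_pairs, neg_pairs, M):
    sq = lambda i, j: (X[i] - X[j]) @ M @ (X[i] - X[j])
    mean_pos = np.mean([sq(i, j) for i, j in pos_pairs])  # similar pairs
    mean_neg = np.mean([sq(i, j) for i, j in neg_pairs])  # dissimilar pairs
    return mean_neg / mean_pos

# Two tight clusters: similar pairs are 0.1 apart, dissimilar pairs 3.0 apart.
X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [3.1, 0.0]])
r = mean_sq_ratio(X, [(0, 1), (2, 3)], [(0, 2), (1, 3)], np.eye(2))
# r = 9.0 / 0.01 = 900, i.e. dissimilar pairs are far on average
```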
To efficiently solve the proposed optimization problem, we propose an iterative approximate method that transforms the original semidefinite optimization problem into a series of standard quadratic programming problems, as presented in Section 3. In fact, this approximate method is generic and can be applied to other semidefinite optimization problems. In Section 4, we empirically compare our algorithm with some popular algorithms. Three comparative experiments are conducted: on simulated datasets, on UCI datasets, and on a challenging face verification dataset called Labeled Faces in the Wild (LFW) [15]. Our method shows superior performance in almost all the cases.
Problem formulation
Throughout this paper, we assume that our training data include two sets of sample pairs: pairs in one set are labeled as similar pairs, and pairs in the other set are labeled as dissimilar pairs; the class identity of each individual sample is unknown. Given the samples, one set contains N_pos pairs of similar samples and the other contains N_neg pairs of dissimilar samples.
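When class-labelled data are available, the two pair sets the formulation assumes can be derived from the labels; a minimal helper for this (hypothetical, not part of the paper, which only assumes the pair sets are given) might look like:

```python
from itertools import combinations
import numpy as np

# Hypothetical helper: build the N_pos similar pairs and N_neg dissimilar
# pairs from class-labelled samples. The paper itself only assumes the two
# pair sets exist; individual sample identities are unknown to the learner.
def make_pairs(X, y):
    pos, neg = [], []
    for i, j in combinations(range(len(X)), 2):
        (pos if y[i] == y[j] else neg).append((i, j))
    return pos, neg

X = np.zeros((4, 2))                 # four samples (features irrelevant here)
y = np.array([0, 0, 1, 1])           # two classes
pos, neg = make_pairs(X, y)
# pos = [(0, 1), (2, 3)]; neg holds the four cross-class pairs
```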
As with most metric learning algorithms, we consider a Mahalanobis distance metric for each sample pair (x_i, x_j), i.e., d_M(x_i, x_j) = sqrt((x_i − x_j)^T M (x_i − x_j)), where M is a positive semidefinite (PSD) matrix.
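A standard fact worth noting (not specific to this paper): any PSD matrix M factorizes as M = L^T L, so the Mahalanobis metric under M equals the Euclidean metric after the linear map x → Lx. The check below uses an arbitrary random L for illustration.

```python
import numpy as np

# Standard fact: for M = L^T L (PSD by construction), the Mahalanobis
# distance under M equals the Euclidean distance after mapping x -> L x.
rng = np.random.default_rng(0)
L = rng.standard_normal((2, 3))      # arbitrary 2x3 linear map (illustrative)
M = L.T @ L                          # 3x3 PSD matrix

x, y = rng.standard_normal(3), rng.standard_normal(3)
d = x - y
d_M = d @ M @ d                      # (x - y)^T M (x - y)
d_L = np.sum((L @ x - L @ y) ** 2)   # squared Euclidean after the map
assert np.isclose(d_M, d_L)
```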
Solving the problem
Optimization problem (2) involves both the PSD constraint and a quadratic objective function. To solve it, we propose an iterative approximate method.
To ensure that the feasible solution set is non-empty, we incorporate slack variables ξ_ij for each sample pair into the formulation and turn (2) into the following problem, where C is the penalty factor for sample pairs.
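The paper's exact QP subproblems are not reproduced in this snippet; one common building block of iterative schemes that replace a semidefinite program with simpler subproblems is, after each unconstrained update, projecting the symmetric iterate back onto the PSD cone by clipping negative eigenvalues. A generic sketch:

```python
import numpy as np

# Generic PSD projection (a common ingredient of such iterative schemes,
# not necessarily the paper's exact procedure): symmetrise the iterate,
# then clip its negative eigenvalues to zero.
def project_psd(A):
    A = (A + A.T) / 2.0                      # symmetrise first
    w, V = np.linalg.eigh(A)                 # eigen-decomposition
    return (V * np.clip(w, 0.0, None)) @ V.T # rebuild with clipped spectrum

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])                   # eigenvalues 3 and -1
P = project_psd(A)
assert np.all(np.linalg.eigvalsh(P) >= -1e-10)  # P is PSD
```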
Experimental results
In this section, we compare our model (abbreviated as FeDML) with several popular metric learning methods, including Xing's algorithm [1], a logistic discriminant based algorithm called LDML [7], the DML-eig algorithm [11], the original maximum margin model (abbreviated as mmDML) [9], and the baseline algorithm (Euc.) that uses the squared Euclidean distance. We only run experiments on the verification task, because it is difficult and it reflects more precisely the performance of metric learning as the …
Conclusion
This paper has presented a novel and reliable distance metric learning method that embeds FLD into maximum margin metric learning. To solve its optimization problem, we have proposed a generic approximate method that can also be applied to other semidefinite optimization problems. The experimental results verify the reliability and effectiveness of our method.
Acknowledgments
This work was supported by the National Basic Research Program of China (973 program) under Grant no. 2013CB329403.
References (25)
- E.P. Xing, A.Y. Ng, M.I. Jordan, S. Russell, Distance metric learning, with application to clustering with...
- L. Yang, R. Jin, R. Sukthankar, Y. Liu, An efficient algorithm for local distance metric learning, in: Proceedings of...
- S.C.H. Hoi, W. Liu, M.R. Lyu, W.-Y. Ma, Learning distance metrics with contextual constraints for image retrieval, in:...
- J. Goldberger, S. Roweis, G. Hinton, R. Salakhutdinov, Neighbourhood components analysis, in: Proceedings of the...
- A. Globerson, S. Roweis, Metric learning by collapsing classes, in: Proceedings of the Advances in Neural Information...
- J.V. Davis, B. Kulis, P. Jain, S. Sra, I.S. Dhillon, Information-theoretic metric learning, in: Proceedings of the...
- M. Guillaumin, J. Verbeek, C. Schmid, Is that you? Metric learning approaches for face identification, in: Proceedings...
- M. Schultz, T. Joachims, Learning a distance metric from relative comparisons, in: Proceedings of the Advances in...
- S. Shalev-Shwartz, Y. Singer, A.Y. Ng, Online and batch learning of pseudo-metrics, in: Proceedings of the...
- K.Q. Weinberger et al., Distance metric learning for large margin nearest neighbor classification, J. Mach. Learn. Res. (2009)
- Distance metric learning with eigenvalue optimization, J. Mach. Learn. Res.
Yiwen Guo received the B.E. degree from Wuhan University, China, in 2011. He is currently working toward the Ph.D. degree with the Department of Electronic Engineering in Tsinghua University, China. His current research interests include computer vision, pattern recognition and machine learning.
Xiaoqing Ding received the B.E. degree from Tsinghua University, China, in 1962. She is currently a Professor and a Ph.D. Supervisor with the Department of Electronic Engineering, Tsinghua University. Her research interests include computer vision, pattern recognition, machine learning, image processing, character recognition and biometric identification, etc.
Chi Fang received the B.S. degree in Wireless Technology in 1994, and the M.S. degree in Communication and Electronic Engineering in 1997, both from Zhejiang University, China. He received the Ph.D. degree in Signal and Information Processing from Tsinghua University, China, in 2001. His research interests include face recognition and analysis, pattern recognition, computer vision and image processing, etc.
Jing-Hao Xue received the B.Eng. degree in telecommunication and information systems in 1993 and the Dr.Eng. degree in signal and information processing in 1998, both from Tsinghua University, the M.Sc. degree in medical imaging and the M.Sc. degree in statistics, both from Katholieke Universiteit Leuven in 2004, and the Ph.D. degree in statistics from the University of Glasgow in 2008. He has been a Lecturer in the Department of Statistical Science at University College London since 2008. His research interests include statistical and machine-learning techniques for pattern recognition, data mining and image processing, in particular supervised, unsupervised and incompletely supervised learning for complex and high-dimensional data.