Non-linear metric learning using pairwise similarity and dissimilarity constraints and the geometrical structure of data
Introduction
Distance metrics play a key role in many machine learning algorithms [1]. Over the past few years, there has been considerable research on distance metric learning [2]. Many of the earlier studies optimize the metric using class labels for classification tasks [3], [4], [5], [6], [7], [8]. More recently, researchers have paid much attention to distance learning for semi-supervised clustering tasks. Since class label information is generally unavailable in clustering tasks, constraints serve as a more natural form of supervisory information. Pairwise similarity (positive) and dissimilarity (negative) constraints are the most popular kind of side information used for semi-supervised clustering, although other kinds of side information, such as relative comparisons, have also been considered in some studies.
Over the last few years, the problem of clustering with side information (semi-supervised clustering) has received increasing attention [9], [10], and distance learning has been considered a powerful approach to this problem. The two most frequently used approaches for incorporating side information into clustering algorithms are constraint-based [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21] and distance function learning [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34] approaches [24]. In the former approach, the clustering algorithm itself is modified to use the available labels or constraints to bias the search for an appropriate data clustering. In the latter approach, a distance function is learned prior to clustering; the learned distance function tries to put similar points close together and dissimilar points far apart. This approach is more flexible in the choice of distance function [33] and has received considerable attention in recent studies [1], [25], [28], [29], [30], [31], [33], [34]; we adopt it in this work as well.
Distance learning based on constraints has been studied by many researchers [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34]. Klein et al. [22] introduced a metric adaptation method for semi-supervised clustering that derives a distance measure from shortest paths in a version of the similarity graph altered by the positive constraints; negative constraints, however, are employed only after the metric adaptation phase, during complete-link clustering. Several later studies [1], [23], [25], [28], [34] adopted a more popular approach that learns a global Mahalanobis metric from pairwise constraints. Xing et al. [23] proposed a convex optimization problem to learn a global Mahalanobis metric from pairwise constraints. Bar-Hillel et al. [25] devised a more efficient, non-iterative algorithm called relevant component analysis (RCA) for learning a Mahalanobis metric; this method can only incorporate positive constraints. An extension of RCA that can consider both positive and negative constraints was later introduced by Yeung and Chang [28].
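To make the RCA idea concrete, the following sketch (an illustration under our own assumptions, not the authors' or Bar-Hillel et al.'s code) groups the positive constraints into chunklets, estimates the within-chunklet covariance, and uses its inverse as the Mahalanobis matrix; the symmetric inverse square root then serves as an equivalent linear (whitening) transformation.

```python
import numpy as np

def rca_metric(X, chunklets):
    """RCA-style Mahalanobis metric from positive constraints.

    X         : (n, d) data matrix.
    chunklets : list of index lists; points in the same chunklet are
                linked by positive constraints.
    Returns (A, W): A is the metric (inverse within-chunklet covariance)
    and W a whitening transform with ||W @ x - W @ y||^2 equal to the
    Mahalanobis distance under A.
    """
    d = X.shape[1]
    C = np.zeros((d, d))
    m = 0
    for idx in chunklets:
        Xc = X[idx] - X[idx].mean(axis=0)   # center each chunklet
        C += Xc.T @ Xc
        m += len(idx)
    C /= m                                   # within-chunklet covariance
    A = np.linalg.pinv(C)                    # RCA metric
    vals, vecs = np.linalg.eigh(A)           # symmetric inverse square root of C
    W = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    return A, W

# usage with hypothetical data: two chunklets formed from positive constraints
# X = np.random.randn(10, 4)
# A, W = rca_metric(X, chunklets=[[0, 3, 5], [2, 7]])
```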
More recently, some non-linear metric learning methods for semi-supervised clustering have been introduced. Chang and Yeung [29] proposed a locally linear metric learning method that considers only positive constraints; its objective function has many local optima, and the topology of the data cannot be preserved well by this approach [30]. Chang and Yeung [31] also proposed a metric adaptation method that iteratively adjusts the locations of data points so that similar points tend to move closer and dissimilar points tend to move apart. Since this method lacks an explicit transformation map, it cannot straightforwardly project new data points onto the transformed space [31]; additionally, the movement of data points may distort the structure of the data. In [30], two kernel-based metric learning methods were presented, but they have some limitations; in particular, they can use only positive constraints.
Among the existing metric learning methods, some [1], [23], [28], [34], [39], [40] can incorporate both positive and negative constraints. However, most of these methods [1], [23], [28], [34] learn only a linear transformation that corresponds to a Mahalanobis metric. Although some recent studies [39], [40] address kernel learning from positive and negative constraints, they are based on learning non-parametric kernel matrices. Such methods can only provide distances between the data points seen during training. Additionally, their optimization problems are usually difficult to solve [40], and the degree of freedom of the corresponding models is very high (i.e., n², where n denotes the number of data points). In this paper, we propose an efficient non-linear metric learning method that considers both positive and negative constraints as well as the topological structure of the data. We formulate the proposed method as a constrained trace ratio optimization problem that can be solved efficiently using algorithms introduced for this purpose (e.g., Xiang et al.'s method [1]). The proposed non-linear method can be considered an efficient kernel learning method that does not need to learn all entries of an n×n kernel matrix. Our method yields an explicit transformation that can project new data points onto the transformed space.
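To illustrate what a trace ratio problem of this kind looks like computationally, the sketch below solves a generic instance, maximizing tr(WᵀS_n W)/tr(WᵀS_p W) subject to WᵀW = I, where S_p and S_n are hypothetical scatter matrices that we assume have been built from the positive and negative constraints (possibly together with a topology-preserving term). The alternating scheme shown, updating the ratio and re-solving an eigenproblem, is one standard way of handling such problems; it is not necessarily identical to the algorithm of Xiang et al. [1] used in the paper.

```python
import numpy as np

def trace_ratio(S_n, S_p, dim, n_iter=100, tol=1e-8):
    """Maximize tr(W^T S_n W) / tr(W^T S_p W) subject to W^T W = I.

    S_n, S_p : (d, d) symmetric PSD matrices, e.g. scatter of negative-
               and positive-constraint pairs; dim : target dimension.
    """
    d = S_n.shape[0]
    W = np.eye(d)[:, :dim]                      # orthonormal starting point
    lam = 0.0
    for _ in range(n_iter):
        # top-`dim` eigenvectors of S_n - lam * S_p (eigh sorts ascending)
        _, vecs = np.linalg.eigh(S_n - lam * S_p)
        W = vecs[:, -dim:]
        new_lam = (np.trace(W.T @ S_n @ W) /
                   (np.trace(W.T @ S_p @ W) + 1e-12))
        if abs(new_lam - lam) < tol:            # the ratio has converged
            break
        lam = new_lam
    return W, lam
```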
The rest of this paper is organized as follows. Section 2 presents a brief review of related work. In Section 3, we first introduce the general form of the proposed optimization problems, which incorporate both positive and negative constraints as well as the topological structure of the data; we then present specific problems that can be solved efficiently for learning linear and non-linear transformations. Finally, we present a kernel-based method and show the relation between the proposed non-linear method and a special form of this kernel-based method. Section 4 presents experimental results on synthetic and real-world data sets. Concluding remarks are given in the last section.
Section snippets
Related works
In this section, we review methods that can consider both positive and negative constraints to learn a transformation. A positive constraint denotes a pair of data points that must be in the same cluster, while a negative constraint denotes two data points that must be in different clusters [1]. Most of the existing methods that can use both positive and negative constraints learn a Mahalanobis metric A (where A is a positive semi-definite matrix) or, equivalently, find a corresponding linear transformation of the data.
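As a small self-contained illustration of this setting (with variable names of our own choosing, not the paper's notation), the snippet below represents positive and negative constraints as index pairs and evaluates the Mahalanobis distance under a PSD matrix A, also checking the standard equivalence A = LᵀL with a linear transformation L.

```python
import numpy as np

# pairwise side information: index pairs into a data matrix X
positive_pairs = [(0, 3), (2, 7)]    # must-link (same cluster)
negative_pairs = [(0, 9), (4, 7)]    # cannot-link (different clusters)

def mahalanobis(x, y, A):
    """d_A(x, y) = sqrt((x - y)^T A (x - y)) for a PSD matrix A."""
    diff = x - y
    return float(np.sqrt(diff @ A @ diff))

# A PSD metric A = L^T L is equivalent to the linear map L, since
# d_A(x, y) = ||L x - L y||_2.
rng = np.random.default_rng(0)
L = rng.standard_normal((2, 5))      # hypothetical 5-D -> 2-D transform
A = L.T @ L
x, y = rng.standard_normal(5), rng.standard_normal(5)
assert np.isclose(mahalanobis(x, y, A), np.linalg.norm(L @ x - L @ y))
```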
Proposed approach
In this section, we first propose a general framework for learning an appropriate transformation from positive and negative constraints. Based on this framework, we propose problems (that can be solved efficiently) for learning linear and non-linear transformations. Finally, we introduce our kernel-based method and show the relation between a special form of this method and the proposed non-linear metric learning method.
Here, we introduce some notations used in this section.
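Although the notation itself is omitted from this snippet, one practical point of the kernel-based variant is worth illustrating: when a non-linear transformation is expressed in the span of the training samples, a new point can be projected using only kernel evaluations. The sketch below shows this generic kernel-trick projection with an RBF kernel and a hypothetical coefficient matrix C; it illustrates the mechanism only, not the specific parameterization derived in the paper.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_project(x_new, X_train, C, gamma=1.0):
    """Project a new point with a transform expressed in the span of the
    training samples: y = C^T k(x_new), where
    k(x_new) = [k(x_new, x_1), ..., k(x_new, x_n)].

    X_train : (n, d) training data; C : (n, dim) coefficient matrix, here
    hypothetical -- in a constraint-based method it would be the quantity
    learned from the side information.
    """
    k = np.array([rbf(x_new, xi, gamma) for xi in X_train])
    return C.T @ k
```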
Experimental results
In this section, we explain experiments that we have conducted to compare our linear and non-linear metric learning methods with some existing methods. We measure the effectiveness of semi-supervised metric-learning algorithms by comparing clustering results obtained from using different metrics. We report results on both synthetic and real-world data sets.
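One typical protocol for such a comparison (sketched here with scikit-learn and the adjusted Rand index as assumed choices; the paper's own clustering algorithm and evaluation measure may differ) is to map the data with the learned transformation, cluster the transformed points, and score the partition against the ground-truth labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def evaluate_metric(X, labels, transform, n_clusters, seed=0):
    """Cluster the transformed data and compare with the true labels.

    transform : callable mapping the (n, d) data matrix into the learned
                space (use the identity for the Euclidean baseline).
    """
    Z = transform(X)
    pred = KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(Z)
    return adjusted_rand_score(labels, pred)

# usage: compare the Euclidean baseline with a learned linear map L
# score_euclid  = evaluate_metric(X, y, lambda X: X, k)
# score_learned = evaluate_metric(X, y, lambda X: X @ L.T, k)
```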
Conclusions and future work
In this paper, we introduced a novel metric learning method for semi-supervised clustering. We proposed a general framework for learning linear and non-linear transformations using both positive and negative constraints. The proposed methods have been formulated as constrained trace ratio problems that can be solved efficiently. We considered the geometrical structure of the data along with the pairwise constraints in the proposed optimization problems. We showed that the proposed non-linear method corresponds to a special form of the introduced kernel-based method.
References (40)
- et al., Learning a Mahalanobis distance metric for data clustering and classification, Pattern Recognition (2008)
- et al., Extending the relevant component analysis algorithm for metric learning using both positive and negative equivalence constraints, Pattern Recognition (2006)
- et al., Locally linear metric adaptation with application to semi-supervised clustering and image retrieval, Pattern Recognition (2006)
- et al., Relaxational metric adaptation and its application to semi-supervised clustering and content-based image retrieval, Pattern Recognition (2006)
- et al., A generalized Foley–Sammon transform based on generalized Fisher discriminant criterion and its application to face recognition, Pattern Recognition Letters (2003)
- L. Yang, R. Jin, Distance metric learning: a comprehensive survey, Technical Report, Michigan State University...
- J. Goldberger, S. Roweis, G. Hinton, R. Salakhutdinov, Neighborhood components analysis, in: Advances in NIPS, MIT...
- K. Weinberger, J. Blitzer, L. Saul, Distance metric learning for large margin nearest neighbor classification, in:...
- et al., Discriminant adaptive nearest neighbor classification, IEEE Transactions on Pattern Analysis and Machine Intelligence (1996)
- A. Globerson, S. Roweis, Metric learning by collapsing classes, in: Advances in NIPS, MIT Press, Cambridge, MA, USA,...
- Correlation clustering, Machine Learning
Cited by (37)
- Decision tree pairwise metric learning against adversarial attacks, 2021, Computers and Security. Citation excerpt: "Despite its effectiveness on imbalanced and limited data samples, metric learning based models for adversarial sample detection such as (Chen et al., 2018; Pang et al., 2018; 2019; Wen et al., 2016) learn on the single Mahalanobis distance, which is limited in its capacity to capture the shape of complex data (Baghshah and Shouraki, 2010; Duan et al., 2020; Ma and Zheng, 2016; Wang et al., 2018; Xiong et al., 2012). To resolve the aforementioned limitation, the authors (Baghshah and Shouraki, 2010; Chang, 2012; Frome et al., 2017; Ma and Zheng, 2016; Xiong et al., 2012) introduced a non-negative and symmetric metric that is able to implicitly adapt its distance function through the feature space, thereby making their distance metric learning based models adaptable to complex data. Kontschieder et al. (2016) and Hehn and Hamprecht (2018) introduced the concept of decision trees in deep learning to transform the weak Softmax Cross Entropy loss into strong classifiers."
- Sparse feature selection: Relevance, redundancy and locality structure preserving guided by pairwise constraints, 2020, Applied Soft Computing Journal
- Sparse Bayesian approach for metric learning in latent space, 2019, Knowledge-Based Systems
- A data-driven metric learning-based scheme for unsupervised network anomaly detection, 2019, Computers and Electrical Engineering
- Sparse Bayesian similarity learning based on posterior distribution of data, 2018, Engineering Applications of Artificial Intelligence
- Detection of evolving concepts in non-stationary data streams: A multiple kernel learning approach, 2018, Expert Systems with Applications
About the Author—MAHDIEH SOLEYMANI BAGHSHAH received her B.S. and M.S. degrees from the Department of Computer Engineering, Sharif University of Technology, Iran, in 2003 and 2005, respectively. She is now a Ph.D. candidate at Sharif University of Technology. Her research interests include machine learning and pattern recognition, with primary emphasis on semi-supervised learning and clustering.
About the Author—SAEED BAGHERI SHOURAKI received his B.Sc. in Electrical Engineering and M.Sc. in Digital Electronics from Sharif University of Technology, Tehran, Iran, in 1985 and 1987, respectively. Soon afterwards, he joined the Computer Engineering Department of Sharif University of Technology as a faculty member. He received his Ph.D. on fuzzy control systems from Tsushin Daigaku (University of Electro-Communications), Tokyo, Japan, in 2000, and continued his activities in the Computer Engineering Department until 2008. He is currently an associate professor in the Electrical Engineering Department of Sharif University of Technology. His research interests include control, robotics, artificial life, and soft computing.