
Pattern Recognition

Volume 43, Issue 8, August 2010, Pages 2982-2992

Non-linear metric learning using pairwise similarity and dissimilarity constraints and the geometrical structure of data

https://doi.org/10.1016/j.patcog.2010.02.022

Abstract

The problem of clustering with side information has received much recent attention and metric learning has been considered as a powerful approach to this problem. Until now, various metric learning methods have been proposed for semi-supervised clustering. Although some of the existing methods can use both positive (must-link) and negative (cannot-link) constraints, they are usually limited to learning a linear transformation (i.e., finding a global Mahalanobis metric). In this paper, we propose a framework for learning linear and non-linear transformations efficiently. We use both positive and negative constraints and also the intrinsic topological structure of data. We formulate our metric learning method as an appropriate optimization problem and find the global optimum of this problem. The proposed non-linear method can be considered as an efficient kernel learning method that yields an explicit non-linear transformation and thus shows out-of-sample generalization ability. Experimental results on synthetic and real-world data sets show the effectiveness of our metric learning method for semi-supervised clustering tasks.

Introduction

Distance metrics play a key role in many machine learning algorithms [1]. Over the past few years, there has been considerable research on distance metric learning [2]. Many of the earlier studies optimize the metric with class labels for classification tasks [3], [4], [5], [6], [7], [8]. More recently, researchers have given much attention to distance learning for semi-supervised clustering tasks. Since class labels are generally not available for clustering tasks, constraints serve as a more natural form of supervisory information. Pairwise similarity (positive) and dissimilarity (negative) constraints are the most popular kind of side information used for semi-supervised clustering, although other kinds, such as relative comparisons, have also been considered in some studies.

Over the last few years, the problem of clustering with side information (semi-supervised clustering) has received increasing attention [9], [10], and distance learning has been considered a powerful approach to this problem. The two most frequently used ways to incorporate side information into clustering algorithms are constraint-based approaches [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21] and distance function learning approaches [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34] [24]. In the former, the clustering algorithm itself is modified to use the available labels or constraints to bias the search for an appropriate clustering of the data. In the latter, a distance function is learned prior to clustering; the learned function tries to put similar points close together and dissimilar points far away from each other. This approach is more flexible in the choice of distance function [33] and has received considerable attention in recent studies [1], [25], [28], [29], [30], [31], [33], [34]; we adopt it here as well.

Distance learning based on constraints has been studied by many researchers [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34]. Klein et al. [22] introduced a metric adaptation method for semi-supervised clustering that derives a distance measure from shortest paths in a version of the similarity graph altered by positive constraints; negative constraints, however, are employed only after the metric adaptation phase, during complete-link clustering. Several later studies [1], [23], [25], [28], [34] followed a more popular approach that learns a global Mahalanobis metric from pairwise constraints. Xing et al. [23] proposed a convex optimization problem for learning such a metric. Bar-Hillel et al. [25] devised a more efficient, non-iterative algorithm called relevant component analysis (RCA), which can incorporate only positive constraints. Yeung and Chang [28] introduced an extension of RCA that can consider both positive and negative constraints.
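To make the Mahalanobis-metric approach concrete, the following sketch (in Python/NumPy; the function names and the small regularization constant are ours, not taken from the cited papers) illustrates an RCA-style computation: points grouped by positive constraints form "chunklets", the within-chunklet covariance is averaged over all constrained points, and its inverse serves as the metric A.

```python
import numpy as np

def rca_metric(X, chunklets):
    """RCA-style Mahalanobis metric from positive (must-link) constraints.

    X         : (n, d) data matrix.
    chunklets : list of index lists; each chunklet groups points that
                must-link constraints place in the same cluster.
    Returns the metric matrix A (inverse of the averaged within-chunklet
    covariance), so that d(x, y) = sqrt((x - y)^T A (x - y)).
    """
    d = X.shape[1]
    C = np.zeros((d, d))
    m = 0
    for idx in chunklets:
        pts = X[idx]
        centered = pts - pts.mean(axis=0)
        C += centered.T @ centered          # accumulate within-chunklet scatter
        m += len(idx)
    C /= m
    # Regularize lightly in case the chunklets do not span all dimensions.
    return np.linalg.inv(C + 1e-6 * np.eye(d))

def mahalanobis(x, y, A):
    diff = x - y
    return np.sqrt(diff @ A @ diff)
```

Negative constraints play no role in this computation, which is precisely the limitation addressed by the extension of Yeung and Chang [28].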

More recently, non-linear metric learning methods for semi-supervised clustering have been introduced. Chang and Yeung [29] proposed a locally linear metric learning method that considers only positive constraints; its objective function has many local optima, and the topology of the data cannot be preserved well during this approach [30]. Chang and Yeung [31] also proposed a metric adaptation method that adjusts the locations of data points iteratively, so that similar points tend to move closer together and dissimilar points tend to move apart. Because this method lacks an explicit transformation map, it cannot project new data points onto the transformed space straightforwardly [31]; moreover, the movement of data points may distort the structure of the data. In [30], two kernel-based metric learning methods were presented, but they can use only positive constraints.

Among the existing metric learning methods, some [1], [23], [28], [34], [39], [40] can incorporate both positive and negative constraints. However, most of these [1], [23], [28], [34] learn only a linear transformation corresponding to a Mahalanobis metric. Although some recent studies [39], [40] learn kernels from positive and negative constraints, they are based on learning non-parametric kernel matrices and can therefore compute distances only for the seen data. Additionally, the optimization problems in these methods are usually difficult to solve [40], and the corresponding models have a very high number of degrees of freedom (namely n², where n denotes the number of data points). In this paper, we propose an efficient non-linear metric learning method that considers both positive and negative constraints as well as the topological structure of the data. We formulate the proposed method as a constrained trace ratio optimization problem that can be solved efficiently using algorithms introduced for this purpose (e.g., Xiang et al.'s method [1]). The proposed non-linear method can be considered an efficient kernel learning method that does not need to learn all entries of an n×n matrix, and it yields an explicit transformation that can project new data points onto the transformed space.
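For intuition about the trace ratio form (the paper's exact constrained objective is not reproduced in this excerpt), a common unconstrained variant seeks an orthonormal W that maximizes tr(W^T S_d W) / tr(W^T S_s W), where S_s and S_d are scatter matrices accumulated over similar and dissimilar pairs, e.g., S_s is the sum of (x_i - x_j)(x_i - x_j)^T over similar pairs (i, j). The standard iterative solver, sketched below under these assumptions, alternates between updating the ratio lambda and taking the top eigenvectors of S_d - lambda * S_s:

```python
import numpy as np

def trace_ratio(S_d, S_s, k, iters=50, tol=1e-8):
    """Iterative solver for the trace ratio problem
        max_W  tr(W^T S_d W) / tr(W^T S_s W)   s.t.  W^T W = I,
    where S_d / S_s are scatter matrices of dissimilar / similar pairs.
    A sketch of the scheme used by trace-ratio solvers such as the one
    in Xiang et al.'s method.
    """
    d = S_d.shape[0]
    W = np.eye(d)[:, :k]                     # initial orthonormal basis
    lam = 0.0
    for _ in range(iters):
        _, vecs = np.linalg.eigh(S_d - lam * S_s)
        W = vecs[:, -k:]                     # eigenvectors of the k largest eigenvalues
        new_lam = np.trace(W.T @ S_d @ W) / np.trace(W.T @ S_s @ W)
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return W, lam
```

The columns of the returned W define a linear map x -> W^T x under which dissimilar pairs spread out relative to similar pairs.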

The rest of this paper is organized as follows. Section 2 presents a brief review of related work. Section 3 first introduces the general form of the proposed optimization problems, which incorporate both positive and negative constraints as well as the topological structure of the data; it then presents special problems that can be solved efficiently for learning linear and non-linear transformations, and finally describes a kernel-based method and the relation between the proposed non-linear method and a special form of this kernel-based method. Section 4 presents experimental results on synthetic and real-world data sets. Concluding remarks are given in the last section.


Related works

In this section, we review methods that can consider both positive and negative constraints to learn a transformation. A positive constraint denotes a pair of data points that must be in the same cluster, while a negative constraint denotes two data points that must be in two different clusters [1]. Most of the existing methods that can use both positive and negative constraints learn a Mahalanobis metric A (where A is a positive semi-definite matrix) or, equivalently, find a linear transformation W with A = W^T W, so that distances under A equal Euclidean distances in the transformed space.
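The equivalence noted above is easy to verify numerically: with A = W^T W, the Mahalanobis distance under A coincides with the Euclidean distance after mapping the points through W. A minimal check (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))
A = W.T @ W                                  # positive semi-definite by construction

x, y = rng.standard_normal(3), rng.standard_normal(3)
d_mahalanobis = np.sqrt((x - y) @ A @ (x - y))
d_transformed = np.linalg.norm(W @ x - W @ y)
assert np.isclose(d_mahalanobis, d_transformed)
```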

Proposed approach

In this section, we first propose a general framework for learning an appropriate transformation from positive and negative constraints. Based on this framework, we propose problems (that can be solved efficiently) for learning linear and non-linear transformations. Finally, we introduce our kernel-based method and show the relation between a special form of this method and the proposed non-linear metric learning method.

Here, we introduce some notation used in this section. X = {x_1, x_2, …, x_n} denotes the set of n data points.
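Although the paper's exact parameterization does not appear in this excerpt, a common way to obtain an explicit non-linear map with out-of-sample ability is a kernel expansion f(x) = Alpha^T k(x), where k(x) = [k(x_1, x), …, k(x_n, x)]^T and Alpha is an n×r coefficient matrix learned from the constraints; once Alpha is fixed, any unseen point can be projected. A hedged sketch (the names rbf, make_transform, and Alpha are our own):

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Gaussian (RBF) kernel between two points.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def make_transform(X_train, Alpha, gamma=1.0):
    """Explicit non-linear map of the kernel-expansion form
        f(x) = Alpha^T k(x),   k(x) = [k(x_1, x), ..., k(x_n, x)]^T.
    Given the learned (n, r) coefficient matrix Alpha, this returns a
    function that projects any point, seen or unseen, into the
    r-dimensional transformed space (out-of-sample projection).
    """
    def transform(x):
        k = np.array([rbf(xi, x, gamma) for xi in X_train])
        return Alpha.T @ k
    return transform
```

New points are then compared with ordinary Euclidean distance in the transformed space, which is what gives such a map its out-of-sample generalization ability.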

Experimental results

In this section, we describe the experiments we have conducted to compare our linear and non-linear metric learning methods with some existing methods. We measure the effectiveness of semi-supervised metric learning algorithms by comparing the clustering results obtained using different metrics. We report results on both synthetic and real-world data sets.
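As a reference point for how such comparisons are typically run (this is a generic pipeline, not necessarily the paper's exact protocol or data sets), one projects the data with the learned transformation, clusters the projected points with k-means, and scores the partition against ground-truth labels, for example with the adjusted Rand index:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def evaluate_metric(transform, X, labels, n_clusters, seed=0):
    """Project the data with a learned transform, cluster with k-means,
    and score the clustering against ground-truth labels."""
    Z = np.array([transform(x) for x in X])
    pred = KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(Z)
    return adjusted_rand_score(labels, pred)
```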

Conclusions and future work

In this paper, we introduced a novel metric learning method for semi-supervised clustering. We proposed a general framework for learning linear and non-linear transformations using both positive and negative constraints. The proposed methods were formulated as constrained trace ratio problems that can be solved efficiently. We considered the geometrical structure of the data along with the pairwise constraints in the proposed optimization problems. We showed that the proposed non-linear method can be considered an efficient kernel learning method that yields an explicit transformation and thus generalizes to out-of-sample data points.


References (40)

  • J.H. Friedman, Flexible metric nearest neighbor classification, Technical Report, Statistics Department, Stanford...
  • Z.H. Zhang, J.T. Kwok, D.Y. Yeung, Parametric distance metric learning with label information, in: IJCAI, Acapulco,...
  • M.H.C. Law, Clustering, dimensionality reduction, and side information, Ph.D. Dissertation, Michigan University,...
  • S. Basu, Semi-supervised clustering: probabilistic models, algorithms and experiments, Ph.D. Dissertation, University...
  • M.H.C. Law, A. Topchy, A.K. Jain, Model-based clustering with probabilistic constraints, in: SIAM Conference on Data...
  • S. Basu, A. Banerjee, R.J. Mooney, Semi-supervised clustering by seeding, in: 19th International Conference on Machine...
  • K. Wagstaff, C. Cardie, S. Rogers, S. Schroedl, Constrained K-means clustering with background knowledge, in: 18th...
  • Z. Lu, T. Leen, Semi-supervised learning with penalized probabilistic clustering, in: Advances in NIPS 17, MIT Press,...
  • N. Bansal et al., Correlation clustering, Machine Learning (2004)
  • T. Lange, M.H. Law, A.K. Jain, J. Buhmann, Learning with constrained and unlabelled data, in: IEEE Computer Society...

About the Author—MAHDIEH SOLEYMANI BAGHSHAH received her B.S. and M.S. degrees from the Department of Computer Engineering, Sharif University of Technology, Iran, in 2003 and 2005. She is now a Ph.D. candidate at Sharif University of Technology. Her research interests include machine learning and pattern recognition, with primary emphasis on semi-supervised learning and clustering.

About the Author—SAEED BAGHERI SHOURAKI received his B.Sc. in Electrical Engineering and M.Sc. in Digital Electronics from Sharif University of Technology, Tehran, Iran, in 1985 and 1987. He then joined the Department of Computer Engineering at Sharif University of Technology as a faculty member. He received his Ph.D. in fuzzy control systems from Tsushin Daigaku (University of Electro-Communications), Tokyo, Japan, in 2000. He continued his activities in the Department of Computer Engineering until 2008 and is currently an associate professor in the Department of Electrical Engineering at Sharif University of Technology. His research interests include control, robotics, artificial life, and soft computing.
