Learning from pairwise constraints by Similarity Neural Networks
Introduction
Similarity learning algorithms induce a similarity measure that is suitable for comparing data points, exploiting a set of supervised examples. Given two points x_i and x_j represented in a feature space X, the assumption behind similarity learning is that X is non-Euclidean, so that the similarity degree between x_i and x_j may not be appropriately computed by the classical Euclidean distance. Supervision is generally provided in the form of similarity/dissimilarity labels on pairs of points, also referred to as pairwise constraints. The learned similarity function should approximate the supervisor's perception of similarity in the given feature space, and it can be used to partition data in semi-supervised clustering algorithms.
In the last few decades, the human perception of similarity has received growing attention from researchers in psychology and mathematics, who studied its properties and tried to define appropriate models of similarity functions (Richter, 1992, Tversky, 1977, Wallace, 1958). More recently, the problem of learning a similarity measure has also attracted the machine learning community. In particular, in a wide set of fields, ranging from bioinformatics to information retrieval, from robotics to computer vision, supervision on the relationships between pairs of entities is often available, and an appropriate criterion to compare new data must be inferred.
The term similarity measure is frequently used in a generic sense, describing both similarity and dissimilarity functions. Following psychological evidence on how humans learn, similarity measures are not required to satisfy all the metric properties (Santini and Jain, 1999, Tversky, 1977). In particular, the similarity relationship is not necessarily transitive.
In this paper we focus on two major aspects of similarity learning: the inductive learning of a similarity measure from pairwise constraints in a fully supervised scenario, in which the trained model provides a natural out-of-sample extension of the similarity criterion to novel pairs of points, and, subsequently, the application of the learned function to semi-supervised partitional clustering of unlabeled data. The former point is noteworthy, since it casts the learning problem in a more challenging scenario than the transductive setting in which many existing algorithms operate (i.e., they compute distances within the training pairs only). For instance, the learned measure could be used to group or compare data that is not available at training time, or that is incrementally added to the existing collection of patterns as soon as it is acquired or provided to the system. In particular, the contributions of this paper are the following:
- the definition of Similarity Neural Networks (SNNs), a neural network model designed to learn non-linear similarity measures from pairwise constraints and to generalize the learned criterion to previously unseen data pairs. The network architecture guarantees the symmetry and non-negativity of the implemented similarity measures, independently of the available supervision.
- the analysis of the theoretical properties and approximation capabilities of SNNs, which are proven to be universal approximators for symmetric functions.
- the definition of a technique to compute the optimal cluster representatives in semi-supervised partitional clustering, exploiting SNNs to implement the similarity concept. Due to the non-linearity of the learned function, the SNN model cannot be directly applied to K-Means clustering without approximating the data representatives. To overcome this issue, we describe how to compute optimal representatives with respect to the SNN function by means of a modified backpropagation scheme, which is extended to the input layer of the network and biased by a norm regularizer. This approach is a more efficient version of the technique that we proposed in Melacci, Maggini, and Sarti (2009).
- the critical review of the most popular seeding and constraining policies, and the definition of new ones, showing that both the learning of the similarity measure and the clustering process can be "guided" by the available pairwise constraints, improving the quality of the data partitioning in regions of the space where SNNs were not able to perfectly model the similarity function.
- a detailed experimental analysis, conducted to measure the performance of SNNs. We compare the new model against many popular inductive learning algorithms for similarity measure estimation, considering linear, non-linear, and kernel-based techniques. Experiments are performed on several benchmarks from the UCI repository (Asuncion & Newman, 2007) and on real data from the US Postal System. SNNs compare favorably with the considered methods, showing the advantage of their flexible and non-linear model, and the benefits of their application to tasks that require partitioning data according to an adaptive similarity criterion.
The paper is organized as follows. In the next section the notation and the properties of the considered pairwise relationships are presented. Section 3 reviews the related work on similarity learning available in the literature. The SNN model and its theoretical properties are described in Section 4. Section 5 describes the application of SNNs to semi-supervised clustering, and in Section 6 a detailed experimental analysis is reported. Finally, Section 7 draws some conclusions and delineates the directions for future research.
Section snippets
Learning pairwise relationships
In this section we introduce the general formulation of learning from pairwise constraints and the special case when pairwise relationships are obtained from the class labels available for each single data point.
General formulation. Given a set of data points X = {x_1, …, x_n}, we consider the case when the available supervision is represented by a set of symmetric pairwise relationships, or constraints. In detail, C = C_ML ∪ C_CL, where C_ML contains the must-link constraints, or similarity
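The special case in which pairwise relationships are derived from class labels can be sketched as follows. This is a minimal, illustrative routine (the function name and the list-of-index-pairs representation are assumptions, not taken from the paper): pairs of points sharing a label yield must-link constraints, pairs with different labels yield cannot-link constraints.

```python
from itertools import combinations

def constraints_from_labels(labels):
    """Derive symmetric pairwise constraints from per-point class labels:
    a must-link for pairs sharing a label, a cannot-link otherwise."""
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link

ml, cl = constraints_from_labels([0, 0, 1])
# ml == [(0, 1)], cl == [(0, 2), (1, 2)]
```

Note that each pair is stored once with i < j, which is consistent with the symmetry of the relationships: the constraint on (x_i, x_j) is the same as the one on (x_j, x_i).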
Related work
In the machine learning literature, similarity-based learning collects a large number of significantly different approaches. In the following we briefly summarize the main contributions, focusing on the techniques that are particularly related to SNNs. The existing algorithms can be roughly divided into three main categories, based on the type of the provided supervision.
Unsupervised. Many unsupervised algorithms that make specific assumptions on the distribution of data are frequently referred
Similarity neural networks
An SNN consists of a feedforward Multi-Layer Perceptron (MLP) (Haykin, 1998) trained to learn a similarity measure for pairs of patterns x_i, x_j, using binary supervisions. Given a set of objects X and a set of pairwise constraints C, the SNN learning set L is defined as
L = {(x_i, x_j, t_ij) : (x_i, x_j) ∈ C}.
The set L collects triples (x_i, x_j, t_ij), t_ij being the similarity/dissimilarity label of the pair (x_i, x_j),
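The guarantees of symmetry and non-negativity can be illustrated with a minimal sketch. This is not the SNN architecture itself, whose structural constraints are described in this section: here symmetry is obtained by averaging the network output over the two input orderings, non-negativity by a sigmoid output unit, and all names and layer sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_pair(x_i, x_j, W, v):
    """A plain one-hidden-layer MLP on the concatenated pair; the sigmoid
    output always lies in [0, 1], hence it is non-negative."""
    h = np.tanh(W @ np.concatenate([x_i, x_j]))
    return 1.0 / (1.0 + np.exp(-(v @ h)))

def snn_similarity(x_i, x_j, W, v):
    """Averaging over the two input orderings makes the measure symmetric
    by construction, whatever the weights are."""
    return 0.5 * (mlp_pair(x_i, x_j, W, v) + mlp_pair(x_j, x_i, W, v))

d, hidden = 4, 8
W = rng.standard_normal((hidden, 2 * d))
v = rng.standard_normal(hidden)
a, b = rng.standard_normal(d), rng.standard_normal(d)
s = snn_similarity(a, b, W, v)
```

Both properties hold for any weight configuration, i.e., independently of the supervision used to fit the network, which is the point of enforcing them architecturally rather than through the training data.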
Semi–supervised clustering by Similarity Neural Networks
Partitional clustering algorithms, such as K-Means and K-Medoids (Duda et al., 2000, Kaufman and Rousseeuw, 1987), divide a set of objects X into K clusters by searching for representatives which minimize the average dissimilarity of all objects to the nearest representative. The representative m_k of the k-th cluster is computed as follows
m_k = arg min_m Σ_{x_i ∈ X_k} d(x_i, m),
where X_k is the set of points assigned to the k-th cluster.
In K-Medoids, m_k, k = 1, …, K, are referred to as medoids, and they are selected from the points of X. Differently, the centroids of
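As a toy illustration of computing a representative with respect to a non-linear similarity, the following sketch performs gradient ascent on the representative itself, in the spirit of the gradient-based scheme with a norm regularizer discussed in this paper. A closed-form Gaussian similarity stands in for a trained SNN, and all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

def optimal_representative(cluster, grad_m, steps=200, lr=0.1, lam=1e-3):
    """Gradient ascent on the representative m, maximizing the average
    similarity of the cluster points to m, with an L2 norm regularizer.
    `grad_m(x, m)` must return the gradient of the similarity s(x, m)
    with respect to m."""
    m = cluster.mean(axis=0)  # start from the Euclidean centroid
    for _ in range(steps):
        g = np.mean([grad_m(x, m) for x in cluster], axis=0) - lam * m
        m = m + lr * g
    return m

# Toy Gaussian similarity in place of a trained network:
# s(x, m) = exp(-||x - m||^2), with gradient 2 (x - m) s(x, m) w.r.t. m.
sim = lambda x, m: np.exp(-np.sum((x - m) ** 2))
grad = lambda x, m: 2.0 * (x - m) * sim(x, m)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
rep = optimal_representative(pts, grad)
```

With a trained network, the gradient with respect to m would be obtained by extending backpropagation to the input layer, which is what makes K-Means-style updates possible despite the non-linearity of the learned measure.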
Experimental results
In order to evaluate the performance of SNNs and their application to partitional semi-supervised clustering, we selected 7 popular datasets from the UCI repository (Asuncion & Newman, 2007). The resulting optimal setup is tested on the handwritten digit data from the US Postal System. Table 1 reports the main characteristics of each dataset.
For each benchmark a set of classes is defined. Following the framework described
Conclusions and future work
In this paper we presented the Similarity Neural Network (SNN) model. SNNs are designed to learn similarity measures from pairwise constraints that describe similarity/dissimilarity relationships between patterns. Due to their particular architecture, they are guaranteed to compute a symmetric and non-negative function, independently of the available supervision, and they naturally provide an out-of-sample extension to novel pairs of data points. The approximation capabilities of SNN have been
References (58)
- et al. Generative models for similarity-based classification. Pattern Recognition (2008).
- et al. Prototype selection for dissimilarity-based classifiers. Pattern Recognition (2006).
- et al. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks (1993).
- et al. Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognition (2008).
- Asuncion, A., & Newman, D. (2007). UCI machine learning...
- et al. Semi-supervised metric learning using pairwise constraints.
- et al. Efficient kernel learning from constraints and unlabeled data.
- et al. Kernel-based metric learning for semi-supervised clustering. Neurocomputing (2010).
- et al. Non-linear metric learning using pairwise similarity and dissimilarity constraints and the geometrical structure of data. Pattern Recognition (2010).
- Learning distance functions using equivalence relations.
- Learning a Mahalanobis metric from equivalence constraints. The Journal of Machine Learning Research.
- A probabilistic framework for semi-supervised clustering.
- Constrained clustering: advances in algorithms, theory, and applications.
- Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation.
- Integrating constraints and metric learning in semi-supervised clustering.
- Kernel-based metric adaptation with pairwise constraints.
- Similarity-based classification: concepts and algorithms. The Journal of Machine Learning Research.
- Multidimensional scaling.
- Large margin nearest neighbor classifiers. IEEE Transactions on Neural Networks.
- Pattern classification.
- Weka: a machine learning workbench for data mining.
- Neighbourhood components analysis.
- Neural networks: a comprehensive foundation.
- Boosting margin based distance functions for clustering.
- Learning distance functions for image retrieval.
- A best possible heuristic for the k-center problem. Mathematics of Operations Research.