
Information Fusion

Volume 65, January 2021, Pages 72-83

Distance metric learning for augmenting the method of nearest neighbors for ordinal classification with absolute and relative information

https://doi.org/10.1016/j.inffus.2020.08.004

Highlights

  • Absolute and relative information are exploited for augmenting the method of nearest neighbors.

  • Both types of information are considered for learning an appropriate distance metric.

  • Distance metric learning improves ordinal classification performance.

Abstract

The performance of a classifier is often limited by the amount of labeled data (absolute information) available. In order to overcome this limitation, the incorporation of side information into the classification process has become a popular research topic in the field of machine learning. In this work, we propose a new method for ordinal classification that combines absolute information and a specific type of side information: relative information. In particular, this method exploits both types of information to learn an appropriate distance metric and subsequently incorporates the learned distance metric into the classical method of k nearest neighbors. The experimental results show that the proposed method attains a good performance in terms of some of the most popular (ordinal) classification performance measures.

Introduction

Ordinal classification problems are common in many fields of science, such as medicine [1], [2], image processing [3] and social sciences [4]. Typically, absolute information (i.e., examples with an explicitly given class label) is initially gathered for learning an ordinal classification model. Unfortunately, the performance of this ordinal classification model is limited by the amount of absolute information available for training. Admittedly, collecting a large amount of absolute information is usually time-consuming and costly. In order to improve the performance, additional side information is sometimes considered for ordinal classification [5]. For instance, in some application domains such as recommender systems [6], medical care [7] and food quality assessment [8], collecting relative information (i.e., preference orders for couples of examples) is actually easier than collecting absolute information. The challenge is thus how to combine absolute and relative information for augmenting the ordinal classification performance, typically in a setting where the former is limited and the latter more abundant.

Some contributions [9], [10] have already shown the benefits of combining absolute and relative information. For example, Herk et al. [10] jointly analyzed absolute information (ratings) and relative information (rankings). They found that both ratings and rankings are frequently used to measure values or preferences, but there is no consensus for choosing one type of information over the other. They concluded that both types of information are important for facilitating the reaching of a consensus and recommended using absolute and relative information simultaneously for attaining a complete understanding of datasets. In order to exploit both types of information at the same time, Sader et al. [8] recently proposed a method for ordinal classification that combines absolute evaluations from experts and relative evaluations from novices. This proposal amounts to solving a constrained convex optimization problem that contains many parameters to learn, which makes the model complex and hard to explain. For this very reason, we proposed a new ordinal classification method based on the method of k nearest neighbors (k-NN) that incorporates absolute and relative information [11].

In k-NN, the Euclidean distance metric is typically considered the standard for identifying the nearest neighbors. However, this distance metric might not adequately describe the hidden structure in a given dataset. Distance metric learning [12], [13], [14] thus became an interesting research topic that has been studied in many different scenarios. For instance, Wang et al. [15] considered distance metric learning in the context of image classification, Feng et al. [16] studied distance metric learning for imbalanced datasets and Wang et al. [17] dealt with distance metric learning in the setting of information coming from different sources. There has also been some interest in the combination of deep neural networks and distance metric learning [18].
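To illustrate how a learned metric enters k-NN (this is a minimal sketch with names of our choosing, not the paper's implementation), the classifier below replaces the Euclidean distance with a Mahalanobis distance d_M(x, z) = sqrt((x - z)^T M (x - z)) parametrized by a positive semi-definite matrix M; taking M as the identity recovers plain Euclidean k-NN:

```python
import numpy as np

def mahalanobis(x, z, M):
    """Distance d_M(x, z) = sqrt((x - z)^T M (x - z)) for a PSD matrix M."""
    d = x - z
    return float(np.sqrt(d @ M @ d))

def knn_predict(x, X_train, y_train, M, k=3):
    """Classify x by majority vote among its k nearest neighbors under d_M."""
    dists = [mahalanobis(x, xi, M) for xi in X_train]
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Only the distance computation changes; the neighbor search and voting are exactly those of classical k-NN.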

Recently, many strategies have been proposed to learn an appropriate distance metric for ordinal classification [19], [20], [21], [22]. For example, Xiao et al. [23] used local neighborhoods to make the pairwise distances between examples with the same class label small and the distances between examples with different class labels large. Fouad et al. [24] incorporated additional information into ordinal classification tasks by changing the distance metric in the input space based on the order relation among the class labels. Their experimental results show that the proposed distance metric learning method improves the ordinal classification performance. Nguyen et al. [25] considered ordinal information as local triplet constraints such that, in case A ≺ B ≺ C or A ≻ B ≻ C, examples with class label A should be closer to examples with class label B than to examples with class label C. More specifically, they proposed a method that learns a suitable distance metric that (mostly) satisfies these constraints and subsequently incorporates the learned distance metric into k-NN.
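Triplet constraints of this kind can be enumerated mechanically from ordinal labels. The sketch below is our own illustration (ignoring the locality restriction of Nguyen et al. for brevity): with labels encoded as consecutive integers respecting the class order, it lists every triplet (i, j, l) for which example i should end up closer to example j than to example l:

```python
def ordinal_triplets(y):
    """Enumerate triplet constraints (i, j, l): example i should be closer
    to example j than to example l because |y_i - y_j| < |y_i - y_l|,
    with labels encoded as consecutive integers in the class order."""
    n = len(y)
    triplets = []
    for i in range(n):
        for j in range(n):
            for l in range(n):
                if i != j and i != l and abs(y[i] - y[j]) < abs(y[i] - y[l]):
                    triplets.append((i, j, l))
    return triplets
```

For three examples with labels 0 ≺ 1 ≺ 2, only the two extreme examples generate constraints: each should be closer to the middle example than to the other extreme.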

All the proposed distance metric learning strategies above only deal with absolute information. However, the setting in which a small amount of absolute information and a large amount of relative information are available is commonplace. Therefore, some additional constraints need to be imposed in order to incorporate the relative information into the learning of an appropriate distance metric. Similarly to the idea behind k-NN, where close examples tend to have the same class label, we also assume that close couples of examples tend to have the same order. Following this assumption, which has been successfully tested in [11], we aim at learning a product distance metric that makes the distances between couples with the same order relation small and the distances between couples with different order relations large.
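A product distance metric over couples can be sketched as follows. This is an assumption-laden illustration, not the paper's exact construction: we measure a couple-to-couple distance by combining the two component Mahalanobis distances with a 2-norm, whereas the paper's combination rule is not shown in this snippet:

```python
import numpy as np

def couple_distance(c1, c2, M):
    """Product distance between couples c1 = (a, b) and c2 = (c, d):
    combines the component distances d_M(a, c) and d_M(b, d) by a 2-norm.
    (Illustrative combination rule; the paper's may differ.)"""
    (a, b), (c, d) = c1, c2
    def dM(u, v):
        w = u - v
        return float(np.sqrt(w @ M @ w))
    return float(np.sqrt(dM(a, c) ** 2 + dM(b, d) ** 2))
```

Under such a metric, couples of examples that are componentwise close are close as couples, which is what the assumption above requires.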

In this paper, in order to combine absolute and relative information for distance metric learning, we incorporate the corresponding constraints from both types of information into an optimization process to obtain an optimal distance metric. Next, we incorporate the learned distance metric into k-NN for ordinal classification. We test our method on several available benchmark datasets. The experimental results show the usefulness of considering absolute and relative information and the effectiveness of our proposed distance metric learning method.
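By way of illustration only (the paper's actual optimization problem is not reproduced in this snippet), triplet-style constraints can be folded into a simple hinge-loss optimization over a linear transformation L, with M = LᵀL guaranteeing that the learned matrix defines a valid (positive semi-definite) metric:

```python
import numpy as np

def learn_metric(X, triplets, margin=1.0, lr=0.01, epochs=200):
    """Gradient descent on a hinge loss over triplets (i, j, l):
    push d(i, j)^2 at least `margin` below d(i, l)^2.
    Parametrizing M = L^T L keeps the learned metric PSD."""
    L = np.eye(X.shape[1])
    for _ in range(epochs):
        grad = np.zeros_like(L)
        for i, j, l in triplets:
            u, v = X[i] - X[j], X[i] - X[l]
            if margin + (L @ u) @ (L @ u) - (L @ v) @ (L @ v) > 0:
                # d/dL ||L u||^2 = 2 L u u^T (only violated triplets contribute)
                grad += 2 * L @ (np.outer(u, u) - np.outer(v, v))
        L -= lr * grad
    return L.T @ L
```

Once M is learned, it can be plugged into a Mahalanobis-based k-NN classifier, mirroring the pipeline described above.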

The remainder of this paper is structured as follows. Section 2 introduces the preliminaries and Section 3 proposes a new distance metric learning method that simultaneously exploits absolute and relative information. Experimental results and a corresponding analysis of these results are presented in Section 4. We end with some conclusions and open problems in Section 5.

Section snippets

Problem setting

The starting point is that of [11], in which a small amount of absolute information and a large amount of relative information are available. The goal is to exploit both types of information simultaneously in order to classify new examples.

Formally, the data includes two types of information: absolute information and relative information. The first type of information is collected in a set A = {(x1, y1), (x2, y2), …, (xn, yn)} with a set of input examples D = {x1, x2, …, xn}, where the input examples xi = (xi1, …
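The snippet is cut off before the relative information is formalized. As a rough illustration only (variable names are ours, and the exact representation in the paper may differ), the two kinds of data could be stored as follows, with relative information carrying an order between two examples rather than explicit labels:

```python
import numpy as np

# Absolute information: examples with an explicitly given class label.
A = [(np.array([0.1, 0.2]), 1),
     (np.array([0.5, 0.4]), 2)]

# Relative information: couples (x_a, x_b) meaning "x_a precedes x_b"
# in the class order, with no explicit label attached to either example.
R = [(np.array([0.1, 0.2]), np.array([0.9, 0.8]))]
```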

Combining absolute and relative information for distance metric learning for k-NN

In this section, we extend the distance metric learning method proposed in [25] in order to additionally consider relative information, thus exploiting absolute and relative information simultaneously.

Here, we combine both absolute and relative information to set the constraints for distance metric learning. Firstly, we use a similar idea to that of Nguyen et al. [25] to set the corresponding triplet constraints for absolute information. Secondly, since there are no explicit class labels for

Experiments

In this section, we describe the datasets, introduce the performance measures and analyze the experimental results. In order to show the effectiveness of combining absolute and relative information for distance metric learning, we compare the performances of the method of k-NN, distance metric learning for k-NN (DMLk-NN) as in [25], combining absolute and relative information for k-NN (ARk-NN) as in [11] and the here-proposed method DMLARk-NN. In particular, we show all results in the context of
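The snippet does not list the performance measures themselves. Two measures commonly used for ordinal classification (we do not claim these are exactly the ones reported in the paper) are the mean zero-one error (MZE) and the mean absolute error (MAE) on integer-encoded labels; a minimal sketch:

```python
import numpy as np

def mze(y_true, y_pred):
    """Mean zero-one error: fraction of misclassified examples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

def mae(y_true, y_pred):
    """Mean absolute error between integer-encoded ordinal labels:
    unlike MZE, it penalizes predictions farther away in the class order."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))
```

MAE is the more informative of the two for ordinal problems, since confusing adjacent classes is a milder error than confusing distant ones.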

Conclusion

We have proposed a new distance metric learning method for ordinal classification for the setting in which a small amount of absolute information and a large amount of relative information is available. Both types of information are incorporated to set the constraints on a local neighborhood for learning an appropriate Mahalanobis distance metric. This learned distance metric is used for replacing the Euclidean distance metric when applying the method of k-NN with absolute and relative

CRediT authorship contribution statement

Mengzi Tang: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Drafting the manuscript, Revising the manuscript critically for important intellectual content. Raúl Pérez-Fernández: Conception and design of study, Analysis and/or interpretation of data, Drafting the manuscript, Revising the manuscript critically for important intellectual content. Bernard De Baets: Conception and design of study, Analysis and/or interpretation of data, Drafting the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Mengzi Tang is supported by the China Scholarship Council (CSC). Raúl Pérez-Fernández acknowledges the support of the Research Foundation of Flanders, Belgium (FWO17/PDO/160) and the Spanish MINECO (TIN2017-87600-P). This research received funding from the Flemish Government, Belgium under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme. All authors approved the version of the manuscript to be published.

References (43)

  • M. Cruz-Ramírez et al., Metrics to guide a multi-objective evolutionary algorithm for ordinal classification, Neurocomputing (2014)
  • W. Waegeman et al., Learning to rank: a ROC-based graph-theoretic approach, Pattern Recognit. Lett. (2008)
  • M. Pérez-Ortiz, P.A. Gutiérrez, C. García-Alonso, L. Salvador-Carulla, J.A. Salinas-Pérez, C. Hervás-Martínez, Ordinal...
  • O.M. Doyle et al., Predicting progression of Alzheimer's disease using ordinal regression, PLoS One (2014)
  • J. Chen, X. Liu, S. Lyu, Boosting with side information, in: Proceedings of the Asian Conference on Computer Vision,...
  • V.F. Farias et al., Learning preferences with side information, Manage. Sci. (2019)
  • Q. Nguyen et al., Learning classification models with soft-label information, J. Am. Med. Inform. Assoc. (2014)
  • A. Friedman et al., Global-scale location and distance estimates: common representations and strategies in absolute and relative judgments, J. Exp. Psychol. Learn. Mem. Cogn. (2006)
  • K.Q. Weinberger et al., Distance metric learning for large margin nearest neighbor classification, J. Mach. Learn. Res. (2009)
  • J. Mei et al., Logdet divergence-based metric learning with triplet constraints and its applications, IEEE Trans. Image Process. (2014)
  • H. Wang et al., Semantic discriminative metric learning for image similarity measurement, IEEE Trans. Multimed. (2016)