One Dependence Value Difference Metric

https://doi.org/10.1016/j.knosys.2011.01.005

Abstract

Many distance-related algorithms depend on a good distance metric to be successful. The Value Difference Metric (simply VDM) was proposed to define a reasonable distance between each pair of instances with nominal attribute values only. In VDM, all of the attributes are assumed to be fully independent, and two values of an attribute are considered closer if they correlate more similarly with the output classes. However, the attribute independence assumption in VDM is rarely true in reality, which harms its performance in applications with complex attribute dependencies. In this paper, we propose an improved Value Difference Metric that relaxes this unrealistic attribute independence assumption. We call it the One Dependence Value Difference Metric (simply ODVDM). In ODVDM, structure learning algorithms for Bayesian network classifiers, such as tree augmented naive Bayes, are used to find the dependence relationships among the attributes. Our experimental results validate its effectiveness in terms of classification accuracy.

Section snippets

Introduction and related work

In instance-based learning [1], [2], [3], the distance metric plays the most important role. In fact, distance metrics are also widely used in other paradigms of machine learning, such as classification and clustering, and in other research fields, such as statistics, pattern recognition, and recommender systems [4]. Many distance metrics have been proposed. When all attributes are nominal, the simplest distance metric is the Overlap Metric, which we simply denote OM in this paper and which can be
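To make the baseline concrete, here is a minimal sketch of the Overlap Metric in Python; the function name and data layout are ours, purely for illustration:

```python
def overlap_metric(x, y):
    """Overlap Metric (OM): the distance between two instances with nominal
    attributes is simply the number of attributes on which they disagree."""
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

# Example: two instances that differ only in their second attribute.
print(overlap_metric(("red", "round", "small"), ("red", "square", "small")))  # -> 1
```

OM treats every mismatch as equally severe, whereas VDM grades how different two values are by comparing their conditional class distributions.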

One Dependence Value Difference Metric

Our research starts by looking back at the prediction made by Bayesian network classifiers. Given a test instance $x$, Bayesian network classifiers use Eq. (3) to predict its class label:

$$c(x) = \arg\max_{c \in C} P(c)\, P(a_1(x), \ldots, a_n(x) \mid c) \qquad (3)$$

According to Eq. (3), we need to estimate the conditional probability $P(a_1(x), \ldots, a_n(x) \mid c)$. However, fully estimating it is an NP-hard problem [12]. Similar to the attribute independence assumption made by VDM, naive Bayesian classifiers (simply NB) assume that all of the attributes
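The snippet is truncated before the formal definitions, so the following Python sketch shows only the standard VDM from the literature [2], [3]: two values of an attribute are close if their conditional class distributions $P(c \mid A_i = v)$ are close. All function names and the data layout are our own. ODVDM, as the paper describes, additionally conditions these probability terms on one parent attribute discovered by TAN-style structure learning; we note that extension in a comment rather than guessing at the paper's exact formulation.

```python
from collections import Counter, defaultdict

def fit_vdm(X, y, q=1):
    """Estimate the conditional class probabilities P(c | A_i = v) that VDM
    needs from training data X (tuples of nominal values) and labels y,
    and return a distance function over instances."""
    classes = sorted(set(y))
    n_attrs = len(X[0])
    # counts[i][v][c] = number of training instances with A_i = v and class c.
    # An ODVDM-style variant would instead count over (v, parent value) pairs,
    # with the parent of each attribute chosen by TAN-style structure learning.
    counts = [defaultdict(Counter) for _ in range(n_attrs)]
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            counts[i][v][c] += 1

    def p(i, v, c):
        total = sum(counts[i][v].values())
        return counts[i][v][c] / total if total else 0.0

    def vdm(x1, x2):
        # d(x1, x2) = sum_i sum_c |P(c | a_i(x1)) - P(c | a_i(x2))|^q
        return sum(abs(p(i, x1[i], c) - p(i, x2[i], c)) ** q
                   for i in range(n_attrs) for c in classes)

    return vdm

# Tiny usage example on made-up data.
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
d = fit_vdm(X, y)
print(d(X[0], X[2]))  # -> 3.0: the two instances disagree on both attributes
```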

Experimental methodology and results

Distance-weighted k-nearest neighbor (simply KNNDW) is the most representative instance-based learning algorithm; in it, the distance metric is used twice: to find the neighbors and to weight them. It is therefore a natural first test bed for demonstrating the effectiveness of a distance metric, and we use it to validate the effectiveness of our ODVDM.
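For reference, here is a minimal sketch of distance-weighted kNN with a pluggable metric, assuming inverse-distance weighting (one common scheme; the paper's exact weighting function is not shown in this snippet):

```python
from collections import defaultdict

def knn_dw_predict(x, X_train, y_train, metric, k=5, eps=1e-9):
    """Distance-weighted kNN: the metric is used twice, first to FIND the k
    nearest neighbors of x and then to WEIGHT their votes (here by inverse
    distance, so closer neighbors contribute more)."""
    neighbors = sorted(zip(X_train, y_train), key=lambda t: metric(x, t[0]))[:k]
    votes = defaultdict(float)
    for xi, yi in neighbors:
        votes[yi] += 1.0 / (metric(x, xi) + eps)  # eps guards against d = 0
    return max(votes, key=votes.get)
```

Plugging the `vdm` function from the previous sketch in as `metric` yields a KNNDW classifier for nominal data; swapping in an ODVDM-style metric changes only the probability terms, not this loop.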

In our experiments, 10 UCI [23] classification data sets are used. We use them because they represent a wide range of domains and data characteristics and

Conclusions and future work

The Value Difference Metric (simply VDM) is widely used to measure the distance between pairs of instances with nominal attribute values only. In this paper, we proposed an improved Value Difference Metric by relaxing VDM's unrealistic attribute independence assumption. We call it the One Dependence Value Difference Metric (simply ODVDM). In our ODVDM, structure learning algorithms for Bayesian network classifiers, such as tree augmented naive Bayes, are used to find the dependence relationships

Acknowledgements

We thank Professors Liangxiao Jiang, Harry Zhang, and Jian Yu for their kind help. We also thank the anonymous reviewers for their very useful comments and suggestions. This work was supported by the National Natural Science Foundation of China under Grant Nos. 60905033 and 61071188, the Provincial Natural Science Foundation of Hubei under Grant Nos. 2009CDB139 and 2009CDB077, and the Fundamental Research Funds for the Central Universities under Grant No. CUGL090248.

References (30)

  • L. Jiang et al., Decision tree with better class probability estimation, International Journal of Pattern Recognition and Artificial Intelligence (2009)
  • D. Randall Wilson et al., Improved heterogeneous distance functions, Journal of Artificial Intelligence Research (1997)
  • C. Stanfill et al., Toward memory-based reasoning, Communications of the ACM (1986)
  • N. Friedman et al., Bayesian network classifiers, Machine Learning (1997)
  • C.K. Chow et al., Approximating discrete probability distributions with dependence trees, IEEE Transactions on Information Theory (1968)

Chaoqun Li is currently a Ph.D. candidate at China University of Geosciences (Wuhan). Her research interests include data mining and machine learning.

Hongwei Li, the doctoral supervisor of Chaoqun Li, is currently a professor in the Department of Mathematics at China University of Geosciences (Wuhan).
