One Dependence Value Difference Metric
Section snippets
Introduction and related work
In instance-based learning [1], [2], [3], the distance metric plays the most important role. In fact, distance metrics are also widely used in other paradigms of machine learning, such as classification and clustering, and in other research fields, such as statistics, pattern recognition, and recommender systems [4]. Many distance metrics have been proposed. When all attributes are nominal, the simplest distance metric is the Overlap Metric, which we denote OM in this paper and which can be
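The Overlap Metric has a simple standard form: the distance between two instances is the number of nominal attributes on which their values disagree. A minimal sketch (the function name is ours, chosen for illustration):

```python
def overlap_metric(x, y):
    """Overlap Metric (OM): the number of nominal attributes on which
    the two instances disagree (0 when a pair of values matches)."""
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

# Two instances that differ in exactly one of three nominal attributes.
print(overlap_metric(["red", "round", "small"],
                     ["red", "square", "small"]))  # -> 1
```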
One Dependence Value Difference Metric
Our research starts by revisiting the prediction made by Bayesian network classifiers. Given a test instance x, Bayesian network classifiers use Eq. (3) to predict its class label
According to Eq. (3), we need to estimate the conditional probability P(a1(x),…,an(x)∣c). However, fully estimating it is an NP-hard problem [12]. Similar to the attribute independence assumption made by VDM, naive Bayes classifiers (NB for short) assume that all of the attributes
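As commonly defined in the literature, VDM compares two nominal values by the difference between the class-conditional probabilities they induce, and under the attribute independence assumption the per-attribute terms are simply summed. A minimal sketch with illustrative function names of our own (`vdm_tables`, `vdm_distance` are not from the paper):

```python
from collections import Counter, defaultdict

def vdm_tables(X, y):
    """Estimate P(class | attribute i = value) from training data.
    Returns one table per attribute: value -> {class: probability}."""
    tables = []
    for i in range(len(X[0])):
        value_class = defaultdict(Counter)        # value -> class counts
        for row, c in zip(X, y):
            value_class[row[i]][c] += 1
        tables.append({v: {c: n / sum(cnt.values()) for c, n in cnt.items()}
                       for v, cnt in value_class.items()})
    return tables

def vdm_distance(x1, x2, tables, classes, q=2):
    """VDM under the attribute independence assumption:
    per-attribute class-conditional differences are summed."""
    d = 0.0
    for i, (a, b) in enumerate(zip(x1, x2)):
        pa, pb = tables[i].get(a, {}), tables[i].get(b, {})
        d += sum(abs(pa.get(c, 0.0) - pb.get(c, 0.0)) ** q for c in classes)
    return d

# Toy data: attribute 0 determines the class, attribute 1 is uninformative.
X = [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]]
y = [0, 0, 1, 1]
t = vdm_tables(X, y)
print(vdm_distance(["a", "x"], ["b", "x"], t, classes=[0, 1]))  # -> 2.0
```

Note how the uninformative attribute contributes nothing here: both of its values induce the same class distribution, so its term vanishes.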
Experimental methodology and results
Distance-weighted k-nearest neighbor (KNNDW for short) is the most representative instance-based learning algorithm; in it, the distance metric is used twice: to find neighbors and to weight them. It is therefore a natural first test bed for demonstrating the effectiveness of a distance metric, so we use it to validate the effectiveness of our ODVDM.
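The double use of the metric can be sketched in a few lines. The inverse-distance weighting shown here is one common choice, not necessarily the exact scheme used in the paper:

```python
import heapq

def knn_dw_predict(query, X, y, k, dist):
    """Distance-weighted k-NN: the metric is used twice --
    once to find the k nearest neighbors, once to weight their votes."""
    neighbors = heapq.nsmallest(k, zip(X, y),
                                key=lambda pair: dist(query, pair[0]))
    votes = {}
    for xn, cn in neighbors:
        w = 1.0 / (dist(query, xn) + 1e-9)   # closer neighbors vote more
        votes[cn] = votes.get(cn, 0.0) + w
    return max(votes, key=votes.get)

# Toy run with a plain overlap (Hamming) distance on nominal attributes.
X = [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]]
y = [0, 0, 1, 1]
ham = lambda u, v: sum(p != q for p, q in zip(u, v))
print(knn_dw_predict(["a", "x"], X, y, k=3, dist=ham))  # -> 0
```

Plugging a different `dist` (such as VDM or ODVDM) into the same skeleton is exactly what makes KNNDW a convenient test bed for metrics.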
In our experiments, 10 UCI [23] classification data sets are used. We use them because they represent a wide range of domains and data characteristics and
Conclusions and future work
The Value Difference Metric (VDM for short) is widely used to measure the distance between pairs of instances whose attribute values are all nominal. In this paper, we present an improved Value Difference Metric obtained by relaxing its unrealistic attribute independence assumption. We call it the One Dependence Value Difference Metric, ODVDM for short. In our ODVDM, structure learning algorithms for Bayesian network classifiers, such as tree-augmented naive Bayes, are used to find the dependence relationships
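A one-dependence variant of VDM can be sketched as follows: each attribute's class-conditional probability is additionally conditioned on the value of a parent attribute in a TAN-style tree. This is an illustrative sketch under our own naming, not the paper's exact formulation; the `parents` structure is taken as given here, whereas in the paper it would come from a structure learning algorithm such as tree-augmented naive Bayes:

```python
def cond_prob(X, y, i, parent, v, pv, c):
    """Empirical P(class = c | attr_i = v, attr_parent = pv).
    With parent None the attribute is conditioned on its value alone."""
    matches = [ci for row, ci in zip(X, y)
               if row[i] == v and (parent is None or row[parent] == pv)]
    return matches.count(c) / len(matches) if matches else 0.0

def odvdm_distance(x1, x2, X, y, parents, classes, q=2):
    """One-dependence variant of VDM (illustrative sketch): each
    attribute's conditional probability is conditioned on the value of
    its parent attribute, so dependences between attributes are kept."""
    d = 0.0
    for i, p in enumerate(parents):   # parents[i]: parent index or None
        pv1 = x1[p] if p is not None else None
        pv2 = x2[p] if p is not None else None
        for c in classes:
            p1 = cond_prob(X, y, i, p, x1[i], pv1, c)
            p2 = cond_prob(X, y, i, p, x2[i], pv2, c)
            d += abs(p1 - p2) ** q
    return d

# Toy data: attribute 1 alone looks uninformative, but conditioned on
# its parent (attribute 0) it separates the classes.
X = [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]]
y = [0, 0, 1, 1]
print(odvdm_distance(["a", "x"], ["b", "x"], X, y,
                     parents=[None, 0], classes=[0, 1]))  # -> 4.0
```

On the same toy data, plain VDM would assign attribute 1 a zero contribution; conditioning on the parent is what lets the metric exploit the dependence.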
Acknowledgements
We thank Professors Liangxiao Jiang, Harry Zhang, and Jian Yu for their kind help, and the anonymous reviewers for their very useful comments and suggestions. The work was supported by the National Natural Science Foundation of China under Grant Nos. 60905033 and 61071188, the Provincial Natural Science Foundation of Hubei under Grant Nos. 2009CDB139 and 2009CDB077, and the Fundamental Research Funds for the Central Universities under Grant No. CUGL090248.
References (30)
Tolerating noisy, irrelevant, and novel attributes in instance-based learning algorithms, International Journal of Man–Machine Studies (1992)
A new collaborative filtering metric that improves the behavior of recommender systems, Knowledge-Based Systems (2010)
On the classification performance of TAN and general Bayesian networks, Knowledge-Based Systems (2009)
Structure identification of Bayesian classifiers based on GMDH, Knowledge-Based Systems (2009)
Data mining for exploring hidden patterns between KM and its performance, Knowledge-Based Systems (2010)
Learning a locality discriminating projection for classification, Knowledge-Based Systems (2009)
Instance-based learning algorithms, Machine Learning (1991)
Instance-based Learning (1997)
Locally weighted learning, Artificial Intelligence Review (1997)
Locally weighted naive Bayes
Decision tree with better class probability estimation, International Journal of Pattern Recognition and Artificial Intelligence
Improved heterogeneous distance functions, Journal of Artificial Intelligence Research
Toward memory-based reasoning, Communications of the ACM
Bayesian network classifiers, Machine Learning
Approximating discrete probability distributions with dependence trees, IEEE Transactions on Information Theory
Cited by (27)
Instance-based learning using the half-space proximal graph
2022, Pattern Recognition Letters
Citation Excerpt: "Some other methods include assigning different weights to each neighbor, with the idea that closer neighbors should contribute more to assigning the class label to the query [17]. Other approaches include the design of distance functions like Mahalanobis [18,19], adaptive Euclidean [20], or the Value Difference Metric (VDM) [21,22]. For each of the proposed methods and improvements of kNN, there is an additional time-consuming stage, online or offline, to estimate a proper k value for each test sample."

Financial crisis prediction model using ant colony optimization
2020, International Journal of Information Management

Using fine-tuned conditional probabilities for data transformation of nominal attributes
2019, Pattern Recognition Letters
Citation Excerpt: "However, this assumption usually does not hold. Therefore, many variants of VDM [14,17–19] have been proposed from the perspective of relaxing the assumption. For example, the One Dependence Value Difference Metric [18] calculates the corresponding conditional probability terms according to the pairwise dependence of nominal attributes with the class variable; the Independently Weighted Value Difference Metric [17] weights attributes, without requiring the above assumption, using the importance of attributes as determined by the joint mutual information between attributes and the label variable."

Using differential evolution for improving distance measures of nominal values
2018, Applied Soft Computing Journal

Kernelized random KISS metric learning for person re-identification
2018, Neurocomputing

Differential evolution for filter feature selection based on information theory and feature ranking
2018, Knowledge-Based Systems
Chaoqun Li, is currently a Ph.D. candidate at China University of Geosciences (Wuhan). Her research interests include data mining and machine learning.
Hongwei Li, the doctoral supervisor of Chaoqun Li, is currently a professor in the Department of Mathematics at China University of Geosciences (Wuhan).