Abstract
This study develops a new method for finding an optimal non-Euclidean distance metric for the nearest neighbour algorithm. The method was developed on data from a real-world doctor-shopper classification problem. Mutual information, a statistical measure derived from Shannon's information theory, is used to weight the attributes in the distance metric. On a five-class classification task, this weighted distance metric achieved a substantially higher agreement rate than the Euclidean distance metric (63% versus 51%). The agreement rate rose further, to 77% and 73% respectively, when a genetic algorithm and simulated annealing were used to optimise the weights. These results pave the way for a highly accurate system that detects high-risk doctor-shoppers automatically and efficiently.
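To make the idea of weighting attributes by mutual information concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the exact form of the metric, so this example assumes discrete attributes, a weighted sum of squared differences, and weights set directly to the estimated mutual information between each attribute and the class label. The function names (`mutual_information`, `attribute_weights`, `weighted_distance`, `nearest_neighbour`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of attribute values
    py = Counter(ys)            # marginal counts of class labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def attribute_weights(rows, labels):
    """One weight per attribute: its mutual information with the class."""
    columns = list(zip(*rows))
    return [mutual_information(col, labels) for col in columns]


def weighted_distance(a, b, weights):
    """Assumed metric: square root of the weighted sum of squared differences."""
    return math.sqrt(sum(w * (p - q) ** 2 for w, p, q in zip(weights, a, b)))


def nearest_neighbour(query, rows, labels, weights):
    """Return the label of the training row closest to the query."""
    best = min(range(len(rows)),
               key=lambda i: weighted_distance(query, rows[i], weights))
    return labels[best]


# Toy data (hypothetical): attribute 0 determines the class, attribute 1 is noise.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["low", "low", "high", "high"]

weights = attribute_weights(rows, labels)
prediction = nearest_neighbour((1, 0), rows, labels, weights)
```

In this toy case the informative attribute receives all of the weight and the uninformative one is ignored. A genetic algorithm or simulated annealing, as mentioned in the abstract, could then treat `weights` as a starting point and search for values that raise the agreement rate on held-out data.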
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
He, H., Hawkins, S. (1999). Optimising the Distance Metric in the Nearest Neighbour Algorithm on a Real-World Patient Classification Problem. In: Zhong, N., Zhou, L. (eds) Methodologies for Knowledge Discovery and Data Mining. PAKDD 1999. Lecture Notes in Computer Science, vol 1574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48912-6_49
DOI: https://doi.org/10.1007/3-540-48912-6_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65866-5
Online ISBN: 978-3-540-48912-2
eBook Packages: Springer Book Archive