
Distance metrics for instance-based learning

Conference paper, in: Methodologies for Intelligent Systems (ISMIS 1991)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 542))


Abstract

Instance-based learning techniques use a set of stored training instances to classify new examples. The most common such technique is the nearest neighbor method, in which new instances are classified according to the closest training instance. A critical element of any such method is the metric used to determine the distance between instances. Euclidean distance is by far the most commonly used metric; no one, however, has systematically considered whether a different metric, such as Manhattan distance, might perform equally well on naturally occurring data sets. Some evidence from psychological research indicates that Manhattan distance might be preferable in some circumstances. This paper examines three different distance metrics and presents experimental comparisons using data from three domains: malignant cancer classification, heart disease diagnosis, and diabetes prediction. The results of these studies indicate that the Manhattan distance metric works quite well, although not better than the Euclidean metric that has become a standard for machine learning experiments. Because the nearest neighbor technique provides a good benchmark for comparisons with other learning algorithms, the results below include a number of such comparisons, which show that nearest neighbor, using any distance metric, compares quite well to other machine learning techniques.
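To make the comparison concrete, the nearest neighbor method described above can be sketched in a few lines: classify a query by the label of the training instance that minimizes a chosen distance metric. The sketch below is illustrative only (it is not the paper's implementation); the function names and the toy data set are assumptions, and it shows the two metrics discussed above, Euclidean and Manhattan.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared per-feature differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute per-feature differences ("city block" distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_neighbor(query, training, metric):
    # training is a list of (feature_vector, label) pairs; return the
    # label of the training instance closest to the query under metric.
    _, label = min(training, key=lambda inst: metric(query, inst[0]))
    return label

# Toy data: two numeric features, labels "A" and "B" (hypothetical).
training = [((0.0, 0.0), "A"), ((1.0, 1.0), "B"), ((0.2, 0.9), "B")]
print(nearest_neighbor((0.1, 0.1), training, euclidean))  # prints A
print(nearest_neighbor((0.1, 0.1), training, manhattan))  # prints A
```

On this toy query both metrics agree; the paper's point is that on real data sets they usually agree often enough that Manhattan distance is a reasonable, though not superior, substitute for Euclidean distance.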

Supported in part by the Air Force Office of Scientific Research under Grant AFOSR-89-0151.



Editor information

Z. W. Ras, M. Zemankova


Copyright information

© 1991 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Salzberg, S. (1991). Distance metrics for instance-based learning. In: Ras, Z.W., Zemankova, M. (eds) Methodologies for Intelligent Systems. ISMIS 1991. Lecture Notes in Computer Science, vol 542. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-54563-8_103


  • DOI: https://doi.org/10.1007/3-540-54563-8_103

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-54563-7

  • Online ISBN: 978-3-540-38466-3

  • eBook Packages: Springer Book Archive
