
Distance metrics for instance-based learning

Conference paper, in: Methodologies for Intelligent Systems (ISMIS 1991)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 542))


Abstract

Instance-based learning techniques use a set of stored training instances to classify new examples. The most common such technique is the nearest neighbor method, in which new instances are classified according to the closest training instance. A critical element of any such method is the metric used to determine the distance between instances. Euclidean distance is by far the most commonly used metric; no one, however, has systematically considered whether a different metric, such as Manhattan distance, might perform equally well on naturally occurring data sets. Some evidence from psychological research indicates that Manhattan distance might be preferable in some circumstances. This paper examines three different distance metrics and presents experimental comparisons using data from three domains: malignant cancer classification, heart disease diagnosis, and diabetes prediction. The results of these studies indicate that the Manhattan distance metric works quite well, although not better than the Euclidean metric that has become a standard for machine learning experiments. Because the nearest neighbor technique provides a good benchmark for comparisons with other learning algorithms, the results below include a number of such comparisons, which show that nearest neighbor, using any distance metric, compares quite well to other machine learning techniques.
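To make the comparison concrete, the nearest neighbor method described above can be sketched in a few lines: classify a query by the label of the training instance that minimizes a chosen distance metric. The sketch below is illustrative only (it is not the paper's implementation); the function names and the toy data set are assumptions, and it shows the two metrics discussed above, Euclidean and Manhattan.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared per-feature differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute per-feature differences ("city block" distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_neighbor(query, training, metric):
    # training is a list of (feature_vector, label) pairs; return the
    # label of the training instance closest to the query under metric.
    _, label = min(training, key=lambda inst: metric(query, inst[0]))
    return label

# Toy data: two numeric features, labels "A" and "B" (hypothetical).
training = [((0.0, 0.0), "A"), ((1.0, 1.0), "B"), ((0.2, 0.9), "B")]
print(nearest_neighbor((0.1, 0.1), training, euclidean))  # prints A
print(nearest_neighbor((0.1, 0.1), training, manhattan))  # prints A
```

On this toy query both metrics agree; the paper's point is that on real data sets they usually agree often enough that Manhattan distance is a reasonable, though not superior, substitute for Euclidean distance.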

Supported in part by the Air Force Office of Scientific Research under Grant AFOSR-89-0151.



Editor information

Z. W. Ras, M. Zemankova


Copyright information

© 1991 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Salzberg, S. (1991). Distance metrics for instance-based learning. In: Ras, Z.W., Zemankova, M. (eds) Methodologies for Intelligent Systems. ISMIS 1991. Lecture Notes in Computer Science, vol 542. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-54563-8_103


  • DOI: https://doi.org/10.1007/3-540-54563-8_103

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-54563-7

  • Online ISBN: 978-3-540-38466-3

  • eBook Packages: Springer Book Archive
