Weighting features

Wettschereck, Dietrich; Aha, David W.

doi:10.1007/3-540-60598-3_31

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1010)

Included in the following conference series: International Conference on Case-Based Reasoning

Abstract

Many case-based reasoning algorithms retrieve cases using a derivative of the k-nearest neighbor (k-NN) classifier, whose similarity function is sensitive to irrelevant, interacting, and noisy features. Many proposed methods for reducing this sensitivity parameterize k-NN's similarity function with feature weights. We focus on methods that automatically assign weight settings using little or no domain-specific knowledge. Our goal is to predict the relative capabilities of these methods for specific dataset characteristics. We introduce a five-dimensional framework that categorizes automated weight-setting methods, empirically compare methods along one of these dimensions, summarize our results with four hypotheses, and describe additional evidence that supports them. Our investigation revealed that most methods correctly assign low weights to completely irrelevant features, and methods that use performance feedback demonstrate three advantages over other methods (i.e., they require less pre-processing, better tolerate interacting features, and increase learning rate).
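
As a minimal sketch of what it means to parameterize k-NN's similarity function with feature weights, the Python below uses a weighted Euclidean distance, which is an assumed form chosen for illustration rather than the specific function studied in the paper. The names `weighted_distance` and `knn_predict`, the toy cases, and the hand-set weights are all hypothetical; the methods the abstract discusses set weights automatically, which this sketch does not attempt.

```python
import math
from collections import Counter


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_f w_f * (x_f - y_f)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def knn_predict(query, cases, labels, weights, k=1):
    """Return the majority label among the k stored cases nearest to query."""
    ranked = sorted(
        range(len(cases)),
        key=lambda i: weighted_distance(query, cases[i], weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy case base (assumed data): feature 0 separates the classes,
# feature 1 is irrelevant noise.
cases = [(0.1, 0.9), (0.2, 0.1), (0.3, 0.8), (0.7, 0.2), (0.8, 0.9), (0.9, 0.5)]
labels = ["a", "a", "a", "b", "b", "b"]
query = (0.45, 0.2)

# Equal weights let the irrelevant feature pull the query toward class "b".
uniform = knn_predict(query, cases, labels, weights=(1.0, 1.0))

# A zero weight on the irrelevant feature restores the class "a" neighbor.
weighted = knn_predict(query, cases, labels, weights=(1.0, 0.0))

print(uniform, weighted)
```

With equal weights the nearest stored case is (0.7, 0.2), which matches the query on the noisy feature; with the noisy feature weighted to zero the nearest case is (0.3, 0.8). This is the sensitivity to irrelevant features that the abstract describes, shown on a deliberately small example.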

Author information

Authors and Affiliations

German National Research Center for Computer Science, 53754, Sankt Augustin, Germany
Dietrich Wettschereck
Naval Research Laboratory, Navy Center for Applied Research in AI, 20375, Washington, DC, USA
David W. Aha

Editor information

Manuela Veloso, Agnar Aamodt

Cite this paper

Wettschereck, D., Aha, D.W. (1995). Weighting features. In: Veloso, M., Aamodt, A. (eds) Case-Based Reasoning Research and Development. ICCBR 1995. Lecture Notes in Computer Science, vol 1010. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60598-3_31

DOI: https://doi.org/10.1007/3-540-60598-3_31
Published: 05 August 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60598-0
Online ISBN: 978-3-540-48446-2
eBook Packages: Springer Book Archive
