Abstract
One of the earliest challenges a practitioner faces when using distance-based tools is the choice of the distance, for which there is often very little information to rely on. This chapter proposes a compromise between an a priori, unoptimized choice (e.g. the Euclidean distance) and a fully optimized, but computationally expensive, choice made by means of some resampling method. The compromise consists in choosing the distance definition according to the results obtained with a very simple regression model (that is, one with few or no meta-parameters) and then using that distance in another, more elaborate regression model. The rationale behind this heuristic is that the similarity measure which best reflects the notion of similarity relevant to the application should be optimal whatever model is used for classification or regression. The idea is tested on nine datasets and five prediction models. The results show that this approach is a reasonable compromise between the default choice and a fully optimized choice of the metric.
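The heuristic can be summarized as a two-step procedure: score a handful of candidate metrics with a simple, nearly parameter-free model, then reuse the winning metric in the more elaborate model one actually intends to deploy. Below is a minimal sketch of that idea, assuming scikit-learn; the candidate metrics, the diabetes dataset and the two k-NN models are illustrative stand-ins, not the datasets or prediction models evaluated in the chapter.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Step 1: choose the metric with a very simple model (1-nearest-neighbour),
# which has essentially no meta-parameters to tune.
candidate_metrics = ["euclidean", "manhattan", "chebyshev"]
scores = {}
for metric in candidate_metrics:
    simple_model = make_pipeline(
        StandardScaler(),
        KNeighborsRegressor(n_neighbors=1, metric=metric),
    )
    scores[metric] = cross_val_score(simple_model, X, y, cv=5).mean()
best_metric = max(scores, key=scores.get)
print("Selected metric:", best_metric, scores)

# Step 2: reuse the selected metric inside a more elaborate model
# (here a weighted k-NN regressor with a larger neighbourhood as a stand-in).
elaborate_model = make_pipeline(
    StandardScaler(),
    KNeighborsRegressor(n_neighbors=15, weights="distance", metric=best_metric),
)
print("CV score of elaborate model:",
      cross_val_score(elaborate_model, X, y, cv=5).mean())
```

The point of the sketch is that the (expensive) cross-validation over metrics is paid only once, on the cheap model, rather than once per candidate metric for every elaborate model under consideration.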
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
François, D., Wertz, V., Verleysen, M. (2011). Choosing the Metric: A Simple Model Approach. In: Jankowski, N., Duch, W., Grąbczewski, K. (eds.) Meta-Learning in Computational Intelligence. Studies in Computational Intelligence, vol. 358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20980-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20979-6
Online ISBN: 978-3-642-20980-2