Abstract
One of the earliest challenges a practitioner faces when using distance-based tools is the choice of the distance, for which there is often very little information to rely on. This chapter proposes a compromise between an a priori, unoptimized choice (e.g. the Euclidean distance) and a fully optimized, but computationally expensive, choice made by means of some resampling method. The compromise consists in choosing the distance definition according to the results obtained with a very simple regression model (that is, one with few or no meta-parameters) and then using that distance in another, more elaborate regression model. The rationale behind this heuristic is that the similarity measure which best reflects the notion of similarity relevant to the application should be optimal whatever model is used for classification or regression. The idea is tested on nine datasets and five prediction models. The results show that this approach is a reasonable compromise between the default choice and a fully optimized choice of the metric.
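The heuristic can be summarized as a two-step procedure: score a handful of candidate metrics with a simple, nearly parameter-free model, then reuse the winning metric in the more elaborate model one actually intends to deploy. Below is a minimal sketch of that idea, assuming scikit-learn; the candidate metrics, the diabetes dataset and the two k-NN models are illustrative stand-ins, not the datasets or prediction models evaluated in the chapter.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Step 1: choose the metric with a very simple model (1-nearest-neighbour),
# which has essentially no meta-parameters to tune.
candidate_metrics = ["euclidean", "manhattan", "chebyshev"]
scores = {}
for metric in candidate_metrics:
    simple_model = make_pipeline(
        StandardScaler(),
        KNeighborsRegressor(n_neighbors=1, metric=metric),
    )
    scores[metric] = cross_val_score(simple_model, X, y, cv=5).mean()
best_metric = max(scores, key=scores.get)
print("Selected metric:", best_metric, scores)

# Step 2: reuse the selected metric inside a more elaborate model
# (here a weighted k-NN regressor with a larger neighbourhood as a stand-in).
elaborate_model = make_pipeline(
    StandardScaler(),
    KNeighborsRegressor(n_neighbors=15, weights="distance", metric=best_metric),
)
print("CV score of elaborate model:",
      cross_val_score(elaborate_model, X, y, cv=5).mean())
```

The point of the sketch is that the (expensive) cross-validation over metrics is paid only once, on the cheap model, rather than once per candidate metric for every elaborate model under consideration.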
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
François, D., Wertz, V., Verleysen, M. (2011). Choosing the Metric: A Simple Model Approach. In: Jankowski, N., Duch, W., Grąbczewski, K. (eds.) Meta-Learning in Computational Intelligence. Studies in Computational Intelligence, vol. 358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20980-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20979-6
Online ISBN: 978-3-642-20980-2