Abstract
Modern data analysis tools have to work on high-dimensional data, whose components are not independently distributed. High-dimensional spaces show surprising, counter-intuitive geometrical properties that have a large influence on the performance of data analysis tools. Among these properties, the concentration of the norm phenomenon makes Euclidean norms and Gaussian kernels, both commonly used in models, inappropriate in high-dimensional spaces. This paper presents alternative distance measures and kernels, together with geometrical methods to decrease the dimension of the space. The methodology is applied to a typical time series prediction example.
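The concentration of the norm phenomenon can be illustrated numerically. The following is a minimal sketch, not the authors' experiment: it assumes points drawn uniformly from the unit hypercube and measures the relative contrast (d_max - d_min) / d_min of their distances to the origin. The function name `relative_contrast` and its parameters (`n_points`, `p`, `seed`) are illustrative choices, not taken from the paper. The parameter `p` selects the Minkowski norm, so values other than 2 can be tried as alternative distance measures.

```python
import math
import random


def relative_contrast(dim, n_points=200, p=2.0, seed=0):
    """Return (d_max - d_min) / d_min for the L_p norms of n_points
    points drawn uniformly from the unit hypercube [0, 1]^dim.

    A small value means all points lie at nearly the same distance
    from the origin, i.e. the norm has concentrated.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    norms = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        # Minkowski (L_p) norm; p = 2 gives the Euclidean norm
        norms.append(math.pow(sum(abs(x) ** p for x in point), 1.0 / p))
    d_min, d_max = min(norms), max(norms)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the contrast for increasing dimensions (Euclidean norm)
    for dim in (2, 10, 100, 1000):
        print(dim, relative_contrast(dim))
```

Under these assumptions the contrast shrinks as the dimension grows, which is why nearest and farthest neighbours become hard to distinguish with the Euclidean norm in high-dimensional spaces.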
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Verleysen, M., François, D. (2005). The Curse of Dimensionality in Data Mining and Time Series Prediction. In: Cabestany, J., Prieto, A., Sandoval, F. (eds) Computational Intelligence and Bioinspired Systems. IWANN 2005. Lecture Notes in Computer Science, vol 3512. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11494669_93
DOI: https://doi.org/10.1007/11494669_93
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26208-4
Online ISBN: 978-3-540-32106-4
eBook Packages: Computer Science (R0)