The Curse of Dimensionality in Data Mining and Time Series Prediction

Verleysen, Michel; François, Damien

doi:10.1007/11494669_93

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3512)

Included in the following conference series:

International Work-Conference on Artificial Neural Networks

Abstract

Modern data analysis tools have to work on high-dimensional data, whose components are not independently distributed. High-dimensional spaces show surprising, counter-intuitive geometrical properties that have a large influence on the performance of data analysis tools. Among these properties, the concentration of the norm phenomenon means that Euclidean norms and Gaussian kernels, both commonly used in models, become inappropriate in high-dimensional spaces. This paper presents alternative distance measures and kernels, together with geometrical methods to decrease the dimension of the space. The methodology is applied to a typical time series prediction example.
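
The concentration of the norm mentioned in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes points drawn uniformly and independently from the unit hypercube, and the function name `norm_statistics` and the two ratios it returns are choices made here for illustration only.

```python
import numpy as np


def norm_statistics(dim, n_points=1000, seed=0):
    """Return (relative_spread, relative_contrast) of the Euclidean norms
    of n_points drawn uniformly from the unit hypercube [0, 1]^dim.

    Assumption for illustration: i.i.d. uniform components, which is
    not the dependent-components setting discussed in the paper.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    norms = np.linalg.norm(points, axis=1)
    # Standard deviation of the norms relative to their mean.
    relative_spread = norms.std() / norms.mean()
    # Gap between the largest and smallest norm, relative to the smallest.
    relative_contrast = (norms.max() - norms.min()) / norms.min()
    return relative_spread, relative_contrast


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        spread, contrast = norm_statistics(dim)
        print(f"dim={dim:5d}  std/mean={spread:.4f}  (max-min)/min={contrast:.4f}")
```

Under these assumptions both ratios shrink as the dimension grows: the norms of all points become nearly equal relative to their size, which is why a distance-based notion of "near" and "far" loses discriminating power.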

Author information

Authors and Affiliations

Machine Learning Group, Université catholique de Louvain, Place du Levant, 3, 1380, Louvain-la-Neuve, Belgium
Michel Verleysen
Machine Learning Group, Université catholique de Louvain, Avenue G. Lemaitre, 4, 1380, Louvain-la-Neuve, Belgium
Damien François

Editor information

Editors and Affiliations

Departamento de Ingeniería Electrónica, Universitat Politècnica de Catalunya (UPC). E.T.S.I. de Telecomunicación, Campus Norte, Edificio C4, C/ Jordi Girona, 1-3, E08034, Barcelona, Spain
Joan Cabestany
Department of Computer Architecture and Computer Technology, University of Granada,
Alberto Prieto
Grupo ISIS, Dpto. Tecnología Electrónica ETSI Telecomunicación, Universidad de Málaga, Campus de Teatinos, 29071, Málaga, Spain
Francisco Sandoval

About this paper

Cite this paper

Verleysen, M., François, D. (2005). The Curse of Dimensionality in Data Mining and Time Series Prediction. In: Cabestany, J., Prieto, A., Sandoval, F. (eds) Computational Intelligence and Bioinspired Systems. IWANN 2005. Lecture Notes in Computer Science, vol 3512. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11494669_93

DOI: https://doi.org/10.1007/11494669_93
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26208-4
Online ISBN: 978-3-540-32106-4
eBook Packages: Computer Science (R0)
