The Curse of Dimensionality in Data Mining and Time Series Prediction

  • Conference paper
Computational Intelligence and Bioinspired Systems (IWANN 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3512)

Abstract

Modern data analysis tools have to work on high-dimensional data whose components are not independently distributed. High-dimensional spaces show surprising, counter-intuitive geometrical properties that strongly influence the performance of data analysis tools. Among these properties, the concentration of the norm phenomenon makes Euclidean norms and Gaussian kernels, both commonly used in models, inappropriate in high-dimensional spaces. This paper presents alternative distance measures and kernels, together with geometrical methods to decrease the dimension of the space. The methodology is applied to a typical time series prediction example.
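
The concentration of the norm phenomenon mentioned in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' experiment: it assumes points drawn uniformly and independently in the unit hypercube (a simplification, since the abstract stresses that real components are not independent), and the function name `relative_norm_spread` and its parameters are hypothetical. It measures the ratio of the standard deviation of the Euclidean norm to its mean, which shrinks as the dimension grows.

```python
import math
import random
import statistics


def relative_norm_spread(dim, n_points=1000, seed=0):
    """Return std/mean of the Euclidean norms of random points.

    Assumption (ours, for illustration only): points are sampled
    uniformly and independently in the unit hypercube [0, 1]^dim.
    """
    rng = random.Random(seed)
    norms = [
        math.sqrt(sum(rng.random() ** 2 for _ in range(dim)))
        for _ in range(n_points)
    ]
    # Population standard deviation of the norms, relative to their mean.
    return statistics.pstdev(norms) / statistics.fmean(norms)


if __name__ == "__main__":
    # The printed ratio should decrease as the dimension increases.
    for dim in (1, 10, 100, 1000):
        print(dim, round(relative_norm_spread(dim), 4))
```

Under this assumption the mean norm grows with the dimension while its spread stays roughly constant, so all points appear to lie at nearly the same distance from the origin. This is one way to see why distance-based comparisons lose contrast in high-dimensional spaces.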

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Verleysen, M., François, D. (2005). The Curse of Dimensionality in Data Mining and Time Series Prediction. In: Cabestany, J., Prieto, A., Sandoval, F. (eds) Computational Intelligence and Bioinspired Systems. IWANN 2005. Lecture Notes in Computer Science, vol 3512. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11494669_93

  • DOI: https://doi.org/10.1007/11494669_93

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26208-4

  • Online ISBN: 978-3-540-32106-4

  • eBook Packages: Computer Science, Computer Science (R0)
