
A Novel Intrinsic Dimensionality Estimator Based on Rank-Order Statistics

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7627)

Abstract

In the past two decades the estimation of the intrinsic dimensionality of a dataset has gained considerable importance, since it provides relevant information for several real-life applications. Unfortunately, although a great deal of research effort has been devoted to the development of effective intrinsic dimensionality estimators, the problem is still open. For this reason, in this paper we propose a novel robust intrinsic dimensionality estimator that exploits the information conveyed by the normalized nearest neighbor distances through a technique based on rank-order statistics, which limits the common underestimation issues related to the edge effect. Experiments performed on both synthetic and real datasets highlight the robustness and effectiveness of the proposed algorithm when compared to state-of-the-art methodologies.


Notes

  1. By default, capital letters (such as U, X) denote random variables (rv) and lowercase letters (u, x) their corresponding realizations.

  2. The pdf of a random variable X following a Beta distribution with parameters \(\alpha\) and \(\beta\) is defined as \(f_{X}(x|\alpha ,\beta ) = (\mathsf {B}(\alpha ,\beta ))^{-1} x^{\alpha -1}(1-x)^{\beta -1}\), where \(\mathsf {B}\) is the Beta function providing the normalization factor.

  3. The pdf of a random variable Y following an Exponential distribution with rate parameter \(\lambda\) is defined as \(f_{Y}(y|\lambda ) = \lambda e^{-\lambda y}\).

  4. http://www.eecs.umich.edu/~hero/IntrinsicDim/, http://www.stat.lsa.umich.edu/~elevina/mledim.m, http://research.microsoft.com/en-us/um/cambridge/projects/infernet/blogs/bayesianpca.aspx.

  5. http://cseweb.ucsd.edu/~lvdmaaten/dr/download.php.

  6. Note that, when the true value of the intrinsic dimensionality is not known, we considered the mean value of the range as \(d_{{\varvec{\mathcal {M}}}}\).
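As a concrete illustration of the Beta law recalled in Note 2 (this is an illustration only, not the paper's DROS estimator): for a point drawn uniformly at random in the unit d-dimensional ball, its distance from the center follows a Beta(d, 1) distribution, so \(\mathbb{E}[\log r] = -1/d\) and a simple moment-based estimate of d is available. A minimal Monte Carlo sketch, with the sample size and seed chosen arbitrarily:

```python
import math
import random

random.seed(0)

d, n = 5, 100_000  # true dimension and sample size (illustrative choices)

# The radius of a uniform point in the unit d-ball is U**(1/d) with
# U ~ Uniform(0, 1), i.e. it is Beta(d, 1)-distributed.
radii = [random.random() ** (1.0 / d) for _ in range(n)]

# Since E[log r] = -1/d, a simple estimator of d is -1 / mean(log r).
mean_log_r = sum(math.log(r) for r in radii) / n
d_hat = -1.0 / mean_log_r
print(round(d_hat, 2))  # close to the true dimension d = 5
```

The paper's estimator instead works on normalized nearest neighbor distances and corrects for the edge effect via rank-order statistics, but the Beta family above is the same one appearing in Note 2.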


Author information

Correspondence to A. Rozza.

A Algorithm Implementation

In this appendix the pseudocode of our algorithm is reported. Algorithm 1 shows \({\texttt {DROS}}\), where \(kNN({\varvec{X}}_N,{\varvec{x}},k)\) denotes the procedure that performs a k-nearest neighbor search, returning the set of the k ordered nearest neighbors of \({\varvec{x}}\) in \({\varvec{X}}_N\) together with their corresponding distances.

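The \(kNN\) subroutine referenced above is a standard ordered nearest-neighbor search. A brute-force sketch (an assumed implementation for illustration, not the authors' code):

```python
import math

def knn(X_N, x, k):
    """Return the k ordered nearest neighbors of x in X_N together
    with their corresponding sorted distances (brute-force search)."""
    ranked = sorted(X_N, key=lambda p: math.dist(p, x))[:k]
    return ranked, [math.dist(p, x) for p in ranked]
```

For large datasets this linear scan would typically be replaced by a spatial index such as a k-d tree, which leaves the returned (neighbors, distances) contract unchanged.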


Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bassis, S., Rozza, A., Ceruti, C., Lombardi, G., Casiraghi, E., Campadelli, P. (2015). A Novel Intrinsic Dimensionality Estimator Based on Rank-Order Statistics. In: Masulli, F., Petrosino, A., Rovetta, S. (eds) Clustering High-Dimensional Data. CHDD 2012. Lecture Notes in Computer Science, vol 7627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48577-4_7


  • DOI: https://doi.org/10.1007/978-3-662-48577-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48576-7

  • Online ISBN: 978-3-662-48577-4
