Abstract
In the past two decades, the estimation of the intrinsic dimensionality of a dataset has gained considerable importance, since it provides relevant information for several real-life applications. Unfortunately, although a great deal of research effort has been devoted to the development of effective intrinsic dimensionality estimators, the problem is still open. For this reason, in this paper we propose a novel robust intrinsic dimensionality estimator that exploits the information conveyed by the normalized nearest-neighbor distances through a technique based on rank-order statistics, which limits the underestimation commonly caused by the edge effect. Experiments performed on both synthetic and real datasets highlight the robustness and effectiveness of the proposed algorithm compared to state-of-the-art methodologies.
Notes
- 1. By default, capital letters (such as U, X) denote random variables (rv) and lowercase letters (u, x) their corresponding realizations.
- 2. The pdf of a random variable X following a Beta distribution with parameters \(\alpha\) and \(\beta\) is defined as \(f_{X}(x|\alpha,\beta) = (\mathsf{B}(\alpha,\beta))^{-1} x^{\alpha-1}(1-x)^{\beta-1}\), where \(\mathsf{B}\) is the Beta function providing the normalization factor.
- 3. The pdf of a random variable Y following an Exponential distribution with rate \(\lambda\) is defined as \(f_{Y}(y|\lambda) = \lambda e^{-\lambda y}\).
- 4.
- 5.
- 6. Note that, when the true value of the intrinsic dimension is not known, we consider the mean value of the range as \(d_{{\varvec{\mathcal {M}}}}\).
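For concreteness, the two densities defined in Notes 2 and 3 can be evaluated in a few lines of Python; this sketch computes the Beta normalization factor via the standard identity \(\mathsf{B}(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)\) (the function names are illustrative, not from the paper):

```python
import math

def beta_pdf(x, alpha, beta):
    """Beta(alpha, beta) density from Note 2:
    f(x) = x^(alpha-1) * (1-x)^(beta-1) / B(alpha, beta),
    with B(alpha, beta) = Gamma(alpha)*Gamma(beta)/Gamma(alpha+beta)."""
    B = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B

def exp_pdf(y, lam):
    """Exponential(lambda) density from Note 3: f(y) = lambda * exp(-lambda * y)."""
    return lam * math.exp(-lam * y)

# Beta(1, 1) reduces to the uniform density on [0, 1]
print(beta_pdf(0.3, 1.0, 1.0))  # -> 1.0
```

As a sanity check, Beta(1, 1) is the uniform distribution on [0, 1], so its density is identically 1.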
A Algorithm Implementation
In this appendix the pseudocode of our algorithm is reported. Algorithm 1 shows \({\texttt {DROS}}\), where \(kNN({\varvec{X}}_N,{\varvec{x}},k)\) is the procedure that performs a k-nearest-neighbor search, returning the set of the k ordered nearest neighbors of \({\varvec{x}}\) in \({\varvec{X}}_N\) together with their corresponding distances.
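The \(kNN\) subroutine invoked by Algorithm 1 can be sketched as a brute-force search; the implementation below is a minimal NumPy version written for illustration (the function name and Euclidean metric are assumptions, since the paper only specifies the subroutine's interface):

```python
import numpy as np

def knn(X_N, x, k):
    """Brute-force k-nearest-neighbor search matching the kNN(X_N, x, k)
    interface of Algorithm 1: returns the k nearest neighbors of x in X_N,
    ordered by increasing Euclidean distance, with their distances."""
    dists = np.linalg.norm(X_N - x, axis=1)  # distance from x to every point
    order = np.argsort(dists)[:k]            # indices of the k closest points
    return X_N[order], dists[order]

# Example: 5 points on a line, query slightly off the first point
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
neighbors, distances = knn(X, np.array([0.2]), 3)
```

A production implementation would typically replace the brute-force scan with a spatial index (e.g. a k-d tree) when N is large.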
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bassis, S., Rozza, A., Ceruti, C., Lombardi, G., Casiraghi, E., Campadelli, P. (2015). A Novel Intrinsic Dimensionality Estimator Based on Rank-Order Statistics. In: Masulli, F., Petrosino, A., Rovetta, S. (eds.) Clustering High-Dimensional Data. CHDD 2012. Lecture Notes in Computer Science, vol. 7627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48577-4_7
Print ISBN: 978-3-662-48576-7
Online ISBN: 978-3-662-48577-4