Abstract
In the past two decades, the estimation of the intrinsic dimensionality of a dataset has gained considerable importance, since it provides relevant information for several real-life applications. Unfortunately, although a great deal of research effort has been devoted to the development of effective intrinsic dimensionality estimators, the problem is still open. For this reason, in this paper we propose a novel robust intrinsic dimensionality estimator that exploits the information conveyed by the normalized nearest-neighbor distances through a technique based on rank-order statistics, which limits the underestimation commonly caused by the edge effect. Experiments performed on both synthetic and real datasets highlight the robustness and effectiveness of the proposed algorithm compared to state-of-the-art methodologies.
Notes
- 1. By default, capital letters (such as U, X) denote random variables (rv) and lowercase letters (u, x) their corresponding realizations.
- 2. The pdf of a random variable X following a Beta distribution with parameters \(\alpha\) and \(\beta\) is defined as \(f_{X}(x|\alpha,\beta) = (\mathsf{B}(\alpha,\beta))^{-1} x^{\alpha-1}(1-x)^{\beta-1}\), where \(\mathsf{B}\) is the Beta function providing the normalization factor.
- 3. The pdf of a random variable Y following an Exponential distribution with rate \(\lambda\) is defined as \(f_{Y}(y|\lambda) = \lambda e^{-\lambda y}\).
- 4.
- 5.
- 6. Note that, when the true value of the intrinsic dimension is not known, we consider the mean value of the range as \(d_{{\varvec{\mathcal {M}}}}\).
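For concreteness, the two densities defined in Notes 2 and 3 can be evaluated in a few lines of Python; this sketch computes the Beta normalization factor via the standard identity \(\mathsf{B}(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)\) (the function names are illustrative, not from the paper):

```python
import math

def beta_pdf(x, alpha, beta):
    """Beta(alpha, beta) density from Note 2:
    f(x) = x^(alpha-1) * (1-x)^(beta-1) / B(alpha, beta),
    with B(alpha, beta) = Gamma(alpha)*Gamma(beta)/Gamma(alpha+beta)."""
    B = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B

def exp_pdf(y, lam):
    """Exponential(lambda) density from Note 3: f(y) = lambda * exp(-lambda * y)."""
    return lam * math.exp(-lam * y)

# Beta(1, 1) reduces to the uniform density on [0, 1]
print(beta_pdf(0.3, 1.0, 1.0))  # -> 1.0
```

As a sanity check, Beta(1, 1) is the uniform distribution on [0, 1], so its density is identically 1.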
A Algorithm Implementation
In this appendix the pseudocode of our algorithm is reported. Algorithm 1 shows \({\texttt {DROS}}\), where \(kNN({\varvec{X}}_N,{\varvec{x}},k)\) is the procedure that performs a k-nearest-neighbor search, returning the set of the k ordered nearest neighbors of \({\varvec{x}}\) in \({\varvec{X}}_N\) together with their corresponding distances.
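The \(kNN\) subroutine invoked by Algorithm 1 can be sketched as a brute-force search; the implementation below is a minimal NumPy version written for illustration (the function name and Euclidean metric are assumptions, since the paper only specifies the subroutine's interface):

```python
import numpy as np

def knn(X_N, x, k):
    """Brute-force k-nearest-neighbor search matching the kNN(X_N, x, k)
    interface of Algorithm 1: returns the k nearest neighbors of x in X_N,
    ordered by increasing Euclidean distance, with their distances."""
    dists = np.linalg.norm(X_N - x, axis=1)  # distance from x to every point
    order = np.argsort(dists)[:k]            # indices of the k closest points
    return X_N[order], dists[order]

# Example: 5 points on a line, query slightly off the first point
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
neighbors, distances = knn(X, np.array([0.2]), 3)
```

A production implementation would typically replace the brute-force scan with a spatial index (e.g. a k-d tree) when N is large.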
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bassis, S., Rozza, A., Ceruti, C., Lombardi, G., Casiraghi, E., Campadelli, P. (2015). A Novel Intrinsic Dimensionality Estimator Based on Rank-Order Statistics. In: Masulli, F., Petrosino, A., Rovetta, S. (eds.) Clustering High-Dimensional Data. CHDD 2012. Lecture Notes in Computer Science, vol. 7627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48577-4_7
Print ISBN: 978-3-662-48576-7
Online ISBN: 978-3-662-48577-4