Abstract
The local intrinsic dimensionality (LID) model assesses the complexity of data in the vicinity of a query point through the growth rate of the probability measure within an expanding neighborhood. In this paper, we show how LID is asymptotically related to the entropy of the lower tail of the distribution of distances from the query. We establish tight relationships for cumulative Shannon entropy, entropy power, and their generalized Tsallis entropy variants, each with the potential to serve as the basis for new estimators of LID, or as a substitute for LID-based characterization and feature representations in classification and other learning contexts.
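To make the quantity under study concrete: in the LID literature, the local intrinsic dimension at a query point is the growth rate of the distance distribution F in the limit of small radii, and it admits a simple maximum-likelihood (Hill-type) estimator from nearest-neighbor distances (Hill, 1975; Levina and Bickel, 2004). The following sketch is illustrative only and is not code from the paper; the function name and the power-law tail assumption F(r) ≈ (r/w)^LID are the author's conventions here, not the paper's.

```python
import numpy as np

def lid_mle(knn_dists):
    """Hill-type maximum-likelihood estimate of local intrinsic
    dimensionality from the distances of a query point to its k
    nearest neighbors.

    Models the lower tail of the distance distribution as
    F(r) ~ (r / w)**LID, where w is the neighborhood radius
    (the k-th nearest-neighbor distance).
    """
    d = np.sort(np.asarray(knn_dists, dtype=float))
    w = d[-1]  # neighborhood radius: the largest of the k distances
    # Average the log-ratios over the k-1 interior neighbors; the
    # k-th term is excluded since its log-ratio is exactly zero.
    return -1.0 / np.mean(np.log(d[:-1] / w))
```

As a sanity check, for distances drawn so that F(r) = r^d (e.g. distances within a d-dimensional uniform ball centered at the query), the estimate concentrates near d as the neighborhood size k grows.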
Acknowledgments
Michael E. Houle acknowledges the financial support of JSPS Kakenhi Kiban (B) Research Grant 18H03296.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Bailey, J., Houle, M.E., Ma, X. (2021). Relationships Between Local Intrinsic Dimensionality and Tail Entropy. In: Reyes, N., et al. Similarity Search and Applications. SISAP 2021. Lecture Notes in Computer Science(), vol 13058. Springer, Cham. https://doi.org/10.1007/978-3-030-89657-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89656-0
Online ISBN: 978-3-030-89657-7