Abstract
Given a finite set of random samples from a smooth Riemannian manifold embedded in ℝ^d, two important questions are: what is the intrinsic dimension of the manifold, and what is the entropy of the underlying sampling distribution on the manifold? These questions naturally arise in the study of shape spaces generated by images or signals for the purposes of shape classification, shape compression, and shape reconstruction. This chapter is concerned with two simple estimators of dimension and entropy based on the lengths of the geodesic minimal spanning tree (GMST) and the k-nearest neighbor (k-NN) graph. We provide proofs of strong consistency of these estimators under weak assumptions of compactness of the manifold and boundedness of the Lebesgue sampling density supported on the manifold. We illustrate these estimators on the MNIST database of handwritten digits.
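The core idea behind the k-NN graph estimator is that the total edge length of the k-nearest-neighbor graph on n samples from an m-dimensional manifold grows as L_n ≈ c·n^((m−1)/m), so the slope of log L_n versus log n recovers m. The sketch below is only an illustration of that growth-rate idea under assumed names (`knn_graph_length`, `estimate_dimension`, `sphere_samples` are all hypothetical); it is not the authors' exact estimator, which also treats the rate constant, uses geodesic distances for the GMST variant, and comes with the consistency guarantees proved in the chapter.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_graph_length(X, k=5):
    """Total edge length of the k-nearest-neighbor graph on points X."""
    tree = cKDTree(X)
    # query k+1 neighbors because each point's nearest neighbor is itself
    dists, _ = tree.query(X, k=k + 1)
    return dists[:, 1:].sum()

def estimate_dimension(sample_fn, sizes, k=5, trials=5, seed=0):
    """Estimate intrinsic dimension m from the growth rate of the k-NN
    graph length: L_n ~ c * n^((m-1)/m), so if a is the log-log slope
    then m = 1 / (1 - a)."""
    rng = np.random.default_rng(seed)
    log_n, log_L = [], []
    for n in sizes:
        # average over a few draws to reduce the variance of log L_n
        L = np.mean([knn_graph_length(sample_fn(n, rng), k) for _ in range(trials)])
        log_n.append(np.log(n))
        log_L.append(np.log(L))
    slope = np.polyfit(log_n, log_L, 1)[0]
    return 1.0 / (1.0 - slope)

def sphere_samples(n, rng):
    """Uniform samples on the unit 2-sphere embedded in R^3
    (intrinsic dimension 2, extrinsic dimension 3)."""
    X = rng.standard_normal((n, 3))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

m_hat = estimate_dimension(sphere_samples, sizes=[200, 400, 800, 1600])
```

On the sphere example the estimate lands near the true intrinsic dimension 2 rather than the ambient dimension 3, which is the point of the method: the graph length is governed by the manifold's own geometry, not the embedding space.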
© 2006 Birkhäuser Boston
Cite this chapter
Costa, J.A., Hero, A.O. (2006). Determining Intrinsic Dimension and Entropy of High-Dimensional Shape Spaces. In: Krim, H., Yezzi, A. (eds) Statistics and Analysis of Shapes. Modeling and Simulation in Science, Engineering and Technology. Birkhäuser Boston. https://doi.org/10.1007/0-8176-4481-4_9
DOI: https://doi.org/10.1007/0-8176-4481-4_9
Publisher Name: Birkhäuser Boston
Print ISBN: 978-0-8176-4376-8
Online ISBN: 978-0-8176-4481-9
eBook Packages: Mathematics and Statistics (R0)