Summary
Principal curves were introduced by Hastie & Stuetzle (1989) as smooth parametric curves passing through the middle of a multidimensional data set. Delicado (2001) defines Principal Curves of Oriented Points, based on the fixed points of a function from ℝ^p into itself. This definition is nonparametric, and smoothing methods are used to find the principal curves of a data set. Here we extend this work in two directions. First, we propose a bandwidth choice method based on the Minimum Spanning Tree of the data set. Second, we present an object-oriented application that implements the principal curves computation for any dimension in a flexible recursive way. Examples on synthetic and real data are included.
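To make the idea of a bandwidth derived from the Minimum Spanning Tree concrete, the following minimal sketch computes the edge lengths of the Euclidean MST with Prim's algorithm and turns them into a single smoothing scale. The summary does not state the actual selection rule, so the rule used here (a hypothetical `multiplier` times the mean MST edge length) and the function names `mst_edge_lengths` and `mst_bandwidth` are assumptions made purely for illustration, not the authors' method.

```python
import math


def mst_edge_lengths(points):
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    in_tree[0] = True  # start the tree at the first point
    # best[i] = shortest distance from point i to any point already in the tree
    best = [math.dist(points[0], p) for p in points]
    lengths = []
    for _ in range(n - 1):
        # attach the point outside the tree that is closest to it
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        lengths.append(best[u])
        # the newly attached point may offer shorter connections to the rest
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return lengths


def mst_bandwidth(points, multiplier=2.0):
    """Illustrative bandwidth: multiplier * mean MST edge length (an assumed rule)."""
    lengths = mst_edge_lengths(points)
    if not lengths:
        raise ValueError("need at least two points")
    return multiplier * sum(lengths) / len(lengths)


if __name__ == "__main__":
    sample = [(0, 0), (1, 0), (2, 0), (2, 3)]
    print(mst_edge_lengths(sample))
    print(mst_bandwidth(sample))
```

The appeal of an MST-based scale is that the tree's edge lengths reflect the typical spacing between neighbouring observations in any dimension, so the resulting bandwidth adapts to the data set without a grid search. Whatever rule is actually used, the quadratic Prim variant above should be adequate for data sets of moderate size.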





References
Avram, F. & Bertsimas, D. (1992), ‘The minimum spanning tree constant in geometrical probability and under the independent model: a unified approach’, The Annals of Applied Probability 2(1), 113–130.
Avram, F. & Bertsimas, D. (1993), ‘On central limit theorems in geometrical probability’, The Annals of Applied Probability 3(4), 1033–1046.
Banfield, J. D. & Raftery, A. E. (1992), ‘Ice floe identification in satellite images using mathematical morphology and clustering about principal curves’, Journal of the American Statistical Association 87, 7–16.
Beardwood, J., Halton, J. H. & Hammersley, J. M. (1959), ‘The shortest path through many points’, Proc. Cambridge Philos. Soc. 55, 299–327.
Bishop, C. M. & Tipping, M. E. (1998), ‘A hierarchical latent variable model for data visualization’, IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 281–293.
Bishop, C., Svensén, M. & Williams, C. K. I. (1998), ‘GTM: The generative topographic mapping’, Neural Computation 10(1), 215–234.
Caroni, C. & Prescott, P. (1995), ‘On Rohlf’s method for the detection of outliers in multivariate data’, Journal of Multivariate Analysis 52, 295–307.
Chang, K. & Ghosh, J. (2001), ‘A unified model for probabilistic principal surfaces’, IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 22–41.
Delicado, P. (2001), ‘Another look at principal curves and surfaces’, Journal of Multivariate Analysis 77, 84–116.
Delicado, P. & Huerta, M. (2002), Principal Curves of Oriented Points: Theoretical and computational improvements, Technical Report DR 2002/06, EIO, UPC. (Available at http://www-eio.upc.es/~delicado/PCOP/).
Dong, D. & McAvoy, T. J. (1996), ‘Nonlinear principal component analysis based on principal curves and neural networks’, Computers Chem. Engng. 20, 65–78.
Friedman, J. H. (1991), ‘Multivariate adaptive regression splines’, The Annals of Statistics 19, 1–141. (With discussion).
Greenacre, M. J. (1993), Correspondence analysis in practice, Academic Press.
Guggenheimer, H. W. (1977), Differential Geometry, Dover Publications.
Hastie, T. (1984), Principal curves and surfaces, Laboratory for Computational Statistics Technical Report 11, Stanford University, Dept. of Statistics.
Hastie, T. & Stuetzle, W. (1989), ‘Principal curves’, Journal of the American Statistical Association 84, 502–516.
IPC (2000), ‘International Data Base’, International Programs Center, U.S. Census Bureau. http://www.census.gov/ipc/www/idbnew.html.
Kégl, B., Krzyzak, A., Linder, T. & Zeger, K. (2000), ‘Learning and design of principal curves’, IEEE Trans. Pattern Analysis and Machine Intelligence 22(3), 281–297.
LeBlanc, M. & Tibshirani, R. J. (1994), ‘Adaptive principal surfaces’, Journal of the American Statistical Association 89, 53–64.
Mulier, F. & Cherkassky, V. (1995), ‘Self-organization as an iterative kernel smoothing process’, Neural Computation 7(6), 1165–1177.
Rohlf, F. J. (1975), ‘Generalization of the gap test for the detection of multivariate outliers’, Biometrics 31, 93–102.
Sandilya, S. & Kulkarni, S. R. (2000), Principal curves with bounded turn, in ‘IEEE International Symposium on Information Theory’, p. 321.
Scott, D. (1992), Multivariate Density Estimation, Wiley.
Simonoff, J. (1995), Smoothing Methods in Statistics, Springer, New York.
Smola, A. J., Mika, S., Schölkopf, B. & Williamson, R. C. (2001), ‘Regularized principal manifolds’, Journal of Machine Learning Research 1, 179–209.
Steele, M. (1988), ‘Growth rates of euclidean minimal spanning trees with power weighted edges’, The Annals of Probability 16(4), 1767–1787.
Steele, M. (1993), ‘Probability and problems in euclidean combinatorial optimization’, Statistical Science 8(1), 48–56.
Tan, S. & Mavrovouniotis, M. L. (1995), ‘Reducing data dimensionality through optimizing neural network inputs’, AIChE Journal 41, 1471–1480.
Tarpey, T. & Flury, B. (1996), ‘Self-consistency: A fundamental concept in statistics’, Statistical Science 11, 229–243.
Tibshirani, R. J. (1992), ‘Principal curves revisited’, Statistics and Computing 2, 183–190.
Tipping, M. E. & Bishop, C. M. (1999), ‘Probabilistic principal component analysis’, Journal of the Royal Statistical Society, Series B, Methodological 61, 611–622.
Verbeek, J., Vlassis, N. & Kröse, B. (2002), ‘A k-segments algorithm for finding principal curves’, Pattern Recognition Letters 23, 1009–1017.
Wand, M. & Jones, M. (1995), Kernel Smoothing, Chapman and Hall, London.
Additional information
This work was partially supported by the Spanish DGES grants PB98-0919 and BFM 2001-2327, and by the European Commission project HPCF CT-2000-00041. We are grateful to Valentín Navarro, who helped us pre-process the population pyramids data set. The authors would like to thank two anonymous referees for their helpful comments and suggestions. Running head: Principal Curves of Oriented Points.
Cite this article
Delicado, P., Huerta, M. Principal Curves of Oriented Points: theoretical and computational improvements. Computational Statistics 18, 293–315 (2003). https://doi.org/10.1007/s001800300145
DOI: https://doi.org/10.1007/s001800300145