Principal component histograms from interval-valued observations

  • Original Paper
Computational Statistics

Abstract

This paper proposes an approach for constructing histogram values for the principal components of interval-valued observations. Le-Rademacher and Billard (J Comput Graph Stat 21:413–432, 2012) show that for a principal component analysis on interval-valued observations, the resulting observations in principal component space are polytopes formed by the convex hulls of linearly transformed vertices of the observed hyper-rectangles. In this paper, we propose an algorithm to translate these polytopes into histogram-valued data, providing numerical values for the principal components that can be used as input in further analysis. Other existing methods of principal component analysis for interval-valued data construct the principal components themselves as intervals, which implicitly assumes that all values within an observation are uniformly distributed along the principal component axes. However, this assumption holds only in special cases where the variables in the dataset are mutually uncorrelated. The representation of the principal components as histogram values proposed herein more accurately reflects the variation in the internal structure of the observations in principal component space. As a consequence, subsequent analyses using histogram-valued principal components as input result in improved accuracy.
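The contrast between an interval-valued and a histogram-valued principal component can be illustrated with a small sketch. It is not the algorithm proposed in the paper: it is a Monte Carlo approximation that assumes points are uniformly distributed inside each observed hyper-rectangle, and the function names, the bin count, and the example observation are all hypothetical. The vertices of the hyper-rectangle are projected onto one principal component direction to obtain the range of scores, and sampled points are then binned over that range.

```python
import itertools
import random


def project_vertices(lows, highs, direction):
    """Project all 2^p vertices of a hyper-rectangle onto one axis."""
    corners = itertools.product(*zip(lows, highs))
    return [sum(c * w for c, w in zip(corner, direction)) for corner in corners]


def pc_histogram(lows, highs, direction, n_bins=5, n_samples=20000, seed=0):
    """Monte Carlo relative frequencies of the projected score.

    Assumes uniformity inside the hyper-rectangle (an assumption of this
    sketch, not a statement of the paper's algorithm). The rectangle must
    not be degenerate along `direction`, otherwise the bin width is zero.
    """
    rng = random.Random(seed)
    scores = project_vertices(lows, highs, direction)
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for _ in range(n_samples):
        point = [rng.uniform(a, b) for a, b in zip(lows, highs)]
        score = sum(x * w for x, w in zip(point, direction))
        # Clamp so a score equal to `hi` falls in the last bin.
        k = min(int((score - lo) / width), n_bins - 1)
        counts[k] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, [c / n_samples for c in counts]


# Hypothetical observation: the unit square projected onto its diagonal.
direction = (2 ** -0.5, 2 ** -0.5)
edges, freqs = pc_histogram((0.0, 0.0), (1.0, 1.0), direction)
```

In this example the relative frequencies peak in the middle bin and fall off toward both ends, whereas an interval representation of the same scores would treat every sub-interval of equal width as equally likely.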

References

  • Anderson TW (1963) Asymptotic theory for principal components analysis. Ann Math Stat 34:122–148

  • Anderson TW (1984) An introduction to multivariate statistical analysis, 2nd edn. Wiley, New York

  • Bertrand P, Goupil F (2000) Descriptive statistics for symbolic data. In: Bock H-H, Diday E (eds) Analysis of symbolic data: explanatory methods for extracting statistical information from complex data. Springer, Berlin, pp 106–124

  • Billard L (2008) Sample covariance functions for complex quantitative data. In: Mizuta M, Nakano J (eds) Proceedings world conference of the international association for statistical computing. Japan, pp 157–163

  • Billard L, Diday E (2003) From the statistics of data to the statistics of knowledge: symbolic data analysis. J Am Stat Assoc 98:470–487

  • Billard L, Diday E (2006) Symbolic data analysis: conceptual statistics and data mining. Wiley, New York

  • Bock H-H, Diday E (eds) (2000) Analysis of symbolic data: explanatory methods for extracting statistical information from complex data. Springer, Berlin

  • Cazes P (2002) Analyse Factorielle d’un Tableau de Lois de Probabilité. Revue de Statistique Appliquée 50(3):5–24

  • Cazes P, Chouakria A, Diday E, Schektman Y (1997) Extension de l’Analyse en Composantes Principales à des Données de Type Intervalle. Revue de Statistique Appliquée 45(3):5–24

  • Chouakria A (1998) Extension des Méthodes d’analyse Factorielle à des Données de Type Intervalle. Université Paris, Dauphine, Doctoral Thesis

  • Coppi R, Giordani P, D’Urso P (2006) Component models for fuzzy data. Psychometrika 71:733–761

  • Davidson KR, Donsig AP (2002) Real analysis with real applications. Prentice Hall, New Jersey

  • Diday E (1987) Introduction à l’Approche Symbolique en Analyse des Données. CEREMADE, Université Paris, Premières Journées Symbolique-Numérique, pp 21–56

  • Douzal-Chouakria A, Billard L, Diday E (2011) Principal component analysis for interval-valued observations. Stat Anal Data Min 4:229–246

  • Gioia F, Lauro NC (2006) Principal component analysis on interval data. Comput Stat 21:343–363

  • Giordani P, Kiers HAL (2004) Principal component analysis of symmetric fuzzy data. Comput Stat Data Anal 45:519–548

  • Ichino M (2011) The quantile method for symbolic principal component analysis. Stat Anal Data Min 4:184–198

  • Irpino A, Lauro NC, Verde R (2003) Visualizing symbolic data by closed shapes. In: Schader M, Gaul W, Vichi M (eds) Between data science and applied data analysis. Springer, Berlin, pp 244–251

  • Johnson RA, Wichern DW (2002) Applied multivariate statistical analysis, 5th edn. Prentice Hall, New Jersey

  • Jolliffe IT (2004) Principal component analysis, 2nd edn. Springer, New York

  • Lauro NC, Palumbo F (2000) Principal component analysis of interval data: a symbolic data analysis approach. Comput Stat 15:73–87

  • Lauro NC, Verde R, Irpino A (2008) Principal component analysis of symbolic data described by intervals. In: Diday E, Noirhomme-Fraiture M (eds) Symbolic data analysis and the SODAS software. Wiley, Chichester, pp 279–311

  • Leroy B, Chouakria A, Herlin I, Diday E (1996) Approche Géométrique et Classification pour la Reconnaissance de Visage. Reconnaissance des Formes et Intelligence Artificielle, INRIA and IRISA and CNRS, France, pp 548–557

  • Le-Rademacher J, Billard L (2012) Symbolic-covariance principal component analysis and visualization for interval-valued data. J Comput Graph Stat 21:413–432

  • Makosso Kallyth S, Diday E (2010) Analyse en Axes Principaux de Variables Symboliques de Type Histogrammes. Act. XLII Journées de Statistiques, Marseille, France, pp 1–6. http://hal.archives-ouvertes.fr/inria-00494681/

  • Palumbo F, Lauro NC (2003) A PCA for interval-valued data based on midpoints and radii. In: Yanai H, Okada A, Shigemasu K, Kano Y, Meulman J (eds) New developments in psychometrics. Springer, Tokyo, pp 641–648


Acknowledgments

The authors wish to thank the Editor, the Associate Editor, and the referees for their thorough review and thoughtful comments. Partial support to both authors from NSF grants is gratefully acknowledged.

Author information

Corresponding author

Correspondence to J. Le-Rademacher.

About this article

Cite this article

Le-Rademacher, J., Billard, L. Principal component histograms from interval-valued observations. Comput Stat 28, 2117–2138 (2013). https://doi.org/10.1007/s00180-013-0399-4

  • DOI: https://doi.org/10.1007/s00180-013-0399-4
