Principal component analysis for probabilistic symbolic data: a more generic and accurate algorithm

Chen, Meiling; Wang, Huiwen; Qin, Zhongfeng

doi:10.1007/s11634-014-0178-2

Regular Article
Published: 11 July 2014

Volume 9, pages 59–79 (2015)

Advances in Data Analysis and Classification

Abstract

In the symbolic data framework, probabilistic symbolic data are data whose components are random variables with general probability distributions. Intervals (or uniform distributions), histograms (or empirical distributions), Gaussian distributions and chi-squared distributions are all special cases. Existing approaches share a common shortcoming: they cannot obtain the distributions of linear combinations (i.e., principal components) of random variables, especially when the variables are not identically distributed. This paper overcomes that shortcoming by using the inversion theorem to provide an exact probability density function for each principal component. Further, the paper defines a covariance matrix for probabilistic symbolic data and presents a new principal component analysis based on this variance–covariance structure. The effectiveness of the proposed method is illustrated by a simulated numerical experiment and two real-life cases: clustering of oils and fats data, and evaluation of journals indexed in the Science Citation Index.
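The role of the inversion theorem can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the component variables are independent, so that the characteristic function of a linear combination is the product of the component characteristic functions evaluated at scaled arguments, and it evaluates the inversion integral numerically with a truncated trapezoidal rule. The function names, the weights and the truncation parameters `t_max` and `n` are illustrative assumptions, not taken from the paper.

```python
import math
import numpy as np

def cf_uniform(t, low, high):
    """Characteristic function of Uniform(low, high); the t = 0 limit is 1."""
    t = np.asarray(t, dtype=float)
    safe_t = np.where(t == 0.0, 1.0, t)  # avoid division by zero at t = 0
    phi = (np.exp(1j * safe_t * high) - np.exp(1j * safe_t * low)) / (
        1j * safe_t * (high - low)
    )
    return np.where(t == 0.0, 1.0 + 0.0j, phi)

def cf_normal(t, mean, std):
    """Characteristic function of Normal(mean, std**2)."""
    t = np.asarray(t, dtype=float)
    return np.exp(1j * mean * t - 0.5 * (std * t) ** 2)

def density_of_combination(x, weights, cfs, t_max=60.0, n=20001):
    """Density at x of Y = sum_j weights[j] * X_j for independent X_j.

    Uses f(x) = (1/pi) * integral_0^inf Re[exp(-i t x) * phi_Y(t)] dt,
    truncated at t_max (an assumption that phi_Y is negligible beyond it).
    """
    t = np.linspace(0.0, t_max, n)
    phi = np.ones_like(t, dtype=complex)
    for w, cf in zip(weights, cfs):
        phi = phi * cf(w * t)  # independence: CFs multiply
    integrand = np.real(np.exp(-1j * t * x) * phi)
    dt = t[1] - t[0]
    # Composite trapezoidal rule written out explicitly.
    integral = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / math.pi

# Illustrative combination: 0.8 * Uniform(0, 1) + 0.6 * Normal(0, 1).
weights = [0.8, 0.6]
cfs = [
    lambda t: cf_uniform(t, 0.0, 1.0),
    lambda t: cf_normal(t, 0.0, 1.0),
]
print(density_of_combination(0.4, weights, cfs))
```

The two components here are deliberately not identically distributed, which is the case the abstract singles out. For this particular pair the density also has a closed form in terms of the normal distribution function, which gives a convenient check on the numerical inversion.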

Acknowledgments

The authors are grateful to the Editor and anonymous reviewers for their insightful comments, which have helped to improve the quality of this paper. This work was supported by the National Natural Science Foundation of China (Grant Nos. 71031001, 70771004, 71371019) and by the Program for New Century Excellent Talents in University of the Ministry of Education of China (Grant No. NCET-12-0026).

Author information

Authors and Affiliations

School of Economics and Management, Beihang University, Beijing, 100191, China
Meiling Chen, Huiwen Wang & Zhongfeng Qin

Corresponding author

Correspondence to Zhongfeng Qin.

About this article

Cite this article

Chen, M., Wang, H. & Qin, Z. Principal component analysis for probabilistic symbolic data: a more generic and accurate algorithm. Adv Data Anal Classif 9, 59–79 (2015). https://doi.org/10.1007/s11634-014-0178-2

Received: 12 March 2013
Revised: 12 June 2014
Accepted: 16 June 2014
Published: 11 July 2014
Issue Date: March 2015
DOI: https://doi.org/10.1007/s11634-014-0178-2
