Abstract
Symbolic Data Analysis (SDA) aims to describe and analyze complex and structured data extracted, for example, from large databases. Such data, which can be expressed as concepts, are modeled by symbolic objects described by multivalued variables. In the present paper we introduce a new distance, based on the Wasserstein metric, for clustering a set of data described by distributions with finite continuous support, or, as they are called in SDA, by “histograms”. The proposed distance allows us to define a measure of the inertia of the data with respect to a barycenter that satisfies Huygens’ theorem of decomposition of inertia. We propose to use this measure for an agglomerative hierarchical clustering of histogram data based on the Ward criterion. An application to real data validates the procedure.
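The following minimal Python sketch illustrates the three ingredients named above. It is not the authors' implementation. It assumes that the Wasserstein-based distance is the ℓ2 Wasserstein (Mallows) distance, which for one-dimensional distributions equals the L2 distance between quantile functions. It also assumes that each histogram has uniform density inside its bins and that the integral over [0, 1] is approximated with the midpoint rule. The helper names (`quantile`, `quantile_vector`, `wasserstein2`, `barycenter`, `huygens`, `ward`) and the toy histograms are hypothetical.

```python
import math

# Sketch only: a histogram is a list of (lower, upper, weight) bins whose weights
# sum to 1; the density is ASSUMED uniform inside each bin.


def quantile(hist, t):
    """Quantile function F^{-1}(t), 0 <= t <= 1, of a histogram."""
    cum = 0.0
    for lo, hi, w in hist:
        if w > 0 and t <= cum + w:
            return lo + (t - cum) / w * (hi - lo)  # linear inside a uniform bin
        cum += w
    return hist[-1][1]  # guards against weights summing to slightly less than 1


def quantile_vector(hist, m=1000):
    """Sample F^{-1} at the m midpoints of a regular grid on [0, 1]."""
    return [quantile(hist, (j + 0.5) / m) for j in range(m)]


def wasserstein2(q1, q2):
    """L2 Wasserstein distance: sqrt of the integral of (F^{-1} - G^{-1})^2 dt,
    approximated with the midpoint rule on the shared grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)) / len(q1))


def barycenter(qs):
    """In one dimension the W2 barycenter's quantile function is the average."""
    return [sum(col) / len(qs) for col in zip(*qs)]


def inertia(qs, center):
    """Sum of squared distances of the items to a center."""
    return sum(wasserstein2(q, center) ** 2 for q in qs)


def huygens(qs, clusters):
    """Return (total, within, between) inertia; total == within + between."""
    g = barycenter(qs)
    total = inertia(qs, g)
    within = between = 0.0
    for members in clusters:
        sub = [qs[i] for i in members]
        c = barycenter(sub)
        within += inertia(sub, c)
        between += len(sub) * wasserstein2(c, g) ** 2
    return total, within, between


def ward(qs, k):
    """Naive agglomerative clustering with the Ward criterion, stopped at k
    clusters: merge the pair whose fusion increases within inertia least."""
    clusters = [[i] for i in range(len(qs))]
    centers = [list(q) for q in qs]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * wasserstein2(centers[a], centers[b]) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        centers[a] = barycenter([qs[i] for i in clusters[a]])
        del clusters[b], centers[b]
    return clusters


# Usage with made-up toy histograms (not the paper's data).
hists = [
    [(0, 1, 0.5), (1, 2, 0.5)],
    [(0, 1, 0.4), (1, 2, 0.6)],
    [(10, 11, 0.5), (11, 13, 0.5)],
    [(10, 12, 1.0)],
]
qs = [quantile_vector(h) for h in hists]
groups = ward(qs, 2)
print(groups)  # [[0, 1], [2, 3]]
print(huygens(qs, groups))
```

Because the discretized quantile functions behave as ordinary vectors with a scaled Euclidean norm, the barycenter is simply their mean and the Huygens identity holds up to rounding. This is what makes the Ward merge cost n_a·n_b/(n_a + n_b)·d² applicable. The pairwise search makes the sketch cubic in the number of histograms, so it is meant only for illustration.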
Copyright information
© 2006 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Irpino, A., Verde, R. (2006). A New Wasserstein Based Distance for the Hierarchical Clustering of Histogram Symbolic Data. In: Batagelj, V., Bock, HH., Ferligoj, A., Žiberna, A. (eds) Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-34416-0_20
DOI: https://doi.org/10.1007/3-540-34416-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34415-5
Online ISBN: 978-3-540-34416-2
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)