Dealing with Distances and Transformations for Fuzzy C-Means Clustering of Compositional Data

Palarea-Albaladejo, Javier; Martín-Fernández, Josep Antoni; Soto, Jesús A.

doi:10.1007/s00357-012-9105-4

Published: 30 May 2012

Journal of Classification, Volume 29, pages 144–169 (2012)

Abstract

Clustering techniques are based upon a dissimilarity or distance measure between objects and clusters. This paper focuses on the simplex space, whose elements, called compositions, are subject to non-negativity and constant-sum constraints. Any data analysis involving compositions should fulfill two main principles: scale invariance and subcompositional coherence. Among fuzzy clustering methods, the fuzzy c-means (FCM) algorithm is broadly applied in a variety of fields, but it is not well-behaved when dealing with compositions. Here, the adequacy of different dissimilarities in the simplex, together with the behavior of the common log-ratio transformations, is discussed on the basis of compositional principles. As a result, a well-founded strategy for FCM clustering of compositions is suggested. Theoretical findings are accompanied by numerical evidence, and a detailed account of our proposal is provided. Finally, a case study illustrates the approach using a nutritional data set known in the clustering literature.
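To make the ideas in the abstract concrete, the following is a minimal sketch of one common way to run FCM on compositions: map them to centered log-ratio (clr) coordinates and apply standard FCM with the Euclidean distance there, which corresponds to the Aitchison distance in the simplex. This is an illustration under stated assumptions, not necessarily the strategy the paper proposes. The function names (`closure`, `clr`, `aitchison_distance`, `fcm`), the fuzzifier `m=2.0`, and the toy three-part data are all hypothetical, and every part is assumed strictly positive (zeros would need separate treatment before taking logarithms).

```python
import numpy as np

def closure(x):
    """Rescale each row of positive parts so that it sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centered log-ratio transform: log parts minus their row mean."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Euclidean distance between clr coordinates of two compositions."""
    return float(np.linalg.norm(clr(x) - clr(y)))

def fcm(z, c, m=2.0, n_iter=100, tol=1e-8, seed=0):
    """Plain fuzzy c-means on the rows of z (assumed here to be clr coordinates).

    Returns the membership matrix (n x c) and the cluster centers (c x d).
    """
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    u = rng.dirichlet(np.ones(c), size=z.shape[0])
    for _ in range(n_iter):
        w = u ** m
        # Centers are membership-weighted means of the observations.
        centers = (w.T @ z) / w.sum(axis=0)[:, None]
        # Squared Euclidean distances from every point to every center.
        d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return u, centers

# Toy data (made up for illustration): two groups of three-part compositions.
x = closure([[0.70, 0.20, 0.10], [0.65, 0.25, 0.10], [0.72, 0.18, 0.10],
             [0.10, 0.20, 0.70], [0.12, 0.18, 0.70], [0.08, 0.22, 0.70]])
u, centers = fcm(clr(x), c=2)
labels = u.argmax(axis=1)
# Centers live in clr space; map them back to the simplex for interpretation.
centers_simplex = closure(np.exp(centers))
```

Because the clr transform subtracts the mean log part, rescaling a composition by any positive constant leaves its clr coordinates unchanged, so the distance used in the sketch is scale invariant, one of the two principles the abstract names.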

Author information

Authors and Affiliations

Biomathematics and Statistics Scotland, JCMB, The King’s Buildings, Edinburgh, EH9 3JZ, UK
Javier Palarea-Albaladejo
Universitat de Girona, Girona, Spain
Josep Antoni Martín-Fernández
Universidad Católica San Antonio, Murcia, Spain
Jesús A. Soto

Corresponding author

Correspondence to Javier Palarea-Albaladejo.

Additional information

This research has been supported by the Scottish Government; by the Spanish Ministry of Science and Innovation under the project “CODA-RSS” (Ref. MTM2009-13272); and by the Agència de Gestió d’Ajuts Universitaris i de Recerca of the Generalitat de Catalunya under the project Ref. 2009SGR424. We are indebted to the editor and the referees for their helpful comments and suggestions on an earlier version of this paper.

About this article

Cite this article

Palarea-Albaladejo, J., Martín-Fernández, J.A. & Soto, J.A. Dealing with Distances and Transformations for Fuzzy C-Means Clustering of Compositional Data. J Classif 29, 144–169 (2012). https://doi.org/10.1007/s00357-012-9105-4

Issue Date: July 2012