Improved Classification for Compositional Data Using the α-transformation

Tsagris, Michail; Preston, Simon; Wood, Andrew T. A.

doi:10.1007/s00357-016-9207-5

Published: 01 August 2016

Journal of Classification, Volume 33, pages 243–261 (2016)

Michail Tsagris¹,
Simon Preston² &
Andrew T. A. Wood²

Abstract

In compositional data analysis, an observation is a vector containing nonnegative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas including geology, archaeology, biology, economics and political science. In this paper we investigate methods for classification of compositional data. Our approach centers on the idea of using the α-transformation to transform the data and then to classify the transformed data via regularized discriminant analysis and the k-nearest neighbors algorithm. Using the α-transformation generalizes two rival approaches in compositional data analysis, one (when α = 1) that treats the data as though they were Euclidean, ignoring the compositional constraint, and another (when α = 0) that employs Aitchison’s centered log-ratio transformation. A numerical study with several real datasets shows that whether using α = 1 or α = 0 gives better classification performance depends on the dataset, and moreover that using an intermediate value of α can sometimes give better performance than using either 1 or 0.
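
The abstract does not state the formula for the α-transformation, so the following Python sketch rests on an assumption: it uses the form (D·u − 1)/α, where u is the power-transformed composition with parts x_i^α / Σ_j x_j^α and D is the number of parts. This form reduces to an affine map of the raw proportions at α = 1 and tends to the centered log-ratio as α → 0, matching the two limiting cases the abstract describes. The sketch omits the orthonormal (Helmert-type) projection that a full definition may apply afterwards; because the vectors produced here sum to zero, such a projection would not change the Euclidean distances used by k-nearest neighbors. The function names `alpha_transform` and `knn_predict` and the toy data are hypothetical, and regularized discriminant analysis is not covered.

```python
import numpy as np


def alpha_transform(x, alpha):
    """Map compositions (rows of positive parts) to real vectors.

    Assumed form: (D * u - 1) / alpha with u_i = x_i**alpha / sum_j x_j**alpha.
    alpha = 0 is handled as the limiting case, the centered log-ratio (clr).
    """
    x = np.asarray(x, dtype=float)
    x = x / x.sum(axis=-1, keepdims=True)  # rescale each row to sum to one
    d = x.shape[-1]
    if alpha == 0:
        log_x = np.log(x)  # requires strictly positive parts
        return log_x - log_x.mean(axis=-1, keepdims=True)
    u = x ** alpha
    u = u / u.sum(axis=-1, keepdims=True)  # power-transformed composition
    return (d * u - 1.0) / alpha


def knn_predict(train_x, train_y, test_x, alpha, k=3):
    """Classify test compositions by majority vote among the k nearest
    training points, using Euclidean distance after the transformation.
    Labels are assumed to be non-negative integers."""
    z_train = alpha_transform(train_x, alpha)
    z_test = alpha_transform(test_x, alpha)
    diff = z_test[:, None, :] - z_train[None, :, :]
    dist = (diff ** 2).sum(axis=-1)  # squared distances, shape (n_test, n_train)
    nearest = np.argsort(dist, axis=1)[:, :k]
    votes = np.asarray(train_y)[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


# Toy three-part compositions, made up purely for illustration.
train_x = np.array([
    [0.70, 0.20, 0.10], [0.60, 0.30, 0.10], [0.65, 0.25, 0.10],
    [0.10, 0.20, 0.70], [0.10, 0.30, 0.60], [0.15, 0.20, 0.65],
])
train_y = np.array([0, 0, 0, 1, 1, 1])
test_x = np.array([[0.68, 0.22, 0.10], [0.12, 0.22, 0.66]])

for alpha in (0.0, 0.5, 1.0):
    print(alpha, knn_predict(train_x, train_y, test_x, alpha))
```

In practice α (and k) would be treated as tuning parameters and chosen by cross-validated classification accuracy, which is in the spirit of the abstract's observation that the best value of α depends on the dataset.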

References

AITCHISON, J. (1982), “The Statistical Analysis of Compositional Data”, Journal of the Royal Statistical Society: Series B, 44, 139–177.
AITCHISON, J. (1983), “Principal Component Analysis of Compositional Data”, Biometrika, 70, 57–65.
AITCHISON, J. (1992), “On Criteria for Measures of Compositional Difference”, Mathematical Geology, 24, 365–379.
AITCHISON, J. (2003), The Statistical Analysis of Compositional Data (reprinted with additional material by The Blackburn Press), London, UK: Chapman & Hall.
AITCHISON, J., BARCELO-VIDAL, C., MARTIN-FERNANDEZ, J.A., and PAWLOWSKY-GLAHN, V. (2000), “Logratio Analysis and Compositional Distance”, Mathematical Geology, 32, 271–275.
BAXTER, M.J. (2001), “Statistical Modelling of Artefact Compositional Data”, Archaeometry, 43, 131–147.
BAXTER, M.J., BEARDAH, C. C., COOL, H. E. M., and JACKSON, C. M. (2005), “Compositional Data Analysis of Some Alkaline Glasses”, Mathematical Geology, 37, 183–196.
BAXTER, M.J., and FREESTONE, I. C. (2006), “Log-Ratio Compositional Data Analysis in Archaeometry”, Archaeometry, 48, 511–531.
BUTLER, A., and GLASBEY, C. (2008), “A Latent Gaussian Model for Compositional Data With Zeros”, Journal of the Royal Statistical Society: Series C, 57, 505–520.
DRYDEN, I.L., and MARDIA, K. V. (1998), Statistical Shape Analysis, New York: Wiley.
EGOZCUE, J.J., PAWLOWSKY-GLAHN, V., MATEU-FIGUERAS, G., and BARCELO-VIDAL, C. (2003), “Isometric Logratio Transformations for Compositional Data Analysis”, Mathematical Geology, 35, 279–300.
ENDRES, D.M., and SCHINDELIN, J. E. (2003), “A New Metric for Probability Distributions”, IEEE Transactions on Information Theory, 49, 1858–1860.
FRY, J.M., FRY, T. R. L., and McLAREN, K. R. (2000), “Compositional Data Analysis and Zeros in Micro Data”, Applied Economics, 32, 953–959.
GREENACRE, M. (2009), “Power Transformations in Correspondence Analysis”, Computational Statistics & Data Analysis, 53, 3107–3116.
GREENACRE, M. (2011), “Measuring Subcompositional Incoherence”, Mathematical Geosciences, 43, 681–693.
GUEORGUIEVA, R., ROSENHECK, R., and ZELTERMAN, D. (2008), “Dirichlet Component Regression and Its Applications to Psychiatric Data”, Computational Statistics & Data Analysis, 52, 5344–5355.
HASTIE, T., TIBSHIRANI, R., and FRIEDMAN, J. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Berlin: Springer.
IVERSON, S.J., FIELD, C., BOWEN, W.D., and BLANCHARD, W. (2004), “Quantitative Fatty Acid Signature Analysis: A New Method of Estimating Predator Diets”, Ecological Monographs, 74, 211–235.
LANCASTER, H.O. (1965), “The Helmert Matrices”, American Mathematical Monthly, 72, 4–12.
LARROSA, J.M. (2003), “A Compositional Statistical Analysis of Capital Stock”, in Proceedings of the 1st Compositional Data Analysis Workshop, Girona, Spain.
NEOCLEOUS, T., AITKEN, C., and ZADORA, G. (2011), “Transformations for Compositional Data With Zeros With an Application to Forensic Evidence Evaluation”, Chemometrics and Intelligent Laboratory Systems, 109, 77–85.
OSTERREICHER, F., and VAJDA, I. (2003), “A New Class of Metric Divergences on Probability Spaces and Its Applicability in Statistics”, Annals of the Institute of Statistical Mathematics, 55, 639–653.
OTERO, N., TOLOSANA-DELGADO, R., SOLER, A., PAWLOWSKY-GLAHN, V., and CANALS, A. (2005), “Relative vs. Absolute Statistical Analysis of Compositions: A Comparative Study of Surface Waters of a Mediterranean River”, Water Research, 39, 1404–1414.
PALAREA-ALBALADEJO, J., MARTIN-FERNANDEZ, J.A., and SOTO, J.A. (2012), “Dealing with Distances and Transformations for Fuzzy C-means Clustering of Compositional Data”, Journal of Classification, 29, 144–169.
RODRIGUES, P.C., and LIMA, A.T. (2009), “Analysis of an European Union Election Using Principal Component Analysis”, Statistical Papers, 50, 895–904.
SCEALY, J.L., and WELSH, A.H. (2011), “Regression for Compositional Data by Using Distributions Defined on the Hypersphere”, Journal of the Royal Statistical Society: Series B, 73, 351–375.
SCEALY, J.L., and WELSH, A.H. (2014), “Colours and Cocktails: Compositional Data Analysis: 2013 Lancaster Lecture”, Australian & New Zealand Journal of Statistics, 56, 145–169.
STEPHENS, M.A. (1982), “Use of the Von Mises Distribution to Analyse Continuous Proportions”, Biometrika, 69, 197–203.
STEWART, C., and FIELD, C. (2011), “Managing the Essential Zeros in Quantitative Fatty Acid Signature Analysis”, Journal of Agricultural, Biological, and Environmental Statistics, 16, 45–69.
TSAGRIS, M.T., PRESTON, S., and WOOD, A. T. A. (2011), “A Data-Based Power Transformation for Compositional Data”, in Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain.
TSAGRIS, M., and ATHINEOU, G. (2016), Compositional: A Collection of R Functions for Compositional Data Analysis (R package version 1.2.5), https://cran.r-project.org/web/packages/Compositional/
UC IRVINE MACHINE LEARNING REPOSITORY (2014), “Forensic Glass Dataset”, http://archive.ics.uci.edu/ml/datasets/Glass+Identification.
WORONOW, A. (1997), “The Elusive Benefits of Logratios”, in Proceedings of the 3rd Annual Conference of the International Association for Mathematical Geology, Barcelona, Spain.
ZADORA, G., NEOCLEOUS, T., and AITKEN, C. (2010), “A Two-Level Model for Evidence Evaluation in the Presence of Zeros”, Journal of Forensic Sciences, 55, 371–384.

Author information

Authors and Affiliations

Department of Computer Science, University of Crete, Heraklion, Greece
Michail Tsagris
University of Nottingham, Nottingham, UK
Simon Preston & Andrew T. A. Wood

Corresponding author

Correspondence to Michail Tsagris.

Electronic supplementary material

ESM 1 (PDF 112 kb)

About this article

Cite this article

Tsagris, M., Preston, S. & Wood, A.T.A. Improved Classification for Compositional Data Using the α-transformation. J Classif 33, 243–261 (2016). https://doi.org/10.1007/s00357-016-9207-5

Issue Date: July 2016
