Abstract
The application of non-compositional methods to compositional data, which represent parts of a whole, should be avoided from a theoretical point of view. For example, one can show that Pearson correlations computed on compositional data are biased towards negative values. Moreover, almost all statistical methods lead to biased estimates when applied to compositional data. One way out is to analyze the data after representing them in log-ratio coordinates. However, this raises several issues, such as how to interpret results expressed in log-ratios and how to deal with zeros and non-detects, for which log-ratios are undefined. In settings where only the prediction or classification error matters, rather than the interpretation of results, one might argue for using non-linear methods instead in order to avoid these issues. It is known that misclassification and prediction errors are generally lower with a log-ratio approach when using machine learning methods that model linear relationships between variables. However, does this also hold when training a neural network, which may learn the inner relationships between the parts of a whole even without the data being represented in log-ratios? This paper answers this question based on applications to multiple real data sets, leading to the recommendation to use a compositional treatment of compositional data in any case. Misclassification and prediction errors are lower when nonlinear methods, and in particular deep learning methods, are applied together with a compositional treatment of compositional data. The compositional treatment of compositional data therefore remains very important even when the focus is on the prediction errors of deep artificial neural networks or other nonlinear methods.
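Two claims in the abstract can be made concrete with a small simulation: closing independent positive parts to a constant sum induces negative Pearson correlations, and log-ratio coordinates are undefined as soon as a part is zero. The following is a minimal, standard-library-only sketch and not the author's code. It assumes independent log-normal parts as the data-generating process and uses pivot coordinates as one common choice of isometric log-ratio (ilr) coordinates; the abstract does not say which log-ratio coordinates the paper uses, and all function names here (`closure`, `pearson`, `pivot_coordinates`, `simulate`) are hypothetical.

```python
import math
import random


def closure(x):
    """Rescale a vector of positive parts so that its entries sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def pearson(a, b):
    """Ordinary Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def pivot_coordinates(x):
    """Pivot (isometric log-ratio) coordinates of a D-part composition.

    z_j = sqrt((D - j) / (D - j + 1)) * ln(x_j / gm(x_{j+1}, ..., x_D)),
    j = 1, ..., D - 1, where gm is the geometric mean. The coordinates are
    invariant to rescaling of x and undefined if any part is zero.
    """
    if any(v <= 0 for v in x):
        raise ValueError("log-ratios are undefined for zero or negative parts")
    d = len(x)
    z = []
    for i in range(d - 1):                  # i = j - 1 (0-based index)
        rest = x[i + 1:]                    # parts x_{j+1}, ..., x_D
        k = len(rest)                       # equals D - j
        log_gm = sum(math.log(v) for v in rest) / k
        z.append(math.sqrt(k / (k + 1)) * (math.log(x[i]) - log_gm))
    return z


def simulate(n=5000, d=3, seed=1):
    """Assumed setup: n rows of d independent log-normal parts, each row closed."""
    rng = random.Random(seed)
    raw = [[rng.lognormvariate(0.0, 0.5) for _ in range(d)] for _ in range(n)]
    return raw, [closure(row) for row in raw]


if __name__ == "__main__":
    raw, comp = simulate()
    r_raw = pearson([r[0] for r in raw], [r[1] for r in raw])
    r_comp = pearson([c[0] for c in comp], [c[1] for c in comp])
    print(f"parts 1 and 2 before closure: r = {r_raw:+.3f}")
    print(f"parts 1 and 2 after closure:  r = {r_comp:+.3f}")
    print("pivot coordinates of row 1:", pivot_coordinates(comp[0]))
```

Because the closed parts are exchangeable and sum to one, their population correlation is -1/(D-1), i.e. -0.5 for three parts, even though the underlying parts are independent. The pivot coordinates do not change when a composition is rescaled, but they cannot be computed for a row containing a zero, which is why zeros and non-detects need separate treatment before a log-ratio analysis.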
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Templ, M. (2022). Can the Compositional Nature of Compositional Data Be Ignored by Using Deep Learning Approaches? In: Salvati, N., Perna, C., Marchetti, S., Chambers, R. (eds) Studies in Theoretical and Applied Statistics. SIS 2021. Springer Proceedings in Mathematics & Statistics, vol 406. Springer, Cham. https://doi.org/10.1007/978-3-031-16609-9_11
DOI: https://doi.org/10.1007/978-3-031-16609-9_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16608-2
Online ISBN: 978-3-031-16609-9
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)