Can the Compositional Nature of Compositional Data Be Ignored by Using Deep Learning Approaches?

Templ, Matthias

doi:10.1007/978-3-031-16609-9_11

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 406)

Included in the following conference series:

Convegno della Società Italiana di Statistica

Abstract

The application of non-compositional methods to compositional data, which represent parts of a whole, should be avoided from a theoretical point of view. For example, one can show that Pearson correlations applied to compositional data are biased towards negative values. Moreover, almost all statistical methods yield biased estimates when applied to compositional data. One way out is to analyze the data after representing them in log-ratio coordinates. However, this raises several complications, such as interpreting results in terms of log-ratios and dealing with zeros and non-detects, for which log-ratios are undefined. In settings where only the prediction or classification error matters, rather than the interpretation of results, one might argue for using non-linear methods instead to avoid these complications. It is generally known that, for machine learning methods that model linear relationships between variables, misclassification and prediction errors are lower with a log-ratio approach. But is this also true when training a neural network, which may learn the internal relationships between the parts of a whole even without the data being represented in log-ratios? This paper answers this question based on applications to multiple real data sets, leading to the recommendation to treat compositional data compositionally in all cases. Misclassification and prediction errors are lower when nonlinear methods, and in particular deep learning methods, are applied together with a compositional treatment of compositional data. The compositional treatment of compositional data therefore remains very important even when the focus is on prediction errors from deep artificial neural networks or other nonlinear methods.
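The first claims of the abstract can be illustrated numerically. Because every closed composition sums to one, the covariance of any part with the total is zero, so each part's variance must be offset by negative covariances with the other parts; this constant-sum constraint is the source of the negative bias in Pearson correlations. The sketch below is a minimal illustration in plain Python (standard library only), not the chapter's implementation: it simulates independent positive parts, closes them, and compares correlations before and after closure. It also implements the centred log-ratio (clr) transform and one choice of isometric log-ratio (ilr) coordinates, the pivot coordinates, both of which are undefined when a part is zero. All function names (`closure`, `clr`, `pivot_ilr`, `simulate`, and so on) and the simulation settings are hypothetical choices for illustration.

```python
import math
import random


def closure(x):
    """Rescale a vector of positive parts so that it sums to 1."""
    total = sum(x)
    return [v / total for v in x]


def clr(x):
    """Centred log-ratio: log of each part relative to the geometric mean.
    Raises ValueError for zeros, because log(0) is undefined."""
    if any(v <= 0 for v in x):
        raise ValueError("log-ratios are undefined for zero or negative parts")
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]


def pivot_ilr(x):
    """Pivot (ilr) coordinates, i = 1..D-1 (1-based):
    z_i = sqrt((D-i)/(D-i+1)) * ln(x_i / geometric_mean(x_{i+1}, ..., x_D))."""
    if any(v <= 0 for v in x):
        raise ValueError("log-ratios are undefined for zero or negative parts")
    d = len(x)
    logs = [math.log(v) for v in x]
    z = []
    for i in range(d - 1):       # 0-based i corresponds to part i + 1
        k = d - i - 1            # number of remaining parts (D - i, 1-based)
        mean_rest = sum(logs[i + 1:]) / k
        z.append(math.sqrt(k / (k + 1)) * (logs[i] - mean_rest))
    return z


def covariance(a, b):
    """Sample covariance with denominator n - 1."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def pearson(a, b):
    """Pearson correlation coefficient."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))


def column(rows, j):
    """Extract column j from a list of rows."""
    return [r[j] for r in rows]


def simulate(n=2000, d=3, seed=1):
    """Independent uniform positive parts (raw) and the same rows after closure."""
    rng = random.Random(seed)
    raw = [[rng.uniform(1.0, 10.0) for _ in range(d)] for _ in range(n)]
    return raw, [closure(r) for r in raw]


if __name__ == "__main__":
    raw, comp = simulate()
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        r_raw = pearson(column(raw, i), column(raw, j))
        r_comp = pearson(column(comp, i), column(comp, j))
        print(f"parts {i},{j}: raw r = {r_raw:.3f}, closed r = {r_comp:.3f}")
    print("clr:", clr(comp[0]))
    print("ilr:", pivot_ilr(comp[0]))
```

With independent raw parts, the raw correlations stay near zero, while the closed ones come out close to -1/(D-1) = -0.5, the value implied for three exchangeable parts under a constant sum. The pivot coordinates have the same Euclidean norm as the clr vector, which is one reason log-ratio coordinates can be passed to standard Euclidean learners such as neural networks.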

Author information

Authors and Affiliations

Institute of Data Analysis and Process Design, Zurich University of Applied Sciences, Rosenstrasse 3, 8404, Winterthur, Switzerland
Matthias Templ

Corresponding author

Correspondence to Matthias Templ.

Editor information

Editors and Affiliations

Department of Economics and Management, University of Pisa, Pisa, Italy
Nicola Salvati
Department of Economics and Statistics, University of Salerno, Fisciano, Salerno, Italy
Cira Perna
Department of Economics and Management, University of Pisa, Pisa, Italy
Stefano Marchetti
School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW, Australia
Raymond Chambers

About this paper

Cite this paper

Templ, M. (2022). Can the Compositional Nature of Compositional Data Be Ignored by Using Deep Learning Approaches? In: Salvati, N., Perna, C., Marchetti, S., Chambers, R. (eds) Studies in Theoretical and Applied Statistics. SIS 2021. Springer Proceedings in Mathematics & Statistics, vol 406. Springer, Cham. https://doi.org/10.1007/978-3-031-16609-9_11

DOI: https://doi.org/10.1007/978-3-031-16609-9_11
Published: 15 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16608-2
Online ISBN: 978-3-031-16609-9
eBook Packages: Mathematics and Statistics (R0)
