Can the Compositional Nature of Compositional Data Be Ignored by Using Deep Learning Approaches?

  • Conference paper
Studies in Theoretical and Applied Statistics (SIS 2021)

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 406)

Abstract

From a theoretical point of view, non-compositional methods should not be applied to compositional data, which represent parts of a whole. For example, one can show that Pearson correlations computed on compositional data are biased towards negative values. Moreover, almost all statistical methods lead to biased estimates when applied directly to compositional data. One way out is to analyze the data after representing them in log-ratio coordinates. This has consequences of its own, however: results must be interpreted in terms of log-ratios, and zeros and non-detects, for which log-ratios are undefined, need special treatment. In settings where only the prediction or classification error matters and the interpretation of results does not, one might argue for using nonlinear methods instead to avoid these complications. It is generally known that misclassification and prediction errors are lower with a log-ratio approach when using machine learning methods that model linear relationships between variables. But is this also true for a neural network, which may learn the inner relationships between the parts of a whole even without the data being represented in log-ratios? This paper answers that question based on applications to multiple real data sets, and it leads to the recommendation to treat compositional data compositionally in any case. Misclassification and prediction errors are lower when nonlinear methods, and deep learning methods in particular, are combined with a compositional treatment of compositional data. The compositional treatment therefore remains very important even when the focus is on prediction errors obtained with deep artificial neural networks or other nonlinear methods.
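The negative bias and the log-ratio representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's code or data. The helper names `closure` and `clr` are hypothetical, the data are synthetic, and the centred log-ratio (clr) transform is used here only as one common choice of log-ratio representation; the abstract does not say which coordinates the paper uses.

```python
import numpy as np


def closure(x):
    """Rescale each row so that its parts sum to 1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)


def clr(x):
    """Centred log-ratio: log of each part relative to the row's geometric mean."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)


# Synthetic example: four independent, strictly positive variables.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(500, 4))

# Closing the rows to a constant sum turns them into compositions.
comp = closure(raw)

# Because every row sums to 1, each row of the covariance matrix sums to 0.
# The covariances of a part with all other parts therefore add up to minus
# its own variance, so they cannot all be non-negative.
cov = np.cov(comp, rowvar=False)
row_sums = cov.sum(axis=1)
off_diag_sums = row_sums - np.diag(cov)

# The clr representation does not depend on the scale of each row, so the
# raw and the closed data give the same coordinates.
z = clr(comp)

print("covariance row sums:", np.round(row_sums, 12))
print("summed off-diagonal covariances:", off_diag_sums)
print("clr row sums:", np.round(z.sum(axis=1)[:3], 12))
```

The clr transform requires strictly positive parts, which is why zeros and non-detects need separate handling before any log-ratio representation is computed.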

Author information

Corresponding author

Correspondence to Matthias Templ.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Templ, M. (2022). Can the Compositional Nature of Compositional Data Be Ignored by Using Deep Learning Approaches? In: Salvati, N., Perna, C., Marchetti, S., Chambers, R. (eds) Studies in Theoretical and Applied Statistics. SIS 2021. Springer Proceedings in Mathematics & Statistics, vol 406. Springer, Cham. https://doi.org/10.1007/978-3-031-16609-9_11
