Kent feature embedding for classification of compositional data with zeros

Lu, Shan; Wang, Wenjing; Guan, Rong

doi:10.1007/s11222-024-10382-z

Original Paper
Published: 31 January 2024

Statistics and Computing, Volume 34, article number 69 (2024)


Abstract

Compositional data pose challenges to current classification methods owing to their non-negativity and unit-sum constraints, especially when some of the components are zero. In this paper, we develop an effective classification method for multivariate compositional data in which some components equal zero. Specifically, a Kent feature embedding technique is first proposed to transform compositional data and improve data quality. We then use a support vector machine, a state-of-the-art machine learning model, to build the classifier. The proposed method is shown to be effective through numerical simulations. Results on multiple real datasets, including species classification, day-night image classification and household consumption pattern recognition, further verify that the proposed method achieves good classification performance and outperforms its competitors. This method should help to broaden the practical use of compositional data with zeros in classification tasks.
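The Kent feature embedding itself is not reproduced on this page, so the following is only a minimal background sketch, not the authors' algorithm. It assumes the standard square-root transformation, which places a unit-sum composition on the unit sphere (the space on which the Kent distribution is defined) and stays well defined when some parts are zero, whereas a log-ratio transform does not. The function names `sqrt_transform` and `clr_transform` and the example composition are hypothetical.

```python
import math


def sqrt_transform(composition):
    """Map a composition to the unit sphere via element-wise square roots.

    Assumption for illustration: this is the generic square-root
    transformation, not the paper's Kent feature embedding.
    """
    total = sum(composition)
    if total <= 0 or any(p < 0 for p in composition):
        raise ValueError("parts must be non-negative with a positive sum")
    # Re-close to unit sum first, then take square roots; zeros stay zeros.
    return [math.sqrt(p / total) for p in composition]


def clr_transform(composition):
    """Centred log-ratio transform; undefined when any part is zero."""
    logs = [math.log(p) for p in composition]  # math.log(0.0) raises ValueError
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical three-part composition with one zero component.
x = [0.5, 0.5, 0.0]

z = sqrt_transform(x)
squared_norm = sum(v * v for v in z)  # equals 1 up to rounding error

try:
    clr_transform(x)
    clr_defined = True
except ValueError:
    clr_defined = False  # the log-ratio route breaks on the zero part
```

Spherical features of this kind could then be passed to a support vector machine classifier; how the paper builds its features from the Kent distribution is specified only in the full article.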


Data availability statement

Data are publicly available and details are given in the paper. Data can also be made available on reasonable request.

Notes

Pseudocode for the proposed Kent feature embedding is given in Algorithm 2, which is placed in the Appendix to keep the paper concise.


Funding

This study is funded by the National Natural Science Foundation of China (Nos. 72371257, 72001222, 71873012). RG is partially supported by the Humanities and Social Science General Program of the Ministry of Education of China (No. 23YJC910002). SL acknowledges support from the Jing Ying Scholar Support Program at the Central University of Finance and Economics (CUFE) and is a member of the Financial Sustainable Development Research Team at CUFE. SL, WW and RG also acknowledge support from the Program for Innovation Research, the “Double First-Class” Disciplinary Project and the Disciplinary Funding at CUFE.

Author information

Authors and Affiliations

School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, China
Shan Lu, Wenjing Wang & Rong Guan

Contributions

SL: Conceptualization; Methodology; Formal analysis; Writing—original draft; Writing—review & editing. WW: Formal analysis; Writing—review & editing. RG: Conceptualization; Methodology; Writing—review & editing.

Corresponding author

Correspondence to Rong Guan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lu, S., Wang, W. & Guan, R. Kent feature embedding for classification of compositional data with zeros. Stat Comput 34, 69 (2024). https://doi.org/10.1007/s11222-024-10382-z

Received: 20 October 2023
Accepted: 05 January 2024
Published: 31 January 2024
DOI: https://doi.org/10.1007/s11222-024-10382-z
