DOI: 10.1145/3501409.3501646
research-article

Missing data filling method based on Aitchison simplex space

Published: 31 December 2021

ABSTRACT

Compositional data in the Aitchison simplex are bounded and subject to a constant-sum constraint. As a result, they generally do not follow a multivariate normal distribution, and some variables are exactly or approximately linearly related, which makes such data very difficult to model. In multiple regression analysis, a small change in a sample attribute value can strongly perturb the estimated regression coefficients and make them extremely unstable, so the existing general-purpose statistical methods cannot be used to interpret and process the data properly. To address this problem, and building on the definitions of the complete algebraic operations in the Aitchison simplex, this paper proposes a method for filling missing data in simplex space: first, the k-means method is used for an initial fill in the simplex; then the isometric log-ratio (ilr) transformation is applied; and finally the principal component method is used to correct the initially filled values. Results on an example show that the principal component correction method, built on the proposed complete system of algebraic operations on the simplex, performs better than the other filling methods compared.
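The three steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the abstract does not give enough detail to reproduce it. Several things here are assumptions: the log-ratio step is taken to be the isometric log-ratio (ilr) transformation with a Helmert-type orthonormal basis; the k-means initial fill is replaced by a simpler stand-in that fills each missing part from the per-part geometric means; and the correction step is a generic iterative low-rank PCA reconstruction in ilr coordinates. The function names (`closure`, `ilr`, `ilr_inv`, `initial_fill`, `pca_correct`) and the toy data are hypothetical.

```python
import numpy as np

def closure(x):
    """Rescale each row to sum to 1 (the constant-sum constraint)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def ilr_basis(d):
    """Orthonormal Helmert-type basis of the zero-sum hyperplane, shape (d-1, d)."""
    v = np.zeros((d - 1, d))
    for i in range(1, d):
        v[i - 1, :i] = 1.0 / i
        v[i - 1, i] = -1.0
        v[i - 1] *= np.sqrt(i / (i + 1.0))
    return v

def ilr(x):
    """Map D-part compositions to D-1 unconstrained real coordinates."""
    logx = np.log(closure(x))
    clr = logx - logx.mean(axis=-1, keepdims=True)
    return clr @ ilr_basis(clr.shape[-1]).T

def ilr_inv(z):
    """Map ilr coordinates back to closed compositions."""
    z = np.asarray(z, dtype=float)
    return closure(np.exp(z @ ilr_basis(z.shape[-1] + 1)))

def initial_fill(x):
    """Assumed stand-in for the k-means step: fill each missing part (NaN)
    from the per-part geometric mean, shifted to the row's observed parts."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    logx = np.log(x)                                    # NaN stays NaN
    center = np.nanmean(logx, axis=0)                   # log geometric mean per part
    shift = np.nanmean(logx - center, axis=1, keepdims=True)
    filled = np.where(missing, np.exp(center + shift), x)
    return closure(filled), missing

def pca_correct(x_filled, missing, n_components=1, n_iter=20):
    """Assumed correction step: repeatedly reconstruct the data from the
    leading principal components in ilr space and update only the missing
    parts, keeping the ratios among observed parts unchanged.
    Every row must have at least one observed part."""
    x = closure(x_filled)
    obs = ~missing
    for _ in range(n_iter):
        z = ilr(x)
        mean = z.mean(axis=0)
        u, s, vt = np.linalg.svd(z - mean, full_matrices=False)
        k = n_components
        x_hat = ilr_inv(mean + (u[:, :k] * s[:k]) @ vt[:k])
        # Put the reconstruction on the same scale as the observed parts.
        scale = (x * obs).sum(axis=1, keepdims=True) / (x_hat * obs).sum(axis=1, keepdims=True)
        x = closure(np.where(missing, x_hat * scale, x))
    return x

# Toy data: 30 four-part compositions lying on one direction in ilr space.
rng = np.random.default_rng(0)
full = ilr_inv(rng.normal(size=(30, 1)) @ np.array([[1.0, -0.5, 0.3]]))
data = full.copy()
data[::5, 2] = np.nan                                   # delete part 3 in every fifth row

x0, missing = initial_fill(data)
x1 = pca_correct(x0, missing, n_components=1)
```

In this sketch the corrected rows still sum to 1 and the ratios among the observed parts of each row are left untouched; only the missing parts move. How many components to retain, and how the result compares with a k-means initial fill, would have to be checked against the paper itself.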

      • Published in

        EITCE '21: Proceedings of the 2021 5th International Conference on Electronic Information Technology and Computer Engineering (ACM Other conferences)
        October 2021
        1723 pages
        ISBN: 9781450384322
        DOI: 10.1145/3501409

        Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        • EITCE '21 paper acceptance rate: 294 of 531 submissions, 55%
        • Overall acceptance rate: 508 of 972 submissions, 52%