ABSTRACT
Compositional data in the Aitchison simplex are subject to boundedness and constant-sum constraints. As a result, the data generally do not follow a multivariate normal distribution, and some variables are exactly or approximately linearly related, which makes the data very difficult to model. In multiple regression analysis, a small change in a sample attribute value can strongly perturb the estimated regression coefficients, making them extremely unstable, and existing general-purpose statistical methods cannot be used to interpret and process the data properly. To address this problem, this paper draws on the definitions of the complete algebraic operations in Aitchison simplex space and proposes a method for filling missing data in simplex space: first, the k-means method is used for an initial fill in simplex space; next, the isometric log-ratio (ilr) transformation is applied; finally, the principal component method is used to correct the initially filled values. Results on an example show that the principal component correction filling method, built on the proposed complete algebraic operation system of simplex space, performs better than other filling methods.
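The isometric log-ratio step named in the abstract can be illustrated with a minimal sketch. The abstract does not say which orthonormal basis the paper uses, so the Helmert-type basis below is an assumption, and the function names `closure`, `ilr_basis`, `ilr`, and `ilr_inv` are hypothetical helpers introduced only for illustration. The sketch maps a composition with D parts to D-1 unconstrained real coordinates and back.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so that each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def ilr_basis(D):
    """Helmert-type orthonormal basis of the clr hyperplane, shape (D-1, D).

    Assumption: the paper may use a different orthonormal basis.
    """
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i            # first i parts share equal weight
        V[i - 1, i] = -1.0                # contrasted against part i+1
        V[i - 1] *= np.sqrt(i / (i + 1.0))  # normalize the row to unit length
    return V

def ilr(x):
    """Map compositions (rows summing to 1) to D-1 real coordinates."""
    x = closure(x)
    logx = np.log(x)
    clr = logx - logx.mean(axis=-1, keepdims=True)  # centered log-ratio
    return clr @ ilr_basis(x.shape[-1]).T

def ilr_inv(z):
    """Map ilr coordinates back to the simplex."""
    z = np.asarray(z, dtype=float)
    D = z.shape[-1] + 1
    return closure(np.exp(z @ ilr_basis(D)))

if __name__ == "__main__":
    x = np.array([[0.2, 0.3, 0.5],
                  [0.1, 0.6, 0.3]])
    z = ilr(x)
    print(z.shape)                       # two rows, D - 1 = 2 coordinates each
    print(np.allclose(ilr_inv(z), x))    # round trip recovers the compositions
```

Because the coordinates are unconstrained, standard Euclidean tools such as principal component analysis can be applied to `ilr(x)`, and the result mapped back with `ilr_inv`. This assumes all parts are strictly positive and observed; missing parts must be filled first, as the abstract's initial filling step does.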
Index Terms
- Missing data filling method based on Aitchison simplex space