Abstract
The field of AI and machine learning is constantly evolving, and as the size of data continues to grow, so does the need for accurate and efficient methods of data processing. However, data is rarely perfect, and missing values are increasingly common. Imputation techniques therefore need to be scalable as well as precise. For that reason, we examine the performance of Principal Components Analysis Imputation (PCAI) [9], a framework for speeding up imputation, when it is used with logistic regression. Because the coefficients of a logistic regression model are usually used for interpretation, we examine not only the speed improvement that PCAI provides but also how the coefficients of the fitted logistic regression models change when this speed-up mechanism is used. To demonstrate the efficiency of the method, we compare its performance against frequently used imputation methods on three popular datasets: Fashion MNIST, Gene, and Parkinson. In most experiments, PCAI achieves lower running time and better accuracy.
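The following is a minimal sketch of how a PCA-based imputation speed-up of this kind might be wired into a logistic regression workflow. It is not the authors' implementation. It assumes that the features are partitioned into fully observed columns and columns with missing values, that PCA compresses the fully observed part, and that an off-the-shelf imputer then runs on the smaller stacked matrix. The helper name `pcai_impute`, the choice of scikit-learn's `KNNImputer`, the number of components, and the synthetic data are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression


def pcai_impute(X, n_components=5, imputer=None):
    """Impute X after compressing its fully observed columns with PCA.

    Hypothetical helper: the partitioning scheme is an assumption about
    how a PCA-based speed-up can work, not the authors' exact algorithm.
    """
    X = np.asarray(X, dtype=float)
    if imputer is None:
        imputer = KNNImputer(n_neighbors=5)  # assumed base imputer

    missing_cols = np.isnan(X).any(axis=0)
    if not missing_cols.any():
        return X.copy()  # nothing to impute

    observed = X[:, ~missing_cols]
    if observed.shape[1] == 0:
        return imputer.fit_transform(X)  # no observed block to compress

    # Compress the fully observed block so the imputer sees fewer columns.
    k = min(n_components, observed.shape[0], observed.shape[1])
    reduced = PCA(n_components=k).fit_transform(observed)

    # Impute on [principal components | columns with missing values].
    stacked = np.hstack([reduced, X[:, missing_cols]])
    imputed = imputer.fit_transform(stacked)

    # Write imputed values back; the observed columns stay untouched.
    result = X.copy()
    result[:, missing_cols] = imputed[:, k:]
    return result


# Synthetic data (illustrative only): 20 features, the last 5 have gaps.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(200, 20))
y = (X_full[:, 0] + X_full[:, 15] > 0).astype(int)

X = X_full.copy()
mask = rng.random((200, 5)) < 0.2
X[:, 15:] = np.where(mask, np.nan, X[:, 15:])

X_imp = pcai_impute(X, n_components=5)

# Fit logistic regression on the imputed data and inspect coefficients.
clf = LogisticRegression(max_iter=1000).fit(X_imp, y)
print(clf.coef_.round(2))
```

To study how the speed-up affects interpretation, one could fit a second model on data imputed without the PCA step and compare the two `coef_` arrays, alongside the wall-clock time of each imputation.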
References
A principal-component missing-data method for multiple regression models. System Development Corporation (1959)
Al-helali, B., Chen, Q., Xue, B., Zhang, M.: A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data. Soft Comput. 25, 1–20 (2021)
Bansal, P., Deshpande, P., Sarawagi, S.: Missing value imputation on multidimensional time series. CoRR, abs/2103.01600 (2021)
Fortuny-Folch, A., Arteaga, F., Ferrer, A.: PCA model building with missing data: new proposals and a comparative study. System Development Corporation (2015)
Garg, A., Naryani, D., Aggarwal, G., Aggarwal, S.: DL-GSA: a deep learning metaheuristic approach to missing data imputation. In: Tan, Y., Shi, Y., Tang, Q. (eds.) ICSI 2018. LNCS, vol. 10942, pp. 513–521. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93818-9_49
Guyon, I., Li, J., Mader, T., Pletscher, P.A., Schneider, G., Uhr, M.: Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark. Pattern Recogn. Lett. 28(12), 1438–1444 (2007)
Lipton, Z.C., Kale, D.C., Wetzel, R., et al.: Modeling missing data in clinical time series with RNNs. Mach. Learn. Healthcare 56, 253–270 (2016)
Mazumder, R., Hastie, T., Tibshirani, R.: Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 11(80), 2287–2322 (2010)
Nguyen, T., Ly, H.T., Riegler, M.A., Halvorsen, P.: Principal component analysis based frameworks for efficient missing data imputation algorithms (2022)
Nguyen, T., Nguyen, D.H., Nguyen, H., Nguyen, B.T., Wade, B.A.: EPEM: efficient parameter estimation for multiple class monotone missing data. Inf. Sci. 567, 1–22 (2021)
Nguyen, T., Nguyen-Duy, K.M., Nguyen, D.H.M., Nguyen, B.T., Wade, B.A.: DPER: direct parameter estimation for randomly missing data. Knowl.-Based Syst. 240, 108082 (2022)
Stekhoven, D.J., Bühlmann, P.: MissForest-non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118 (2012)
Tolles, J., Meurer, W.J.: Logistic regression: relating patient characteristics to outcomes. JAMA 316(5), 533–534 (2016)
van Buuren, S., Groothuis-Oudshoorn, K.: Mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–67 (2011)
Vu, M.A., et al.: Conditional expectation for missing data imputation. arXiv preprint arXiv:2302.00911 (2023)
Woźnica, K., Biecek, P.: Does imputation matter? Benchmark for predictive models (2020)
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
Yoon, J., Jordon, J., van der Schaar, M.: Gain: missing data imputation using generative adversarial nets (2018)
Acknowledgments
We want to thank the University of Science, Vietnam National University in Ho Chi Minh City, and AISIA Research Lab in Vietnam for supporting us throughout this paper.
Funding
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) in Ho Chi Minh City, Vietnam under the grant number DS2023-18-01.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen, T.H.T., Le, B., Nguyen, P., Tran, L.G.H., Nguyen, T., Nguyen, B.T. (2023). Principal Components Analysis Based Imputation for Logistic Regression. In: Fujita, H., Wang, Y., Xiao, Y., Moonis, A. (eds) Advances and Trends in Artificial Intelligence. Theory and Applications. IEA/AIE 2023. Lecture Notes in Computer Science, vol 13925. Springer, Cham. https://doi.org/10.1007/978-3-031-36819-6_3
DOI: https://doi.org/10.1007/978-3-031-36819-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36818-9
Online ISBN: 978-3-031-36819-6
eBook Packages: Computer Science, Computer Science (R0)