Principal Components Analysis Based Imputation for Logistic Regression

Nguyen, Thuong H. T.; Le, Bao; Nguyen, Phuc; Tran, Linh G. H.; Nguyen, Thu; Nguyen, Binh T.

doi:10.1007/978-3-031-36819-6_3


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13925)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems


Abstract

The field of AI and machine learning is constantly evolving, and as the size of data continues to grow, so does the need for accurate and efficient methods of data processing. However, data are rarely perfect, and missing values are increasingly common. Imputation techniques therefore need to be scalable as well as accurate. For that reason, we examine the performance of Principal Components Analysis Imputation (PCAI) [9], a framework for speeding up imputation, for logistic regression. Because the coefficients of a logistic regression model are often used for interpretation, in addition to measuring the speed-up that PCAI provides, we examine how the coefficients of the fitted logistic regression models change when this speed-up mechanism is used. To demonstrate the efficiency of the method, we compare it against frequently used imputation methods on three popular datasets: Fashion MNIST, Gene, and Parkinson. PCAI achieves shorter running time and better accuracy in most experiments.
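The abstract does not spell out how PCAI works. The sketch below assumes the scheme we understand from [9]: split the features into a fully observed block and a block that contains missing values, compress the fully observed block with PCA, and let the imputer work on the reduced features together with the incomplete block, which is where the speed-up would come from. It is a minimal NumPy illustration under that assumption, not the authors' code. The helper names (`pca_reduce`, `regression_impute`, `pcai_impute`, `fit_logistic`) are hypothetical, a least-squares regression imputer stands in for the base imputers the paper compares, and the data are synthetic rather than Fashion MNIST, Gene, or Parkinson. The last lines fit a logistic regression on the complete and on the imputed data and report how much the coefficients move, mirroring the kind of comparison the abstract describes.

```python
import numpy as np


def pca_reduce(F, n_components):
    """Project the fully observed block F onto its top principal components (PCA via SVD)."""
    F_centered = F - F.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(F_centered, full_matrices=False)
    return F_centered @ Vt[:n_components].T


def regression_impute(Z, M):
    """Fill NaNs in each column of M by least-squares regression on the complete block Z.

    Hypothetical stand-in for the base imputer (e.g. MICE, missForest, KNN) that PCAI would wrap.
    """
    M_imp = M.copy()
    design = np.column_stack([np.ones(len(Z)), Z])  # intercept + predictors
    for j in range(M.shape[1]):
        missing = np.isnan(M[:, j])
        if missing.any():
            beta, *_ = np.linalg.lstsq(design[~missing], M[~missing, j], rcond=None)
            M_imp[missing, j] = design[missing] @ beta
    return M_imp


def pcai_impute(X, n_components):
    """PCAI-style imputation (assumed scheme): impute incomplete columns from a PCA-reduced complete block."""
    has_missing = np.isnan(X).any(axis=0)
    F_reduced = pca_reduce(X[:, ~has_missing], n_components)
    X_imp = X.copy()
    X_imp[:, has_missing] = regression_impute(F_reduced, X[:, has_missing])
    return X_imp


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the mean log-loss over standardized features.

    Returns (intercept, coefficients); each coefficient is per standard deviation of its feature.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Xb = np.column_stack([np.ones(len(Xs)), Xs])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w[0], w[1:]


# Synthetic data: 15 complete and 5 incomplete columns driven by 3 shared latent factors.
rng = np.random.default_rng(0)
n = 500
latent = rng.standard_normal((n, 3))
F_true = latent @ rng.standard_normal((3, 15)) + 0.1 * rng.standard_normal((n, 15))
M_true = latent @ rng.standard_normal((3, 5)) + 0.1 * rng.standard_normal((n, 5))
X_full = np.hstack([F_true, M_true])
y = (latent[:, 0] + 0.5 * rng.standard_normal(n) > 0).astype(float)

# Remove about 20% of the entries in the last 5 columns completely at random.
mask = rng.random((n, 5)) < 0.2
X_missing = X_full.copy()
X_missing[:, 15:] = np.where(mask, np.nan, X_missing[:, 15:])

X_imp = pcai_impute(X_missing, n_components=3)
_, coef_full = fit_logistic(X_full, y)
_, coef_imp = fit_logistic(X_imp, y)
print("max |coefficient change|:", np.abs(coef_full - coef_imp).max())
```

Because the imputer sees 3 principal components instead of 15 raw columns, its work shrinks; the price is that anything PCA discards is unavailable to the imputer, which is one reason to check how the downstream logistic regression coefficients shift.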


References

[1] A principal-component missing-data method for multiple regression models. System Development Corporation (1959)
[2] Al-helali, B., Chen, Q., Xue, B., Zhang, M.: A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data. Soft Comput. 25, 1–20 (2021)
[3] Bansal, P., Deshpande, P., Sarawagi, S.: Missing value imputation on multidimensional time series. CoRR, abs/2103.01600 (2021)
[4] Fortuny-Folch, A., Arteaga, F., Ferrer, A.: PCA model building with missing data: new proposals and a comparative study. System Development Corporation (2015)
[5] Garg, A., Naryani, D., Aggarwal, G., Aggarwal, S.: DL-GSA: a deep learning metaheuristic approach to missing data imputation. In: Tan, Y., Shi, Y., Tang, Q. (eds.) ICSI 2018. LNCS, vol. 10942, pp. 513–521. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93818-9_49
[6] Guyon, I., Li, J., Mader, T., Pletscher, P.A., Schneider, G., Uhr, M.: Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark. Pattern Recogn. Lett. 28(12), 1438–1444 (2007)
[7] Lipton, Z.C., Kale, D.C., Wetzel, R., et al.: Modeling missing data in clinical time series with RNNs. Mach. Learn. Healthcare 56, 253–270 (2016)
[8] Mazumder, R., Hastie, T., Tibshirani, R.: Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 11(80), 2287–2322 (2010)
[9] Nguyen, T., Ly, H.T., Riegler, M.A., Halvorsen, P.: Principal component analysis based frameworks for efficient missing data imputation algorithms (2022)
[10] Nguyen, T., Nguyen, D.H., Nguyen, H., Nguyen, B.T., Wade, B.A.: EPEM: efficient parameter estimation for multiple class monotone missing data. Inf. Sci. 567, 1–22 (2021)
[11] Nguyen, T., Nguyen-Duy, K.M., Nguyen, D.H.M., Nguyen, B.T., Wade, B.A.: DPER: direct parameter estimation for randomly missing data. Knowl.-Based Syst. 240, 108082 (2022)
[12] Stekhoven, D.J., Bühlmann, P.: MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118 (2012)
[13] Tolles, J., Meurer, W.J.: Logistic regression: relating patient characteristics to outcomes. JAMA 316(5), 533–534 (2016)
[14] van Buuren, S., Groothuis-Oudshoorn, K.: MICE: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–67 (2011)
[15] Vu, M.A., et al.: Conditional expectation for missing data imputation. arXiv preprint arXiv:2302.00911 (2023)
[16] Woźnica, K., Biecek, P.: Does imputation matter? Benchmark for predictive models (2020)
[17] Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
[18] Yoon, J., Jordon, J., van der Schaar, M.: GAIN: missing data imputation using generative adversarial nets (2018)


Acknowledgments

We thank the University of Science, Vietnam National University Ho Chi Minh City, and AISIA Research Lab in Vietnam for their support throughout this work.

Funding

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM), Ho Chi Minh City, Vietnam, under grant number DS2023-18-01.

Author information

Authors and Affiliations

AISIA Research Lab, Ho Chi Minh City, Vietnam
Bao Le & Binh T. Nguyen
University of Science, Ho Chi Minh City, Vietnam
Thuong H. T. Nguyen, Bao Le, Phuc Nguyen, Linh G. H. Tran & Binh T. Nguyen
Vietnam National University, Ho Chi Minh City, Vietnam
Thuong H. T. Nguyen, Bao Le, Phuc Nguyen, Linh G. H. Tran & Binh T. Nguyen
Simula Metropolitan, Oslo, Norway
Thu Nguyen


Corresponding author

Correspondence to Binh T. Nguyen.

Editor information

Editors and Affiliations

Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
Hamido Fujita
Shanghai University of Finance and Economics, Shanghai, China
Yinglin Wang
Fudan University, Shanghai, China
Yanghua Xiao
Texas State University, San Marcos, TX, USA
Ali Moonis


About this paper

Cite this paper

Nguyen, T.H.T., Le, B., Nguyen, P., Tran, L.G.H., Nguyen, T., Nguyen, B.T. (2023). Principal Components Analysis Based Imputation for Logistic Regression. In: Fujita, H., Wang, Y., Xiao, Y., Moonis, A. (eds) Advances and Trends in Artificial Intelligence. Theory and Applications. IEA/AIE 2023. Lecture Notes in Computer Science, vol 13925. Springer, Cham. https://doi.org/10.1007/978-3-031-36819-6_3


DOI: https://doi.org/10.1007/978-3-031-36819-6_3
Published: 19 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36818-9
Online ISBN: 978-3-031-36819-6
eBook Packages: Computer Science, Computer Science (R0)
