
Principal Components Analysis Based Imputation for Logistic Regression

  • Conference paper
Advances and Trends in Artificial Intelligence. Theory and Applications (IEA/AIE 2023)

Abstract

The field of AI and machine learning is constantly evolving, and as datasets continue to grow, so does the need for accurate and efficient data-processing methods. However, real-world data are rarely perfect, and missing values are increasingly common. Imputation techniques therefore need to be scalable as well as accurate. For this reason, we examine the performance of Principal Components Analysis Imputation (PCAI) [9], a framework for speeding up imputation, when the imputed data are used to fit logistic regression models. Because the coefficients of a logistic regression model are commonly used for interpretation, we examine not only how much PCAI speeds up imputation but also how the coefficients of the fitted logistic regression models change when this speed-up mechanism is used. To demonstrate the efficiency of PCAI, we compare the performance of models fitted with PCAI against that of frequently used imputation methods on three popular datasets: Fashion MNIST, Gene, and Parkinson. PCAI achieves lower running time and better accuracy in most experiments.
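The abstract describes PCAI only at a high level. As a rough illustration, the following is a minimal sketch assuming, as we understand [9], that PCAI splits the features into a fully observed block and a block containing missing values, compresses the fully observed block with PCA, runs an ordinary imputer on the principal components together with the incomplete block, and keeps the original observed columns in the output. The helpers `knn_impute`, `pcai_impute`, and `fit_logistic` are hypothetical names; the KNN imputer (distances on fully observed columns only) and the gradient-descent logistic regression are simplified stand-ins, not the imputation methods or code evaluated in the paper.

```python
import numpy as np


def knn_impute(Z, k=5):
    """Hypothetical stand-in imputer: fill each NaN with the mean of that column
    over the k nearest rows that observe it. Distances use only the columns of Z
    that have no missing values."""
    Z = Z.copy()
    miss = np.isnan(Z)
    A = Z[:, ~miss.any(axis=0)]  # fully observed columns drive the distances
    for j in np.where(miss.any(axis=0))[0]:
        donors = ~miss[:, j]
        targets = np.where(miss[:, j])[0]
        At, Ad = A[targets], A[donors]
        # squared Euclidean distance from every target row to every donor row
        d = ((At[:, None, :] - Ad[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        Z[targets, j] = Z[donors, j][nearest].mean(axis=1)
    return Z


def pcai_impute(X, n_components, imputer=knn_impute):
    """Sketch of the PCAI idea (our reading of [9], an assumption): run the
    imputer on the principal components of the fully observed block plus the
    incomplete block, then put the imputed block back next to the original data."""
    miss_cols = np.isnan(X).any(axis=0)
    F, M = X[:, ~miss_cols], X[:, miss_cols]
    Fc = F - F.mean(axis=0)
    # PCA via SVD: rows of Vt are the principal directions of the observed block
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    P = Fc @ Vt[:n_components].T
    Z = imputer(np.hstack([P, M]))
    X_imp = X.copy()
    X_imp[:, miss_cols] = Z[:, n_components:]  # observed columns stay unchanged
    return X_imp


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean logistic loss; w[0] is the intercept."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


# Synthetic demo (made-up data): two incomplete columns that depend on observed ones
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))
X[:, 6] += X[:, 0]
X[:, 7] += X[:, 1]
y = (X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n) > 0).astype(float)

X_miss = X.copy()
for j in (6, 7):
    X_miss[rng.random(n) < 0.3, j] = np.nan  # roughly 30% missing per column

w_base = fit_logistic(knn_impute(X_miss), y)  # imputer sees all 6 observed columns
X_pcai = pcai_impute(X_miss, n_components=2)  # imputer sees only 2 principal components
w_pcai = fit_logistic(X_pcai, y)
print("max |coefficient difference|:", np.abs(w_base - w_pcai).max())
```

Comparing `w_base` with `w_pcai` mirrors the question raised above of how much the fitted coefficients move when imputation runs on the reduced representation. In this sketch the speed-up comes from the imputer computing distances over `n_components` columns instead of every fully observed column, which mainly pays off when the fully observed block is wide.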

References

  1. A principal-component missing-data method for multiple regression models. System Development Corporation (1959)

  2. Al-helali, B., Chen, Q., Xue, B., Zhang, M.: A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data. Soft Comput. 25, 1–20 (2021)

  3. Bansal, P., Deshpande, P., Sarawagi, S.: Missing value imputation on multidimensional time series. CoRR, abs/2103.01600 (2021)

  4. Fortuny-Folch, A., Arteaga, F., Ferrer, A.: PCA model building with missing data: new proposals and a comparative study. System Development Corporation (2015)

  5. Garg, A., Naryani, D., Aggarwal, G., Aggarwal, S.: DL-GSA: a deep learning metaheuristic approach to missing data imputation. In: Tan, Y., Shi, Y., Tang, Q. (eds.) ICSI 2018. LNCS, vol. 10942, pp. 513–521. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93818-9_49

  6. Guyon, I., Li, J., Mader, T., Pletscher, P.A., Schneider, G., Uhr, M.: Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark. Pattern Recogn. Lett. 28(12), 1438–1444 (2007)

  7. Lipton, Z.C., Kale, D.C., Wetzel, R., et al.: Modeling missing data in clinical time series with RNNs. Mach. Learn. Healthcare 56, 253–270 (2016)

  8. Mazumder, R., Hastie, T., Tibshirani, R.: Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 11(80), 2287–2322 (2010)

  9. Nguyen, T., Ly, H.T., Riegler, M.A., Halvorsen, P.: Principal component analysis based frameworks for efficient missing data imputation algorithms (2022)

  10. Nguyen, T., Nguyen, D.H., Nguyen, H., Nguyen, B.T., Wade, B.A.: EPEM: efficient parameter estimation for multiple class monotone missing data. Inf. Sci. 567, 1–22 (2021)

  11. Nguyen, T., Nguyen-Duy, K.M., Nguyen, D.H.M., Nguyen, B.T., Wade, B.A.: DPER: direct parameter estimation for randomly missing data. Knowl.-Based Syst. 240, 108082 (2022)

  12. Stekhoven, D.J., Bühlmann, P.: MissForest-non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118 (2012)

  13. Tolles, J., Meurer, W.J.: Logistic regression: relating patient characteristics to outcomes. JAMA 316(5), 533–534 (2016)

  14. van Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–67 (2011)

  15. Vu, M.A., et al.: Conditional expectation for missing data imputation. arXiv preprint arXiv:2302.00911 (2023)

  16. Woźnica, K., Biecek, P.: Does imputation matter? Benchmark for predictive models (2020)

  17. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)

  18. Yoon, J., Jordon, J., van der Schaar, M.: GAIN: missing data imputation using generative adversarial nets (2018)

Acknowledgments

We thank the University of Science, Vietnam National University Ho Chi Minh City, and the AISIA Research Lab in Vietnam for supporting this work.

Funding

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM), Vietnam, under grant number DS2023-18-01.

Author information

Corresponding author

Correspondence to Binh T. Nguyen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, T.H.T., Le, B., Nguyen, P., Tran, L.G.H., Nguyen, T., Nguyen, B.T. (2023). Principal Components Analysis Based Imputation for Logistic Regression. In: Fujita, H., Wang, Y., Xiao, Y., Moonis, A. (eds) Advances and Trends in Artificial Intelligence. Theory and Applications. IEA/AIE 2023. Lecture Notes in Computer Science, vol 13925. Springer, Cham. https://doi.org/10.1007/978-3-031-36819-6_3

  • DOI: https://doi.org/10.1007/978-3-031-36819-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36818-9

  • Online ISBN: 978-3-031-36819-6

  • eBook Packages: Computer Science, Computer Science (R0)
