Faster Imputation Using Singular Value Decomposition for Sparse Data

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2023)

Abstract

As knowledge-based systems proliferate worldwide, a growing number of applications rely on diverse kinds of data to solve significant everyday problems. Missing data has consequently become an increasingly common issue in such systems, especially in data-driven areas. Prior research on imputation has addressed partially observed and missing data. This study investigates imputation for sparse data using Singular Value Decomposition, an approach we call SVDI. We explore the application of the SVDI framework to image classification and text classification tasks that involve sparse data. The experimental results show that the proposed SVDI method improves both the speed and the accuracy of imputation compared with the PCAI method. We plan to release the code for SVDI to the research community at a later date.
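To make the general idea of SVD-based imputation concrete, the following is a minimal sketch of a generic iterative low-rank scheme. It is not the SVDI algorithm itself, whose details are not given in this abstract. The function name `svd_impute`, its parameters (`rank`, `n_iter`, `tol`), the column-mean initialisation, and the toy matrix are all assumptions made for illustration, and the sketch assumes every column has at least one observed value.

```python
import numpy as np


def svd_impute(X, rank=2, n_iter=50, tol=1e-6):
    """Fill NaN entries of X by repeatedly applying a truncated SVD.

    Hypothetical helper for illustration only; not the paper's SVDI method.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the column means of the observed values.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        # Rank-`rank` approximation of the current completed matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Overwrite only the missing entries; observed values stay fixed.
        updated = np.where(missing, low_rank, X)
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break
    return filled


# Toy usage: a rank-1 matrix with one hidden entry (made-up data).
A = np.outer(np.arange(1.0, 7.0), np.array([1.0, 2.0, 3.0, 4.0]))
A_missing = A.copy()
A_missing[2, 1] = np.nan
A_hat = svd_impute(A_missing, rank=1)
```

Each pass recomputes a full SVD, so this sketch favours clarity over speed. For large sparse matrices a truncated or randomized solver would be the more natural choice.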

Supported by Vietnam National University Ho Chi Minh City under the grant number DS2023-18-01.

Notes

  1. https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

  2. https://github.com/zalandoresearch/fashion-mnist

  3. http://yann.lecun.com/exdb/mnist/

Acknowledgments

This research is funded by Vietnam National University Ho Chi Minh City under grant number DS2023-18-01.

Author information

Corresponding author

Correspondence to Binh T. Nguyen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Nguyen, P., et al. (2023). Faster Imputation Using Singular Value Decomposition for Sparse Data. In: Nguyen, N.T., et al. Intelligent Information and Database Systems. ACIIDS 2023. Lecture Notes in Computer Science, vol 13995. Springer, Singapore. https://doi.org/10.1007/978-981-99-5834-4_11

  • DOI: https://doi.org/10.1007/978-981-99-5834-4_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-5833-7

  • Online ISBN: 978-981-99-5834-4

  • eBook Packages: Computer Science, Computer Science (R0)
