Abstract
With the emergence of many knowledge-based systems worldwide, a growing number of applications use different kinds of data to solve significant everyday problems. Among these, missing data has become an increasingly common issue, especially in data-driven areas. Prior research on the imputation problem has dealt with partial and missing data. This study investigates imputation techniques for sparse data using the Singular Value Decomposition technique, namely SVDI. We explore the application of the SVDI framework to image classification and text classification tasks that involve sparse data. The experimental results show that the proposed SVDI method improves the speed and accuracy of the imputation process compared to the PCAI method. We plan to release our SVDI code later for the relevant research community.
Supported by Vietnam National University Ho Chi Minh City under the grant number DS2023-18-01.
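The abstract names SVD-based imputation but does not describe the SVDI algorithm itself. As a rough illustration of the general idea only, and not the authors' method, the sketch below fills missing entries by repeatedly replacing them with values from a truncated SVD reconstruction. The function name `svd_impute` and the parameters `rank`, `max_iter`, and `tol` are hypothetical, and the sketch assumes missing entries are marked as NaN and that every column has at least one observed value.

```python
import numpy as np


def svd_impute(X, rank=2, max_iter=100, tol=1e-6):
    """Fill NaN entries of X by iterating a truncated SVD reconstruction.

    Generic illustration only; this is NOT the SVDI method from the paper.
    Assumes each column of X has at least one observed (non-NaN) value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start from a simple guess: the mean of the observed values per column.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(max_iter):
        # Rank-`rank` approximation of the current completed matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Keep observed entries fixed; update only the missing ones.
        updated = np.where(missing, low_rank, X)

        # Stop once the relative change between iterations is small.
        change = np.linalg.norm(updated - filled)
        change /= max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break

    return filled
```

Observed entries are never modified, so the routine only changes the positions that were NaN in the input. The choice of `rank` controls how much structure the reconstruction is allowed to capture and would need tuning for real data.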
Acknowledgments
This research is funded by Vietnam National University Ho Chi Minh City under grant number DS2023-18-01.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nguyen, P. et al. (2023). Faster Imputation Using Singular Value Decomposition for Sparse Data. In: Nguyen, N.T., et al. Intelligent Information and Database Systems. ACIIDS 2023. Lecture Notes in Computer Science, vol 13995. Springer, Singapore. https://doi.org/10.1007/978-981-99-5834-4_11
DOI: https://doi.org/10.1007/978-981-99-5834-4_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5833-7
Online ISBN: 978-981-99-5834-4
eBook Packages: Computer Science, Computer Science (R0)