Abstract
River flow data are essential in hydrological and meteorological research for forecasting floods and droughts and for day-to-day river basin management, all of which require both real-time and historical records. Hydrological data, however, are frequently incomplete because of a variety of factors, including data loss. Because such data are prone to missing values, imputation is an important step in completing the dataset. In this study, six imputation methods are used: Mean Imputation, Median Imputation, Multiple Imputation, Normal Ratio, NIPALS, and the EM algorithm. The aims of this study are to impute the missing values in a river flow dataset using these methods and to apply the ARIMA model to the original and imputed datasets. The experimental results showed that Multiple Imputation using the MCMC method was the best method, as it had the lowest RMSE and MAE values (41.23 and 14.09, respectively) among the methods compared. It can be concluded that Multiple Imputation is the most robust and adaptable machine learning method, but it is also the most difficult to program in terms of computational complexity.
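As a rough illustration of how the two simplest methods named in the abstract can be compared by RMSE and MAE, the following minimal sketch masks a few known values, fills them by mean and median imputation, and scores the filled values against the truth. The flow values, the masked positions, and all function names are hypothetical and chosen only for illustration; they are not the study's data or code, and scoring only the masked positions is an assumption about the evaluation protocol.

```python
import math
import statistics


def impute_constant(series, fill_value):
    """Replace every missing entry (None) with fill_value."""
    return [fill_value if x is None else x for x in series]


def mean_impute(series):
    """Fill gaps with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    return impute_constant(series, statistics.fmean(observed))


def median_impute(series):
    """Fill gaps with the median of the observed values."""
    observed = [x for x in series if x is not None]
    return impute_constant(series, statistics.median(observed))


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical daily flow values (illustrative only, not data from the study).
true_flow = [12.0, 15.0, 14.0, 40.0, 18.0, 16.0, 13.0, 90.0]
missing_idx = [2, 5]  # positions masked out to simulate gaps

masked = [None if i in missing_idx else v for i, v in enumerate(true_flow)]

for name, method in [("mean", mean_impute), ("median", median_impute)]:
    filled = method(masked)
    # Score only the positions that were artificially removed.
    truth = [true_flow[i] for i in missing_idx]
    guess = [filled[i] for i in missing_idx]
    print(name, round(rmse(truth, guess), 3), round(mae(truth, guess), 3))
```

On a skewed series such as this one, the median is less affected by the occasional high-flow value than the mean, which is one reason the two methods can produce noticeably different error scores.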
Acknowledgments
The authors would like to thank the Ministry of Higher Education Malaysia (MOHE) for supporting this research under Fundamental Research Grant Scheme Vot No. FRGS/1/2018/STG06/UTHM/03/3. This research was also partially sponsored by Universiti Tun Hussein Onn Malaysia under Multi-Disciplinary Grant Vot No. H508.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Muhaime, N.A.D.A., Arifin, M.A., Ismail, S., Shaharuddin, S.M. (2022). Comparative Performance of Various Imputation Methods for River Flow Data. In: Ghazali, R., Mohd Nawi, N., Deris, M.M., Abawajy, J.H., Arbaiy, N. (eds) Recent Advances in Soft Computing and Data Mining. SCDM 2022. Lecture Notes in Networks and Systems, vol 457. Springer, Cham. https://doi.org/10.1007/978-3-031-00828-3_11
DOI: https://doi.org/10.1007/978-3-031-00828-3_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00827-6
Online ISBN: 978-3-031-00828-3
eBook Packages: Intelligent Technologies and Robotics (R0)