Abstract
Missing data are the rule rather than the exception in quantitative research. What remains unclear, however, is the extent, pattern, mechanism, and appropriate treatment of missingness in facility-based paper maternal health records. We used data from maternal health records at Kawempe National Referral Hospital, Uganda. Only records of women who gave birth at the hospital between January 2017 and January 2021 were considered. The analysis was done in RStudio using frequency distributions and the Pearson χ² test. Missingness was treated using listwise deletion (LD), mode imputation, multiple imputation by chained equations (MICE), K-nearest neighbors (KNN) imputation, and random forest (RF) imputation. The performance of these methods was assessed using prediction accuracy and the Kruskal–Wallis test on standard errors (SEs) obtained from a logistic regression. Overall, 5% of the data were missing. The proportion of missingness per variable ranged from 1.4% to 20.7%. Case-wise, 2498 of the 4626 cases (54%) had at least one variable with a missing value. The pattern of missingness was arbitrary. The data suggest a mechanism of either missing at random or missing completely at random. With the exception of LD, the methods showed no difference in SEs following logistic regression (p > 0.05). Further, LD yielded the lowest prediction accuracy after logistic regression. No major variation in prediction accuracy was noted across MICE, mode imputation, KNN, and RF. Missingness in facility-based health records should not be ignored. Researchers need to pay attention to both overall and case-wise missingness.
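The distinction between overall and case-wise missingness, and the cost of listwise deletion relative to mode imputation, can be illustrated with a minimal sketch. It is written in Python rather than the R environment the authors used, and it runs on simulated toy records, not the study data. The column names, category levels, the 5% per-cell missingness rate, and all function names are assumptions made for illustration only.

```python
import random
from collections import Counter

def overall_missing(rows):
    """Share of all cells (records x variables) that are missing (None)."""
    cells = [v for row in rows for v in row.values()]
    return sum(v is None for v in cells) / len(cells)

def casewise_missing(rows):
    """Share of records with at least one missing value."""
    flagged = sum(any(v is None for v in row.values()) for row in rows)
    return flagged / len(rows)

def listwise_delete(rows):
    """Keep only fully observed records."""
    return [row for row in rows if all(v is not None for v in row.values())]

def mode_impute(rows):
    """Replace each missing value with the most frequent observed value
    of its column (assumes every column has at least one observed value)."""
    modes = {}
    for col in rows[0]:
        observed = [row[col] for row in rows if row[col] is not None]
        modes[col] = Counter(observed).most_common(1)[0][0]
    return [{c: (modes[c] if v is None else v) for c, v in row.items()}
            for row in rows]

# Hypothetical categorical variables; NOT the variables of the study.
levels = {
    "parity": ["0", "1-3", "4+"],
    "delivery_mode": ["vaginal", "caesarean"],
    "referred": ["yes", "no"],
    "anc_visits": ["<4", "4+"],
}

random.seed(0)
rows = []
for _ in range(1000):
    row = {col: random.choice(vals) for col, vals in levels.items()}
    for col in row:
        if random.random() < 0.05:  # assumed rate, missing completely at random
            row[col] = None
    rows.append(row)

complete = listwise_delete(rows)
imputed = mode_impute(rows)

print(f"overall missingness:   {overall_missing(rows):.1%}")
print(f"case-wise missingness: {casewise_missing(rows):.1%}")
print(f"records kept by listwise deletion: {len(complete)} of {len(rows)}")
print(f"records kept by mode imputation:   {len(imputed)} of {len(rows)}")
```

Because a record is lost to listwise deletion as soon as any one of its variables is missing, case-wise missingness is never smaller than overall missingness and usually far exceeds it. This is consistent with the reported 5% overall against 54% case-wise.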
Data Availability
Data are available from the authors on reasonable request.
Acknowledgements
We are grateful to the African Centre of Excellence in Data Science, University of Rwanda, for the financial support that was crucial for data collection. We thank Kawempe National Referral Hospital for permission to collect the data used in this study. We are indebted to the Department of Statistical Methods and Actuarial Science and the School of Statistics and Planning, Makerere University, for their input, technical guidance, and support towards this paper.
Funding
Partial funding to support data collection was obtained from the African Centre of Excellence in Data Science, University of Rwanda.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethical Approval
Ethical approval for conducting the study was obtained from the Uganda National Council of Science and Technology (Ref: HS977ES) and the Mulago Hospital Research and Ethics Committee.
Cite this article
Memon, S.M.Z., Wamala, R. & Kabano, I.H. Missing Data Analysis Using Statistical and Machine Learning Methods in Facility-Based Maternal Health Records. SN COMPUT. SCI. 3, 355 (2022). https://doi.org/10.1007/s42979-022-01249-z
DOI: https://doi.org/10.1007/s42979-022-01249-z