DOI:10.1145/3379247.3379270
research-article

Use Case and Performance Analyses for Missing Data Imputation Methods in Big Data Analytics

Published: 07 March 2020

Abstract

In big data analytics, missing data is ubiquitous, arising from causes such as faulty equipment and survey nonresponse. Imputation is the process of replacing missing data with substituted values. Proper imputation can greatly improve the accuracy and effectiveness of big data analytics.
In this paper, we analyze a rich set of deletion and imputation methods, focusing on their strengths, weaknesses, best use cases, implementation strategies, and error-examination-based performance analysis. Our goal is to find the best-fitted imputation method(s) for each given use case.
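
The distinction between deletion, imputation, and error-based comparison can be illustrated with a minimal sketch. The abstract does not name specific methods or metrics, so everything here is an assumption made for illustration: mean imputation and last observation carried forward (LOCF) stand in for "imputation methods", root mean square error (RMSE) stands in for the error measure, and the toy series and all function names are hypothetical.

```python
import math
from statistics import mean

# Hypothetical toy series; None marks a missing observation.
observed = [2.0, None, 6.0, 8.0, None, 12.0]
# Reference values, known only because the gaps were introduced artificially.
truth = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]


def listwise_delete(values):
    """Deletion: drop every missing entry."""
    return [v for v in values if v is not None]


def mean_impute(values):
    """Imputation: replace each missing entry with the mean of the observed ones."""
    fill = mean(listwise_delete(values))
    return [fill if v is None else v for v in values]


def locf_impute(values):
    """Imputation: carry the last observed value forward.

    A gap at the very start has nothing to carry and stays None;
    the toy series above has no such gap.
    """
    filled, last = [], None
    for v in values:
        last = v if v is not None else last
        filled.append(last)
    return filled


def rmse_on_missing(imputed, values, reference):
    """RMSE computed only over the positions that were originally missing."""
    errors = [(imputed[i] - reference[i]) ** 2
              for i, v in enumerate(values) if v is None]
    return math.sqrt(mean(errors))


# Compare the two imputation methods by their error on the masked positions.
for name, method in [("mean", mean_impute), ("LOCF", locf_impute)]:
    print(name, rmse_on_missing(method(observed), observed, truth))
```

Scoring only the originally missing positions keeps the comparison focused on the imputed values; which method scores lower depends entirely on the data, which is why the choice is made per use case.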

    Published In

    ICCDE '20: Proceedings of 2020 6th International Conference on Computing and Data Engineering
    January 2020
    279 pages
    ISBN:9781450376730
    DOI:10.1145/3379247

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Big data analytics
    2. error estimation
    3. imputation
    4. missing data

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICCDE 2020
