Performance Meta-analysis for Big-Data Univariate Auto-Imputation in the Building Sector

  • Conference paper
  • Artificial Intelligence Applications and Innovations. AIAI 2022 IFIP WG 12.5 International Workshops (AIAI 2022)

Abstract

Filtering refers to the process of defining, detecting and correcting errors in a given dataset in order to achieve system reliability and minimize the impact of errors on data analysis. Automated and accurate tools for data filtering and healing are therefore crucial. This study investigates statistical and machine-learning-based methodologies for healing data gaps and imputing missing values. Five models are investigated individually: the well-known ARIMA model, Linear Interpolation, Polynomial Interpolation, General Regression and Facebook Prophet. The raw data used to evaluate these methods are simulated, and artificial data gaps are imposed randomly within the dataset so that the univariate imputation performance of the models can be evaluated using Mean Squared Error and Mean Absolute Error. As expected, the evaluation results illustrate the effectiveness of the highly elaborate machine-learning model Facebook Prophet over the simpler statistical ARIMA model, at the expense of time and computational effort. However, for Big Data univariate imputation applications, the study findings suggest that a combination of ARIMA and Facebook Prophet, selected according to the data gap size, could balance the required computational resources while maintaining highly accurate imputation results.
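The evaluation protocol described in the abstract (simulate a series, impose random artificial gaps, impute, then score with Mean Squared Error and Mean Absolute Error) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic daily-cycle signal, the gap count and gap length, and the helper names `impose_gaps` and `linear_impute`. Only linear interpolation is shown, since it needs nothing beyond NumPy; the other models would slot into the same harness in place of `linear_impute`.

```python
import numpy as np


def impose_gaps(series, n_gaps, gap_len, rng):
    """Return a copy of `series` with NaN runs of length `gap_len` at random positions.

    Hypothetical helper: gap count and length are illustrative, not the paper's settings.
    """
    gapped = series.astype(float).copy()
    # Keep the first and last samples observed so interpolation has anchors on both sides.
    starts = rng.integers(1, len(series) - gap_len - 1, size=n_gaps)
    for start in starts:
        gapped[start:start + gap_len] = np.nan
    return gapped


def linear_impute(gapped):
    """Fill NaN entries by linear interpolation between the nearest observed samples."""
    idx = np.arange(len(gapped))
    observed = ~np.isnan(gapped)
    filled = gapped.copy()
    filled[~observed] = np.interp(idx[~observed], idx[observed], gapped[observed])
    return filled


def mse(truth, pred):
    """Mean Squared Error."""
    return float(np.mean((truth - pred) ** 2))


def mae(truth, pred):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(truth - pred)))


# Simulated hourly signal with a daily cycle plus noise (an assumed stand-in for building data).
rng = np.random.default_rng(0)
t = np.arange(24 * 14)
truth = 20.0 + 3.0 * np.sin(2 * np.pi * t / 24) + rng.normal(0.0, 0.2, t.size)

gapped = impose_gaps(truth, n_gaps=5, gap_len=6, rng=rng)
filled = linear_impute(gapped)

# Score only the positions that were artificially removed.
missing = np.isnan(gapped)
mse_value = mse(truth[missing], filled[missing])
mae_value = mae(truth[missing], filled[missing])
print(f"imputed {missing.sum()} points, MSE={mse_value:.4f}, MAE={mae_value:.4f}")
```

Scoring only the masked positions keeps the metrics from being diluted by samples that were never missing. Repeating the run for several values of `gap_len` would give a per-gap-size comparison of the kind the abstract refers to.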

Acknowledgements

The research leading to these results was partially funded by the European Commission “EEB-07-2017 Integration of energy harvesting at building and district level” - PLUG-N-HARVEST H2020 project (Grant agreement ID: 768735) https://www.plug-n-harvest.eu/, accessed on 22 February 2022; and “LC-SC3-B4E-3-2020 Upgrading smartness of existing buildings through innovations for legacy equipment” - Smart2B H2020 project (Grant agreement ID: 101023666) https://www.smart2b-project.eu/, accessed on 2 March 2022.

Author information

Corresponding author

Correspondence to Asimina Dimara.

Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Cite this paper

Stefanopoulou, A. et al. (2022). Performance Meta-analysis for Big-Data Univariate Auto-Imputation in the Building Sector. In: Maglogiannis, I., Iliadis, L., Macintyre, J., Cortez, P. (eds) Artificial Intelligence Applications and Innovations. AIAI 2022 IFIP WG 12.5 International Workshops. AIAI 2022. IFIP Advances in Information and Communication Technology, vol 652. Springer, Cham. https://doi.org/10.1007/978-3-031-08341-9_23

  • DOI: https://doi.org/10.1007/978-3-031-08341-9_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08340-2

  • Online ISBN: 978-3-031-08341-9

  • eBook Packages: Computer Science, Computer Science (R0)
