Unsupervised anomaly detection and imputation in noisy time series data for enhancing load forecasting

Applied Intelligence

Abstract

Efficient energy management relies on accurate load forecasting, particularly given rising energy demand and the need for sustainable operation. Anomalies in historical data, however, undermine forecasting models and can lead to suboptimal resource allocation and decision-making. This paper presents an unsupervised, feature-bank-based framework for detecting anomalies in time series data that is itself contaminated by anomalies. The detected anomalies are then replaced with plausible patterns by a recurrent denoising autoencoder. We evaluate the methodology through a comprehensive study that compares the performance of different forecasting models before and after anomaly detection and imputation. The results demonstrate the versatility and effectiveness of the approach across several energy applications in smart grids and smart buildings, and suggest its suitability for adoption in energy management systems.
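
The feature-bank idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the hand-crafted window statistics, the leave-one-out nearest-neighbour score, the parameter `k`, and all function names below are assumptions chosen for illustration only. The sketch scores each daily window of a synthetic load series by its distance to the other windows in a bank built from the same contaminated data. The imputation step with a recurrent denoising autoencoder is not shown.

```python
import numpy as np


def make_windows(series, width):
    """Split a 1-D series into non-overlapping windows of `width` samples."""
    n = len(series) // width
    return np.asarray(series[: n * width], dtype=float).reshape(n, width)


def window_features(windows):
    """Hypothetical features per window: mean, std, min, max (an assumption)."""
    return np.column_stack([
        windows.mean(axis=1),
        windows.std(axis=1),
        windows.min(axis=1),
        windows.max(axis=1),
    ])


def leave_one_out_scores(features, k=3):
    """Score each window by its mean distance to its k nearest *other* windows.

    The bank is the feature set itself, so each window's zero distance
    to itself is masked out before the nearest neighbours are selected.
    """
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)   # shape (n_windows, n_windows)
    np.fill_diagonal(dists, np.inf)         # ignore self-matches
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


# Synthetic example: 30 days of hourly load with a spike injected into day 10.
rng = np.random.default_rng(0)
hours = np.arange(30 * 24)
load = 100 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)
load[10 * 24 + 8 : 10 * 24 + 13] += 80

scores = leave_one_out_scores(window_features(make_windows(load, 24)))
print("most anomalous day:", int(np.argmax(scores)))
```

In practice the features would likely be standardized or learned rather than hand-crafted, and a threshold on the scores would decide which windows are passed to the imputation stage.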

Data Availability and Access

The original version of the AEMO dataset is openly accessible through the AEMO website. The Park dataset is taken from [46]. Code and instructions for processing these data to reproduce our experiments are available on GitHub. The authors do not have permission to distribute the Predis-MHI dataset.

Notes

  1. The code for this implementation is open source and available at Github.com/MaherDissem/Unsupervised-Anomaly-Detection-in-Noisy-Time-Series-Data-for-Enhancing-Load-Forecasting.

References

  1. Energy Institute (2023) Statistical Review of World Energy. Energy Institute, London, UK

  2. Hoseinzadeh S, Groppi D, Sferra AS, Di Matteo U, Astiaso Garcia D (2022) The prismi plus toolkit application to a grid-connected mediterranean island. Energies. 15(22). https://doi.org/10.3390/en15228652

  3. Azeem A, Ismail I, Jameel SM, Harindran VR (2021) Electrical load forecasting models for different generation modalities: a review. IEEE Access. 9:142239–142263

  4. Byun J, Hong I, Kang B, Park S (2011) A smart energy distribution and management system for renewable energy distribution and context-aware services based on user patterns and load forecasting. IEEE Trans Consumer Electron. 57(2):436–444. https://doi.org/10.1109/TCE.2011.5955177

  5. Hadri S, Naitmalek Y, Najib M, Bakhouya M, Fakhri Y, Elaroussi M (2019) A comparative study of predictive approaches for load forecasting in smart buildings. Procedia Comput Sci. 160:173–180. The 10th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN-2019) / The 9th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare (ICTH-2019) / Affiliated Workshops. https://doi.org/10.1016/j.procs.2019.09.458

  6. Deb C, Zhang F, Yang J, Lee SE, Shah KW (2017) A review on time series forecasting techniques for building energy consumption. Renew Sustain Energy Rev. 74:902–924. https://doi.org/10.1016/j.rser.2017.02.085

  7. Labeodan T, Zeiler W, Boxem G, Zhao Y (2015) Occupancy measurement in commercial office buildings for demand-driven control applications-a survey and detection system evaluation. Energy Build. 93:303–314. https://doi.org/10.1016/j.enbuild.2015.02.028

  8. Kelly J, Knottenbelt W (2015) The uk-dale dataset, domestic appliance-level electricity demand and whole-house demand from five uk homes. Sci Data. 2(1). https://doi.org/10.1038/sdata.2015.7

  9. Kaur D, Islam SN, Mahmud MA, Haque ME, Dong ZY (2022) Energy forecasting in smart grid systems: recent advancements in probabilistic deep learning. IET Gener, Transmis Distrib. 16(22):4461–4479

  10. Nespoli A, Ogliari E, Pretto S, Gavazzeni M, Vigani S, Paccanelli F (2021) Electrical load forecast by means of lstm: The impact of data quality. Forecast. 3(1):91–101

  11. Sehwag V, Bhagoji AN, Song L, Sitawarin C, Cullina D, Chiang M, Mittal P (2019) Analyzing the robustness of open-world machine learning. In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, pp 105–116

  12. Roth K, Pemula L, Zepeda J, Schölkopf B, Brox T, Gehler P (2021) Towards total recall in industrial anomaly detection. IEEE/CVF Conf Comput Vis Pattern Recog (CVPR) 2022:14298–14308

  13. Jiang X, Liu J, Wang J, Nie Q, WU K, Liu Y, Wang C, Zheng F (2022) Softpatch: Unsupervised anomaly detection with noisy data. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems, vol 35, pp 15433–15445. https://proceedings.neurips.cc/paper_files/paper/2022/file/637a456d89289769ac1ab29617ef7213-Paper-Conference.pdf

  14. Choi K, Yi J, Park C, Yoon S (2021) Deep learning for anomaly detection in time-series data: Review, analysis, and guidelines. IEEE Access. 9:120043–120065. https://doi.org/10.1109/ACCESS.2021.3107975

  15. Schmidl S, Wenig P, Papenbrock T (2022) Anomaly detection in time series: a comprehensive evaluation. Proc. VLDB Endow. 15(9):1779–1797. https://doi.org/10.14778/3538598.3538602

  16. Zhou Y, Song X, Zhang Y, Liu F, Zhu C, Liu L (2022) Feature encoding with autoencoders for weakly supervised anomaly detection. IEEE Trans Neural Netw Learn Syst. 33(6):2454–2465. https://doi.org/10.1109/tnnls.2021.3086137

  17. Hu T, Guo Q, Shen X, Wu R, Xi H (2019) Utilizing unlabeled data to detect electricity fraud in ami: A semisupervised deep learning approach. IEEE Trans Neural Netw Learn Syst. pp 1–13. https://doi.org/10.1109/TNNLS.2018.2890663

  18. Li J, Izakian H, Pedrycz W, Jamal I (2021) Clustering-based anomaly detection in multivariate time series data. Applied Soft Comput. 100:106919. https://doi.org/10.1016/j.asoc.2020.106919

  19. Roth K, Pemula L, Zepeda J, Schölkopf B, Brox T, Gehler P (2022) Towards total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 14318–14328

  20. Su Y, Zhao Y, Niu C, Liu R, Sun W, Pei D (2019) Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD ’19, pp 2828–2837. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3292500.3330672

  21. Fang C, Wang C (2020) Time Series Data Imputation: A Survey on Deep Learning Approaches

  22. Fang C, Song S, Chen Z, Gui A (2019) Fine-grained fuel consumption prediction. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management. CIKM ’19, pp 2783–2791. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3357384.3357836

  23. Graham J (2009) Missing data analysis: Making it work in the real world. Annual Rev Psych. 60:549–76. https://doi.org/10.1146/annurev.psych.58.110405.085530

  24. Zhang A, Song S, Wang J (2016) Sequential data cleaning: A statistical approach. In: Proceedings of the 2016 International Conference on Management of Data. SIGMOD ’16, pp 909–924. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/2882903.2915233

  25. Batista GEAPA, Monard MC (2002) A study of k-nearest neighbour as an imputation method. In: International Conference on Health Information Science. https://api.semanticscholar.org/CorpusID:37493644

  26. Zhang A, Song S, Wang J, Yu PS (2017) Time series data cleaning: from anomaly detection to anomaly repairing. Proc. VLDB Endow. 10(10):1046–1057. https://doi.org/10.14778/3115404.3115410

  27. Spinelli I, Scardapane S, Uncini A (2020) Missing data imputation with adversarially-trained graph convolutional networks. Neural Netw. 129:249–260. https://doi.org/10.1016/j.neunet.2020.06.005

  28. Ba-Alawi A, Loy-Benitez J, Kim S, Yoo C (2021) Missing data imputation and sensor self-validation towards a sustainable operation of wastewater treatment plants via deep variational residual autoencoders. Chemosphere. 288:132647. https://doi.org/10.1016/j.chemosphere.2021.132647

  29. Liguori A, Markovic R, Dam TTH, Frisch J, van Treeck C, Causone F (2021) Indoor environment data time-series reconstruction using autoencoder neural networks. Build Environ. 191:107623. https://doi.org/10.1016/j.buildenv.2021.107623

  30. Zhang J, Yin P (2019) Multivariate time series missing data imputation using recurrent denoising autoencoder. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp 760–764. https://doi.org/10.1109/BIBM47256.2019.8982996

  31. De Gooijer JG, Hyndman RJ (2006) 25 years of time series forecasting. Int J Forecast. 22(3):443–473

  32. Lim B, Zohren S (2021) Time-series forecasting with deep learning: a survey. Philo Trans Royal Soc A. 379(2194):20200209

  33. Kong W, Dong ZY, Jia Y, Hill DJ, Xu Y, Zhang Y (2019) Short-term residential load forecasting based on lstm recurrent neural network. IEEE Trans Smart Grid. 10(1):841–851. https://doi.org/10.1109/TSG.2017.2753802

  34. Sajjad M, Khan ZA, Ullah A, Hussain T, Ullah W, Lee MY, Baik SW (2020) A novel cnn-gru-based hybrid approach for short-term residential load forecasting. IEEE Access. 8:143759–143768. https://doi.org/10.1109/ACCESS.2020.3009537

  35. Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, Zhang W (2021) Informer: Beyond efficient transformer for long sequence time-series forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 35, pp 11106–11115

  36. Wu H, Xu J, Wang J, Long M (2021) Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv Neural Inf Process Syst. 34:22419–22430

  37. Zeng A, Chen M, Zhang L, Xu Q (2023) Are transformers effective for time series forecasting? In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 37, pp 11121–11128

  38. Sen R, Yu H-F, Dhillon I (2019) Think Globally, Act Locally: A Deep Neural Network Approach to High-Dimensional Time Series Forecasting

  39. Liu M, Zeng A, Chen M, Xu Z, Lai Q, Ma L, Xu Q (2022) SCINet: Time series modeling and forecasting with sample convolution and interaction. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems, vol 35, pp 5816–5828. https://proceedings.neurips.cc/paper_files/paper/2022/file/266983d0949aed78a16fa4782237dea7-Paper-Conference.pdf

  40. Defard T, Setkov A, Loesch A, Audigier R (2020) Padim: a patch distribution modeling framework for anomaly detection and localization. In: ICPR Workshops. https://api.semanticscholar.org/CorpusID:226976039

  41. Sener O, Savarese S (2018) Active learning for convolutional neural networks: A core-set approach. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. https://openreview.net/forum?id=H1aIuk-RW

  42. Sinha S, Zhang H, Goyal A, Bengio Y, Larochelle H, Odena A (2020) Small-GAN: Speeding up GAN training using core-sets. In: Daumé III, H., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol 119, pp 9005–9015. https://proceedings.mlr.press/v119/sinha20b.html

  43. Van Houdt G, Mosquera C, Nápoles G (2020) A review on the long short-term memory model. Art Intell Rev. 53. https://doi.org/10.1007/s10462-020-09838-1

  44. Cho K, Merriënboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: Encoder–decoder approaches. In: Wu D, Carpuat M, Carreras X, Vecchi EM (eds.) Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp 103–111. Association for Computational Linguistics, Doha, Qatar. https://doi.org/10.3115/v1/W14-4012. https://aclanthology.org/W14-4012

  45. Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555. Presented at the NIPS 2014 Deep Learning and Representation Learning Workshop. http://arxiv.org/abs/1412.3555

  46. Zhou K, Hu D, Hu R, Zhou J (2023) High-resolution electric power load data of an industrial park with multiple types of buildings in china. Sci Data. 10. https://doi.org/10.1038/s41597-023-02786-9

  47. Martin Nascimento GF, Wurtz F, Kuo-Peng P, Delinchant B, Batistela N, Laranjeira T (2023) Green-er-electricity consumption data of a tertiary building. Frontiers in Sustain Cities. 5. https://doi.org/10.3389/frsc.2023.1043657

  48. Turowski M, Weber M, Neumann O, Heidrich B, Phipps K, Çakmak HK, Mikut R, Hagenmeyer V (2022) Modeling and generating synthetic anomalies for energy and power time series. e-Energy ’22, pp 471–484. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3538637.3539760

  49. Wang L, Ding Y, Riedel T, Miclaus A, Beigl M (2017) Data analysis on building load profiles: A stepping stone to future campus. In: 2017 International Smart Cities Conference (ISC2), pp 1–4. https://doi.org/10.1109/ISC2.2017.8090823

  50. Challu C, Jiang P, Wu YN, Callot L (2022) Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection. https://arxiv.org/abs/2202.07586

  51. Yahoo! Research (2015) Yahoo! Webscope dataset ydata-labeled-time-series-anomalies-v1_0. http://labs.yahoo.com/Academic_Relations

  52. Wu R, Keogh EJ (2022) Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress (extended abstract). In: 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp 1479–1480. https://doi.org/10.1109/ICDE53745.2022.00116

  53. Romanuke V (2021) Time series smoothing improving forecasting. Applied Comput Syst. 26(1):60–70

Acknowledgements

This research was made possible by the Natural Sciences and Engineering Research Council of Canada (NSERC) and a start-up grant from Concordia University. The authors thank Prof. Nizar Bouguila for insightful discussions about the machine learning models.

Author information

Contributions

Maher Dissem: Formal Analysis, Investigation, Methodology, Software, Visualization, Writing - original draft. Manar Amayri: Conceptualization, Data curation, Funding acquisition, Project administration, Resources, Supervision, Validation, Writing - review & editing.

Corresponding author

Correspondence to Maher Dissem.

Ethics declarations

Competing Interests

The authors have no relevant financial or non-financial interests to disclose.

Ethical and informed consent for data used

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dissem, M., Amayri, M. Unsupervised anomaly detection and imputation in noisy time series data for enhancing load forecasting. Appl Intell 55, 11 (2025). https://doi.org/10.1007/s10489-024-05856-6

  • DOI: https://doi.org/10.1007/s10489-024-05856-6
