Abstract
Efficient energy management relies heavily on accurate load forecasting, particularly in the face of increasing energy demands and the need for sustainable operations. However, anomalies in historical data pose a significant challenge to forecasting models and can lead to suboptimal resource allocation and decision-making. This paper presents an unsupervised, feature-bank-based framework for detecting anomalies in time series whose historical data is itself contaminated by anomalies. The detected anomalies are then replaced with plausible patterns generated by a recurrent (RNN-based) denoising autoencoder. We evaluate the methodology through a comprehensive study that compares the performance of several forecasting models before and after the anomaly detection and imputation steps. Our results demonstrate the versatility and effectiveness of our approach across various smart-grid and smart-building energy applications, highlighting its potential for widespread adoption in energy management systems.
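The abstract names only the ingredients of the detection stage. The sketch below is a rough illustration of the general feature-bank idea, in the spirit of the memory-bank approach of Roth et al. and the core-set selection of Sener and Savarese listed in the references. It is not the authors' implementation. Using raw sliding windows as features, greedy k-center subsampling of the bank, Euclidean nearest-neighbour distance as the anomaly score, and the threshold value are all assumptions made for illustration, and the helper names are hypothetical. The imputation stage (the recurrent denoising autoencoder) is not shown.

```python
import math
from typing import List, Sequence

# Illustrative sketch only, not the authors' implementation. Assumed choices:
# raw sliding windows as features, greedy k-center (core-set) subsampling of
# the bank, and Euclidean nearest-neighbour distance as the anomaly score.


def sliding_windows(series: Sequence[float], width: int) -> List[List[float]]:
    """Split a series into overlapping windows of length `width` (stride 1)."""
    return [list(series[i:i + width]) for i in range(len(series) - width + 1)]


def greedy_coreset(bank: List[List[float]], size: int) -> List[List[float]]:
    """Greedy k-center selection: repeatedly keep the entry farthest from the chosen set."""
    if size >= len(bank):
        return list(bank)
    chosen = [bank[0]]
    # Distance from every bank entry to its nearest already-chosen entry.
    min_dist = [math.dist(p, chosen[0]) for p in bank]
    while len(chosen) < size:
        idx = max(range(len(bank)), key=lambda i: min_dist[i])
        chosen.append(bank[idx])
        min_dist = [min(d, math.dist(p, bank[idx])) for d, p in zip(min_dist, bank)]
    return chosen


def anomaly_scores(bank: List[List[float]], windows: List[List[float]]) -> List[float]:
    """Score each window by its distance to the nearest feature-bank entry."""
    return [min(math.dist(w, b) for b in bank) for w in windows]


if __name__ == "__main__":
    # Toy data: a 24-step periodic profile; the test copy gets one injected spike.
    train = [math.sin(2 * math.pi * t / 24) for t in range(24 * 10)]
    test = [math.sin(2 * math.pi * t / 24) for t in range(48)]
    test[30] += 3.0
    width = 6
    bank = greedy_coreset(sliding_windows(train, width), size=40)
    scores = anomaly_scores(bank, sliding_windows(test, width))
    threshold = 1.0  # hypothetical value; in practice it would be tuned on data
    print([i for i, s in enumerate(scores) if s > threshold])  # -> [25, 26, 27, 28, 29, 30]
```

Every window that overlaps the spike (start indices 25 to 30) lies far from all bank entries and is flagged, while normal windows match a stored pattern almost exactly. Here the training series is clean; when the history itself contains anomalies, as in the setting this paper targets, the bank can absorb anomalous patterns and needs additional safeguards.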
Data Availability and Access
The original version of the AEMO dataset is openly accessible through the AEMO website. The Park dataset is taken from [46]. Code and instructions to process these data in order to reproduce our experiments are available on GitHub. The authors do not have permission to distribute the Predis-MHI dataset.
Notes
The code for this implementation is open source and available at Github.com/MaherDissem/Unsupervised-Anomaly-Detection-in-Noisy-Time-Series-Data-for-Enhancing-Load-Forecasting.
References
Energy Institute (2023) Statistical Review of World Energy. Energy Institute, London, UK
Hoseinzadeh S, Groppi D, Sferra AS, Di Matteo U, Astiaso Garcia D (2022) The PRISMI Plus toolkit application to a grid-connected Mediterranean island. Energies. 15(22). https://doi.org/10.3390/en15228652
Azeem A, Ismail I, Jameel SM, Harindran VR (2021) Electrical load forecasting models for different generation modalities: a review. IEEE Access. 9:142239–142263
Byun J, Hong I, Kang B, Park S (2011) A smart energy distribution and management system for renewable energy distribution and context-aware services based on user patterns and load forecasting. IEEE Trans Consumer Electron. 57(2):436–444. https://doi.org/10.1109/TCE.2011.5955177
Hadri S, Naitmalek Y, Najib M, Bakhouya M, Fakhri Y, Elaroussi M (2019) A comparative study of predictive approaches for load forecasting in smart buildings. Procedia Comput Sci. 160:173–180. The 10th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN-2019) / The 9th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare (ICTH-2019) / Affiliated Workshops. https://doi.org/10.1016/j.procs.2019.09.458
Deb C, Zhang F, Yang J, Lee SE, Shah KW (2017) A review on time series forecasting techniques for building energy consumption. Renew Sustain Energy Rev. 74:902–924. https://doi.org/10.1016/j.rser.2017.02.085
Labeodan T, Zeiler W, Boxem G, Zhao Y (2015) Occupancy measurement in commercial office buildings for demand-driven control applications: a survey and detection system evaluation. Energy Build. 93:303–314. https://doi.org/10.1016/j.enbuild.2015.02.028
Kelly J, Knottenbelt W (2015) The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. Sci Data. 2(1). https://doi.org/10.1038/sdata.2015.7
Kaur D, Islam SN, Mahmud MA, Haque ME, Dong ZY (2022) Energy forecasting in smart grid systems: recent advancements in probabilistic deep learning. IET Gener, Transmis Distrib. 16(22):4461–4479
Nespoli A, Ogliari E, Pretto S, Gavazzeni M, Vigani S, Paccanelli F (2021) Electrical load forecast by means of LSTM: The impact of data quality. Forecast. 3(1):91–101
Sehwag V, Bhagoji AN, Song L, Sitawarin C, Cullina D, Chiang M, Mittal P (2019) Analyzing the robustness of open-world machine learning. In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, pp 105–116
Roth K, Pemula L, Zepeda J, Schölkopf B, Brox T, Gehler P (2021) Towards total recall in industrial anomaly detection. IEEE/CVF Conf Comput Vis Pattern Recog (CVPR) 2022:14298–14308
Jiang X, Liu J, Wang J, Nie Q, Wu K, Liu Y, Wang C, Zheng F (2022) SoftPatch: Unsupervised anomaly detection with noisy data. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (eds.) Advances in Neural Information Processing Systems, vol 35, pp 15433–15445. https://proceedings.neurips.cc/paper_files/paper/2022/file/637a456d89289769ac1ab29617ef7213-Paper-Conference.pdf
Choi K, Yi J, Park C, Yoon S (2021) Deep learning for anomaly detection in time-series data: Review, analysis, and guidelines. IEEE Access. 9:120043–120065. https://doi.org/10.1109/ACCESS.2021.3107975
Schmidl S, Wenig P, Papenbrock T (2022) Anomaly detection in time series: a comprehensive evaluation. Proc. VLDB Endow. 15(9):1779–1797. https://doi.org/10.14778/3538598.3538602
Zhou Y, Song X, Zhang Y, Liu F, Zhu C, Liu L (2022) Feature encoding with autoencoders for weakly supervised anomaly detection. IEEE Trans Neural Netw Learn Syst. 33(6):2454–2465. https://doi.org/10.1109/tnnls.2021.3086137
Hu T, Guo Q, Shen X, Wu R, Xi H (2019) Utilizing unlabeled data to detect electricity fraud in AMI: A semisupervised deep learning approach. IEEE Trans Neural Netw Learn Syst. pp 1–13. https://doi.org/10.1109/TNNLS.2018.2890663
Li J, Izakian H, Pedrycz W, Jamal I (2021) Clustering-based anomaly detection in multivariate time series data. Applied Soft Comput. 100:106919. https://doi.org/10.1016/j.asoc.2020.106919
Roth K, Pemula L, Zepeda J, Schölkopf B, Brox T, Gehler P (2022) Towards total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 14318–14328
Su Y, Zhao Y, Niu C, Liu R, Sun W, Pei D (2019) Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD ’19, pp 2828–2837. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3292500.3330672
Fang C, Wang C (2020) Time Series Data Imputation: A Survey on Deep Learning Approaches
Fang C, Song S, Chen Z, Gui A (2019) Fine-grained fuel consumption prediction. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management. CIKM ’19, pp 2783–2791. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3357384.3357836
Graham J (2009) Missing data analysis: Making it work in the real world. Annual Rev Psych. 60:549–576. https://doi.org/10.1146/annurev.psych.58.110405.085530
Zhang A, Song S, Wang J (2016) Sequential data cleaning: A statistical approach. In: Proceedings of the 2016 International Conference on Management of Data. SIGMOD ’16, pp 909–924. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/2882903.2915233
Batista GEAPA, Monard MC (2002) A study of k-nearest neighbour as an imputation method. In: International Conference on Health Information Science. https://api.semanticscholar.org/CorpusID:37493644
Zhang A, Song S, Wang J, Yu PS (2017) Time series data cleaning: from anomaly detection to anomaly repairing. Proc. VLDB Endow. 10(10):1046–1057. https://doi.org/10.14778/3115404.3115410
Spinelli I, Scardapane S, Uncini A (2020) Missing data imputation with adversarially-trained graph convolutional networks. Neural Netw. 129:249–260. https://doi.org/10.1016/j.neunet.2020.06.005
Ba-Alawi A, Loy-Benitez J, Kim S, Yoo C (2021) Missing data imputation and sensor self-validation towards a sustainable operation of wastewater treatment plants via deep variational residual autoencoders. Chemosphere. 288:132647. https://doi.org/10.1016/j.chemosphere.2021.132647
Liguori A, Markovic R, Dam TTH, Frisch J, van Treeck C, Causone F (2021) Indoor environment data time-series reconstruction using autoencoder neural networks. Build Environ. 191:107623. https://doi.org/10.1016/j.buildenv.2021.107623
Zhang J, Yin P (2019) Multivariate time series missing data imputation using recurrent denoising autoencoder. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp 760–764. https://doi.org/10.1109/BIBM47256.2019.8982996
De Gooijer JG, Hyndman RJ (2006) 25 years of time series forecasting. Int J Forecast. 22(3):443–473
Lim B, Zohren S (2021) Time-series forecasting with deep learning: a survey. Philo Trans Royal Soc A. 379(2194):20200209
Kong W, Dong ZY, Jia Y, Hill DJ, Xu Y, Zhang Y (2019) Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans Smart Grid. 10(1):841–851. https://doi.org/10.1109/TSG.2017.2753802
Sajjad M, Khan ZA, Ullah A, Hussain T, Ullah W, Lee MY, Baik SW (2020) A novel CNN-GRU-based hybrid approach for short-term residential load forecasting. IEEE Access. 8:143759–143768. https://doi.org/10.1109/ACCESS.2020.3009537
Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, Zhang W (2021) Informer: Beyond efficient transformer for long sequence time-series forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 35, pp 11106–11115
Wu H, Xu J, Wang J, Long M (2021) Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv Neural Inf Process Syst. 34:22419–22430
Zeng A, Chen M, Zhang L, Xu Q (2023) Are transformers effective for time series forecasting? In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 37, pp 11121–11128
Sen R, Yu H-F, Dhillon I (2019) Think Globally, Act Locally: A Deep Neural Network Approach to High-Dimensional Time Series Forecasting
Liu M, Zeng A, Chen M, Xu Z, Lai Q, Ma L, Xu Q (2022) SCINet: Time series modeling and forecasting with sample convolution and interaction. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (eds.) Advances in Neural Information Processing Systems, vol 35, pp 5816–5828. https://proceedings.neurips.cc/paper_files/paper/2022/file/266983d0949aed78a16fa4782237dea7-Paper-Conference.pdf
Defard T, Setkov A, Loesch A, Audigier R (2020) PaDiM: a patch distribution modeling framework for anomaly detection and localization. In: ICPR Workshops. https://api.semanticscholar.org/CorpusID:226976039
Sener O, Savarese S (2018) Active learning for convolutional neural networks: A core-set approach. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. https://openreview.net/forum?id=H1aIuk-RW
Sinha S, Zhang H, Goyal A, Bengio Y, Larochelle H, Odena A (2020) Small-GAN: Speeding up GAN training using core-sets. In: Daumé III H, Singh A (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol 119, pp 9005–9015. https://proceedings.mlr.press/v119/sinha20b.html
Van Houdt G, Mosquera C, Nápoles G (2020) A review on the long short-term memory model. Art Intell Rev. 53. https://doi.org/10.1007/s10462-020-09838-1
Cho K, van Merriënboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: Encoder–decoder approaches. In: Wu D, Carpuat M, Carreras X, Vecchi EM (eds.) Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp 103–111. Association for Computational Linguistics, Doha, Qatar. https://doi.org/10.3115/v1/W14-4012
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555. Presented at the NIPS 2014 Deep Learning and Representation Learning Workshop. http://arxiv.org/abs/1412.3555
Zhou K, Hu D, Hu R, Zhou J (2023) High-resolution electric power load data of an industrial park with multiple types of buildings in China. Sci Data. 10. https://doi.org/10.1038/s41597-023-02786-9
Martin Nascimento GF, Wurtz F, Kuo-Peng P, Delinchant B, Batistela N, Laranjeira T (2023) GREEN-ER–electricity consumption data of a tertiary building. Frontiers in Sustain Cities. 5. https://doi.org/10.3389/frsc.2023.1043657
Turowski M, Weber M, Neumann O, Heidrich B, Phipps K, Çakmak HK, Mikut R, Hagenmeyer V (2022) Modeling and generating synthetic anomalies for energy and power time series. e-Energy ’22, pp 471–484. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3538637.3539760
Wang L, Ding Y, Riedel T, Miclaus A, Beigl M (2017) Data analysis on building load profiles: A stepping stone to future campus. In: 2017 International Smart Cities Conference (ISC2), pp 1–4. https://doi.org/10.1109/ISC2.2017.8090823
Challu C, Jiang P, Wu YN, Callot L (2022) Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection. https://arxiv.org/abs/2202.07586
Yahoo! Research (2015) Yahoo! Webscope dataset ydata-labeled-time-series-anomalies-v1_0. http://labs.yahoo.com/Academic_Relations
Wu R, Keogh EJ (2022) Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress (extended abstract). In: 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp 1479–1480. https://doi.org/10.1109/ICDE53745.2022.00116
Romanuke V (2021) Time series smoothing improving forecasting. Applied Comput Syst. 26(1):60–70
Acknowledgements
This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and a start-up grant from Concordia University. The authors would like to thank Prof. Nizar Bouguila for insightful discussions about the machine learning models.
Author information
Contributions
Maher Dissem: Formal Analysis, Investigation, Methodology, Software, Visualization, Writing - original draft. Manar Amayri: Conceptualization, Data curation, Funding acquisition, Project administration, Resources, Supervision, Validation, Writing - review & editing.
Ethics declarations
Competing Interests
The authors have no relevant financial or non-financial interests to disclose.
Ethical and informed consent for data used
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dissem, M., Amayri, M. Unsupervised anomaly detection and imputation in noisy time series data for enhancing load forecasting. Appl Intell 55, 11 (2025). https://doi.org/10.1007/s10489-024-05856-6
DOI: https://doi.org/10.1007/s10489-024-05856-6