Abstract
The surge in automation driven by IoT devices has generated extensive time-series data with highly variable features, posing challenges for anomaly detection. Deep learning (DL), particularly Transformer networks, has shown promise in addressing these issues. However, Transformer networks struggle to determine the position of data points and to preserve the order of data in sequences, which motivated the development of Positional Encoding (PE). Absolute PE was introduced first, and newer methods such as Relative PE and Rotary PE have since been adopted in natural language processing tasks to improve performance. This study evaluates the potential of several PEs, including Absolute PE, Rotary PE, and two modifications of Relative PE (Representative attention and Global attention), for multivariate time-series anomaly detection. The experimental results indicate that Absolute PE, with a 98% accuracy score, performs well across different window sizes. Representative attention, with a 98% F1-score, performs best for short sequences (lengths 8, 16, and 32), whereas Global attention, with a 97% F1-score, is more effective for longer sequences (64 and 128). In addition, Absolute PE has the shortest training times, rising from 25 for sequence length 8 to 192 for length 128, and Rotary PE has only slightly longer training times than Absolute PE. Representative attention consistently has the longest training times, rising from 48 for length 8 to 366 for length 128. Overall, Absolute PE and Global attention are the most time-efficient, while Representative attention incurs significantly higher training times, particularly for long sequences.
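For readers unfamiliar with the encodings compared above, the sketch below illustrates sinusoidal Absolute PE and Rotary PE applied to a single embedded window of a multivariate time series. It is a minimal NumPy illustration under assumed settings (window length 32, model dimension 64), not the implementation evaluated in this study; the Relative PE variants (Representative and Global attention) modify the attention scores themselves and are not shown here.

```python
# Minimal sketch of sinusoidal Absolute PE and Rotary PE on one time-series window.
# Shapes (seq_len=32, d_model=64) are illustrative assumptions, not the paper's setup.
import numpy as np

def absolute_pe(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal absolute positional encoding, added once to the input embeddings."""
    pos = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]                   # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)      # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                           # even dims: sine
    pe[:, 1::2] = np.cos(angles)                           # odd dims: cosine
    return pe

def rotary(x: np.ndarray) -> np.ndarray:
    """Rotary PE: rotate consecutive feature pairs by a position-dependent angle,
    so attention scores between rotated queries and keys depend on relative offsets."""
    seq_len, d_model = x.shape
    pos = np.arange(seq_len)[:, None]
    theta = pos / np.power(10000.0, 2 * np.arange(d_model // 2)[None, :] / d_model)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Example: a window of 32 time steps embedded into 64 features.
window = np.random.randn(32, 64)
x_abs = window + absolute_pe(32, 64)   # Absolute PE is added at the model input
q_rot = rotary(window)                 # Rotary PE is applied to queries and keys
```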




Data availability
No datasets were generated or analyzed during the current study.
Author information
Contributions
Methodology and writing (reviewing) were contributed by Abdul Amir Alioghli and Feyza Yıldırım Okay; editing and supervision by Feyza Yıldırım Okay; conceptualization, software, and visualization by Abdul Amir Alioghli. All authors have read and accepted the final version of the article published in the journal.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Alioghli, A.A., Yıldırım Okay, F. Enhancing multivariate time-series anomaly detection with positional encoding mechanisms in transformers. J Supercomput 81, 282 (2025). https://doi.org/10.1007/s11227-024-06694-6