Alarm Log Data Augmentation Algorithm Based on a GAN Model and Apriori

Yang, Yang; Li, Yu-Ting; Huo, Yong-Hua; Gao, Zhi-Peng; Rui, Lan-Lan

doi:10.1007/s11390-024-2408-1

Regular Paper
Published: 20 September 2024

Journal of Computer Science and Technology
Volume 39, pages 951–966 (2024)

Yang Yang (杨杨)¹,
Yu-Ting Li (李昱廷)¹,
Yong-Hua Huo (霍永华)²,
Zhi-Peng Gao (高志鹏)¹ &
Lan-Lan Rui (芮兰兰)¹

Abstract

The complexity of alarm detection and diagnosis tasks often results in a shortage of alarm log data. Because alarm log data carries strong rule associations, existing data augmentation algorithms perform poorly on it. To address this problem, this paper introduces a new algorithm for augmenting alarm log data, termed APRGAN, which combines a generative adversarial network (GAN) with the Apriori algorithm. APRGAN generates alarm log data under the guidance of rules mined by a rule miner. Moreover, we propose a new dynamic updating mechanism to alleviate the mode collapse problem of the GAN: in addition to updating the real reference dataset used to train the discriminator, we dynamically update the parameters and the rule set of the Apriori algorithm according to the data generated in each epoch. Extensive experiments on two public datasets show that APRGAN outperforms other data augmentation algorithms on alarm log data, as measured by BLEU, ROUGE, and METEOR.
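
To make the rule-mining idea concrete, the following is a minimal sketch of Apriori-style association rule mining over toy alarm windows. It is not the APRGAN rule miner: the alarm names, the thresholds, and the function names `frequent_itemsets` and `association_rules` are illustrative assumptions that do not appear in the paper.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori: map each frequent itemset to the number of transactions containing it."""
    tx = [frozenset(t) for t in transactions]
    min_count = min_support * len(tx)
    # Level 1 candidates: every single alarm type seen in the data.
    candidates = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that occur often enough.
        level = {}
        for c in candidates:
            count = sum(1 for t in tx if c <= t)
            if count >= min_count:
                level[c] = count
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # confidence(A -> B) = count(A and B) / count(A)
                confidence = count / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


# Hypothetical alarm windows: each list holds the alarm types seen together.
windows = [
    ["LINK_DOWN", "PORT_ERR", "CPU_HIGH"],
    ["LINK_DOWN", "PORT_ERR"],
    ["LINK_DOWN", "PORT_ERR", "FAN_FAIL"],
    ["CPU_HIGH", "FAN_FAIL"],
    ["LINK_DOWN", "CPU_HIGH"],
]

frequent = frequent_itemsets(windows, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.7)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), confidence)
```

On this toy data only two rules survive the thresholds: LINK_DOWN implies PORT_ERR with confidence 0.75, and PORT_ERR implies LINK_DOWN with confidence 1.0. In a rule-guided generator, rules of this kind could be used to check or steer generated sequences; how APRGAN actually does so is described in the full article, not in this sketch.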

Author information

Authors and Affiliations

State Key Laboratory of Network and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Yang Yang (杨杨), Yu-Ting Li (李昱廷), Zhi-Peng Gao (高志鹏) & Lan-Lan Rui (芮兰兰)
The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang, 050081, China
Yong-Hua Huo (霍永华)

Corresponding author

Correspondence to Zhi-Peng Gao (高志鹏).

Ethics declarations

Conflict of Interest: The authors declare that they have no conflict of interest.

Additional information

This work was supported by the National Key Research and Development Program of China under Grant No. 2019YFB-2103202.

Yang Yang received her Ph.D. degree in computer science from Beijing University of Posts and Telecommunications (BUPT), Beijing, in 2011. She is currently an associate professor at the State Key Laboratory of Network and Switching Technology of BUPT. Her research interests are in the area of network management based on big data and artificial intelligence, and related fields.

Yu-Ting Li received his B.S. degree in computer science and technology from Beijing University of Posts and Telecommunications (BUPT), Beijing, in 2020. He is now pursuing his M.S. degree at the State Key Laboratory of Networking and Switching Technology of BUPT, Beijing. His research interests cover fault diagnosis and data augmentation.

Yong-Hua Huo is a senior engineer of the Communication Networks Laboratory of the 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang. Her research interests are network fault management and network anomaly detection.

Zhi-Peng Gao received his Ph.D. degree in computer science from Beijing University of Posts and Telecommunications (BUPT), Beijing, in 2007. He is currently a professor at the State Key Laboratory of Network and Switching Technology of BUPT, Beijing. His research interests include blockchain, big data analysis, edge computing, edge intelligence, and related fields.

Lan-Lan Rui is an associate professor at the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications (BUPT), Beijing. She received her Ph.D. degree in computer application technology from BUPT, Beijing, in 2010. Her research interests include edge computing, content-based measurement and analysis, quality of service (QoS), smart service provisioning in mobile social networks, and intelligent theory and technology of network services.
