research-article

Multi-source System Log Behavior Pattern Mining Method Based on FP-Growth

Authors:

Daojuan Zhang (ORCID: 0009-0000-7173-8895)
State Grid Laboratory of Power Cyber-Security Protection and Monitoring Technology, State Grid Smart Grid Research Institute Co., Ltd., China

Tianqi Wu (ORCID: 0009-0000-4094-3482)
State Grid Laboratory of Power Cyber-Security Protection and Monitoring Technology, State Grid Smart Grid Research Institute Co., Ltd., China

Xiaoming Zhou (ORCID: 0009-0009-7144-8762)
State Grid Liaoning Electric Power Supply Co., Ltd., China

Bo Hu (ORCID: 0009-0006-8495-5665)
State Grid Liaoning Electric Power Supply Co., Ltd., China

Wenjie Zhang (ORCID: 0009-0001-0910-1188)
State Grid Liaoning Electric Power Supply Co., Ltd., China

CNML '23: Proceedings of the 2023 International Conference on Communication Network and Machine Learning, October 2023, Pages 248–254. https://doi.org/10.1145/3640912.3640961

Published: 22 February 2024


ABSTRACT

Analyzing and mining system logs can reveal the behavioral characteristics and anomalies of network users and systems. Association analysis identifies patterns and correlations across different log sources, which in turn supports the detection of intrusion behaviors in computer networks. To handle massive and complex log data, this paper proposes a multi-source system log behavior pattern mining method based on FP-Growth that efficiently extracts frequent patterns and abnormal behaviors from large-scale system logs. First, to ensure data quality and ease subsequent mining, raw logs are structured, cleansed, and transformed into a format suitable for analysis. Next, because log types differ in structure and content, features are extracted separately for each log type to retain essential information and generate transaction items. The items are then grouped by time partition to form a transaction database that serves as input to the association rule mining algorithm. Finally, the FP-Growth algorithm is used to explore relationships among entries within single-source logs and across multi-source logs, and association rules are selected by metrics such as support for behavior pattern mining. Experimental results show that the proposed method effectively uncovers typical behavior patterns.
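
The pipeline described in the abstract (time-partitioned transactions fed to FP-Growth) can be illustrated with a minimal sketch. This is not the authors' implementation: the event labels such as `auth:login_fail`, the 60-second window, the minimum support count of 2, and the helper names `build_transactions`, `build_tree`, and `fp_growth` are all assumptions made for illustration, since the abstract does not state the paper's actual features or thresholds.

```python
from collections import defaultdict


def build_transactions(events, window_seconds):
    """Group (timestamp, item) events into one item set per time window."""
    buckets = defaultdict(set)
    for ts, item in events:
        buckets[ts // window_seconds].add(item)
    return [buckets[k] for k in sorted(buckets)]


class Node:
    """One FP-tree node: an item, a link to its parent, and a count."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, weight) pairs.

    Returns the header table (item -> nodes) and the frequent item counts.
    """
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in items:
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep only frequent items, most frequent first (ties broken by name).
        ordered = sorted((i for i in items if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, support_count) for every frequent itemset."""
    header, frequent = build_tree(weighted_transactions, min_count)
    for item, count in frequent.items():
        itemset = suffix | {item}
        yield itemset, count
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_count, itemset)


# Hypothetical multi-source events: (timestamp in seconds, "source:feature").
events = [
    (0, "auth:login_fail"), (5, "fw:deny"),
    (60, "auth:login_fail"), (70, "fw:deny"), (80, "web:404"),
    (120, "auth:login_ok"),
    (185, "auth:login_fail"), (190, "fw:deny"),
]
transactions = build_transactions(events, window_seconds=60)
patterns = dict(fp_growth([(t, 1) for t in transactions], min_count=2))
for itemset, count in patterns.items():
    print(sorted(itemset), count / len(transactions))
```

In this toy data, failed logins and firewall denials fall into the same window three times out of four, so their joint itemset survives the support threshold while the one-off items are pruned. Rule metrics beyond support, such as confidence, would be computed from these counts in a later step.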


Published in

CNML '23: Proceedings of the 2023 International Conference on Communication Network and Machine Learning
October 2023
446 pages
ISBN:9798400716683
DOI:10.1145/3640912

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States