ABSTRACT
Analyzing and mining system logs can reveal the behavioral characteristics and anomalies of network users and systems. Association analysis identifies patterns and correlations across different log sources, which in turn enables the detection of intrusion behaviors in computer networks. To handle massive and complex log data, this paper proposes a multi-source system log behavior pattern mining method based on FP-Growth that efficiently extracts frequent patterns and abnormal behaviors from large-scale system logs. First, to ensure data quality and support the subsequent mining steps, we structure and cleanse the raw logs and transform them into a format suitable for analysis. Next, because log types differ in structure and content, we apply type-specific feature extraction to each log type, retaining the essential information and generating transaction items. The items are then grouped by time partition into a transaction database that serves as input to the association rule mining algorithm. Finally, we use FP-Growth to explore the relationships between entries within single-source logs and across multi-source logs, and select association rules for behavior pattern mining based on metrics such as support. Experimental results show that the proposed method effectively uncovers typical behavior patterns.
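The pipeline described in the abstract (time-partitioned transactions fed to FP-Growth, filtered by support) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the event tuple layout `(timestamp, source, feature)`, the 60-second window, the `source:feature` item naming, and the helper names `to_transactions` and `mine` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return header table and item counts."""
    freq = defaultdict(int)
    for items, cnt in transactions:
        for it in set(items):
            freq[it] += cnt
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, cnt in transactions:
        # Keep frequent items only, ordered by descending count (ties by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for it in ordered:
            child = node.children.get(it)
            if child is None:
                child = Node(it, node)
                node.children[it] = child
                header[it].append(child)
            child.count += cnt
            node = child
    return header, freq


def fp_growth(transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    header, freq = build_tree(transactions, min_count)
    results = {}
    for item, count in freq.items():
        pattern = suffix | {item}
        results[pattern] = count
        # Conditional pattern base: the prefix path of every node for `item`.
        cond = []
        for node in header[item]:
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                cond.append((path, node.count))
        results.update(fp_growth(cond, min_count, pattern))
    return results


def to_transactions(events, window_seconds):
    """Group (timestamp, source, feature) events into one item set per time window."""
    buckets = defaultdict(set)
    for ts, source, feature in events:
        buckets[ts // window_seconds].add(f"{source}:{feature}")
    return [sorted(buckets[w]) for w in sorted(buckets)]


def mine(transactions, min_support):
    """Mine itemsets whose relative support is at least `min_support`."""
    min_count = math.ceil(min_support * len(transactions))
    return fp_growth([(t, 1) for t in transactions], min_count)


# Hypothetical, already-cleansed events from three log sources.
events = [
    (0, "auth", "login_fail"), (5, "fw", "deny_22"),
    (65, "auth", "login_fail"), (70, "fw", "deny_22"), (80, "web", "GET_/admin"),
    (130, "auth", "login_ok"),
    (185, "auth", "login_fail"), (190, "fw", "deny_22"),
]

transactions = to_transactions(events, window_seconds=60)
patterns = mine(transactions, min_support=0.5)
for itemset, count in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count / len(transactions))
```

In this toy data the pair `auth:login_fail` and `fw:deny_22` co-occurs in three of the four windows, so it survives the 0.5 support threshold as a cross-source pattern, while the items seen in only one window are pruned.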