DOI: 10.1145/3640912.3640961
research-article

Multi-source System Log Behavior Pattern Mining Method Based on FP-Growth

Published: 22 February 2024

ABSTRACT

Analyzing and mining system logs can reveal the behavioral characteristics and anomalies of network users and systems. Association analysis identifies patterns and correlations between different log sources, which enables the detection of various intrusion behaviors in computer networks. To address the challenge of handling massive and complex log data, this paper proposes a multi-source system log behavior pattern mining method based on FP-Growth. The method efficiently extracts frequent patterns and abnormal behaviors from large-scale system logs. First, to ensure data quality and facilitate subsequent data mining, we structure and cleanse the raw logs and transform them into a format suitable for analysis. Next, because log types differ in structure and content, we extract features separately for each log type to retain essential information and generate transaction items. The data are then partitioned by time into datasets that form a transaction database, which serves as input to the association rule mining algorithm. Finally, using the FP-Growth association rule mining algorithm, we explore the relationships between entries within single-source logs and across multi-source logs, and select appropriate association rules for behavior pattern mining based on metrics such as support. Experimental results demonstrate that the proposed method effectively uncovers typical behavior patterns.
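The pipeline summarized in the abstract (time-partitioned transactions, FP-Growth, rule selection) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The event tuples, item names such as `auth:login_fail`, the 60-second window, and both thresholds are hypothetical, and the use of confidence for rule selection is an assumption, since the abstract names only "metrics such as support".

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, already-structured log events: (timestamp in seconds, source, item).
EVENTS = [
    (3, "auth", "auth:login_fail"), (5, "fw", "fw:deny_ssh"),
    (8, "auth", "auth:login_fail"), (62, "auth", "auth:login_ok"),
    (65, "web", "web:GET_/admin"), (125, "auth", "auth:login_fail"),
    (130, "fw", "fw:deny_ssh"), (185, "auth", "auth:login_ok"),
    (190, "web", "web:GET_/admin"), (200, "fw", "fw:deny_ssh"),
]

def build_transactions(events, window=60):
    """Group items from all sources into one transaction (a set) per time window."""
    buckets = defaultdict(set)
    for ts, _source, item in events:
        buckets[ts // window].add(item)
    return [buckets[k] for k in sorted(buckets)]

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and item counts."""
    freq = defaultdict(int)
    for items, count in weighted_transactions:
        for item in items:
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in weighted_transactions:
        # Keep frequent items only, ordered by descending frequency (ties by name).
        ordered = sorted((i for i in items if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += count
            node = child
    return header, freq

def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support_count) for every frequent itemset."""
    header, freq = build_tree(weighted_transactions, min_support)
    for item, support in freq.items():
        itemset = suffix | {item}
        yield itemset, support
        # Conditional pattern base: the prefix path above each node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_support, itemset)

def rules(itemsets, min_confidence):
    """Derive rules A -> B with confidence = support(A and B together) / support(A)."""
    out = []
    for itemset, support in itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                conf = support / itemsets[antecedent]
                if conf >= min_confidence:
                    out.append((set(antecedent), set(itemset - antecedent), conf))
    return out

transactions = build_transactions(EVENTS)
frequent = dict(fp_growth([(t, 1) for t in transactions], min_support=2))
for antecedent, consequent, conf in rules(frequent, min_confidence=0.6):
    print(sorted(antecedent), "->", sorted(consequent), f"confidence={conf:.2f}")
```

Because each transaction merges items from every source that fall in the same window, a frequent itemset such as a firewall item paired with an authentication item is a cross-source pattern. Restricting `EVENTS` to one source before calling `build_transactions` would yield the single-source case.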


Published in

CNML '23: Proceedings of the 2023 International Conference on Communication Network and Machine Learning
October 2023
446 pages
ISBN: 9798400716683
DOI: 10.1145/3640912

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited