Automatic Graph-Based Clustering for Security Logs

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 926)

Abstract

Computer security events are recorded in several log files. These logs must be clustered to discover security threats, detect anomalies, or identify a particular error. A problem arises when large quantities of security log data need to be checked, because existing tools do not provide sufficiently sophisticated grouping results. In addition, existing methods require user-supplied parameters, and finding optimal values for them is not trivial. Therefore, we propose a method for the automatic clustering of security logs. First, we present a new graph-theoretic approach for security log clustering based on maximal clique percolation. Second, we apply an intensity threshold to each maximal clique obtained, so that edge weights are taken into account before proceeding to percolation. Third, we use the simulated annealing algorithm to optimize the number of percolations and the intensity threshold for maximal clique percolation. The entire process is automatic and does not need any user input. Experimental results on various real-world datasets show that the proposed method achieves superior clustering results compared to other methods.
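The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions not stated in the abstract: nodes stand for log messages, edge weights are similarities in (0, 1], clique intensity is taken as the geometric mean of a clique's edge weights, two cliques percolate when they share at least k - 1 nodes, and the objective passed to the annealer is a placeholder. All function names and the toy edge list are hypothetical.

```python
import math
import random
from itertools import combinations


def build(edges):
    """Turn (u, v, weight) triples into an adjacency map and a weight map."""
    adj, weight = {}, {}
    for u, v, w in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        weight[frozenset((u, v))] = w
    return adj, weight


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def intensity(clique, weight):
    """Geometric mean of the edge weights inside a clique (assumed definition)."""
    ws = [weight[frozenset(e)] for e in combinations(clique, 2)]
    if not ws:
        return 0.0
    return math.exp(sum(math.log(w) for w in ws) / len(ws))


def percolate(adj, weight, k, threshold):
    """Keep cliques of size >= k and intensity >= threshold, then merge
    cliques that share at least k - 1 nodes (union-find over cliques)."""
    kept = [c for c in maximal_cliques(adj)
            if len(c) >= k and intensity(c, weight) >= threshold]
    parent = list(range(len(kept)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(kept)), 2):
        if len(kept[i] & kept[j]) >= k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, c in enumerate(kept):
        groups.setdefault(find(i), set()).update(c)
    return list(groups.values())


def anneal(adj, weight, score, steps=200, temp=1.0, cooling=0.95, seed=0):
    """Simulated annealing over (k, threshold); `score` is a caller-supplied
    objective to maximise (a placeholder, not the paper's criterion)."""
    rng = random.Random(seed)
    max_k = max(len(c) for c in maximal_cliques(adj))
    state = (2, 0.0)
    current = score(percolate(adj, weight, *state))
    best, best_score = state, current
    for _ in range(steps):
        k = min(max_k, max(2, state[0] + rng.choice((-1, 0, 1))))
        t = min(1.0, max(0.0, state[1] + rng.uniform(-0.1, 0.1)))
        cand = score(percolate(adj, weight, k, t))
        # Always accept improvements; accept worse moves with Metropolis probability.
        if cand >= current or rng.random() < math.exp((cand - current) / temp):
            state, current = (k, t), cand
            if current > best_score:
                best, best_score = state, current
        temp *= cooling
    return best, best_score


# Toy graph: two strongly connected triangles joined by one weak edge.
EDGES = [("a", "b", 0.9), ("a", "c", 0.9), ("b", "c", 0.9),
         ("d", "e", 0.8), ("d", "f", 0.8), ("e", "f", 0.8),
         ("c", "d", 0.1)]
adj, weight = build(EDGES)
clusters = percolate(adj, weight, k=3, threshold=0.5)
best, best_score = anneal(adj, weight, score=len)
```

On this toy graph, `percolate` with `k=3` and `threshold=0.5` separates the two triangles, while `k=2` with a threshold below 0.1 lets the weak edge chain everything into a single cluster. The call to `anneal` uses the number of clusters as its objective purely for illustration; a real run would substitute a cluster validity measure.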

Acknowledgements

This work is supported by the Indonesia Lecturer Scholarship (BUDI) from the Indonesia Endowment Fund for Education (LPDP), Ministry of Finance of the Republic of Indonesia.

Author information

Corresponding author

Correspondence to Hudan Studiawan.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Studiawan, H., Payne, C., Sohel, F. (2020). Automatic Graph-Based Clustering for Security Logs. In: Barolli, L., Takizawa, M., Xhafa, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2019. Advances in Intelligent Systems and Computing, vol 926. Springer, Cham. https://doi.org/10.1007/978-3-030-15032-7_77
