Abstract
Computer security events are recorded in several log files. These logs must be clustered to discover security threats, detect anomalies, or identify a particular error. A problem arises when large quantities of security log data need to be checked, because existing tools do not provide sufficiently sophisticated grouping results. In addition, existing methods require user-supplied input parameters, and finding optimal values for these is not trivial. Therefore, we propose a method for the automatic clustering of security logs. First, we present a new graph-theoretic approach for security log clustering based on maximal clique percolation. Second, we apply an intensity threshold to the obtained maximal cliques so that edge weights are considered before proceeding to percolation. Third, we use the simulated annealing algorithm to optimize the number of percolations and the intensity threshold for maximal clique percolation. The entire process is automatic and does not need any user input. Experimental results on various real-world datasets show that the proposed method achieves superior clustering results compared to other methods.
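The first two steps can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the toy graph, the function names (`maximal_cliques`, `intensity`, `percolate`), and the parameter values `k=3` and `threshold=0.5` are all assumptions made for illustration. Intensity is assumed here to be the geometric mean of the edge weights inside a clique, and two cliques are assumed to percolate when they share at least `k - 1` nodes. The simulated annealing search over `k` and the threshold is left out.

```python
from itertools import combinations
from math import prod


def maximal_cliques(adj):
    """Yield all maximal cliques (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def intensity(clique, weight):
    """Geometric mean of the edge weights inside a clique (assumed definition)."""
    edges = [weight[frozenset(e)] for e in combinations(clique, 2)]
    return prod(edges) ** (1.0 / len(edges))


def percolate(weight, k, threshold):
    """Merge maximal cliques of size >= k that pass the intensity threshold."""
    adj = {}
    for edge in weight:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = [c for c in maximal_cliques(adj)
               if len(c) >= k and intensity(c, weight) >= threshold]

    # Union-find over cliques: cliques sharing >= k - 1 nodes join one cluster.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    clusters = {}
    for i, c in enumerate(cliques):
        clusters.setdefault(find(i), set()).update(c)
    return list(clusters.values())


# Hypothetical toy graph: nodes stand for log entries, weights for similarity.
toy_edges = {
    ("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.9,
    ("b", "d"): 0.8, ("c", "d"): 0.9,
    ("d", "e"): 0.2,
    ("e", "f"): 0.9, ("e", "g"): 0.9, ("f", "g"): 0.8,
    ("h", "i"): 0.1, ("h", "j"): 0.1, ("i", "j"): 0.1,
}
weight = {frozenset(e): w for e, w in toy_edges.items()}
clusters = percolate(weight, k=3, threshold=0.5)
print(sorted(sorted(c) for c in clusters))
```

In this toy graph the cliques {a, b, c} and {b, c, d} share two nodes and merge into one cluster, {e, f, g} forms a second cluster, and the weakly connected triangle {h, i, j} is discarded by the intensity threshold. Nodes that belong to no surviving clique are simply left unclustered in this sketch.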
Acknowledgements
This work is supported by the Indonesia Lecturer Scholarship (BUDI) from the Indonesia Endowment Fund for Education (LPDP), Ministry of Finance of the Republic of Indonesia.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Studiawan, H., Payne, C., Sohel, F. (2020). Automatic Graph-Based Clustering for Security Logs. In: Barolli, L., Takizawa, M., Xhafa, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2019. Advances in Intelligent Systems and Computing, vol 926. Springer, Cham. https://doi.org/10.1007/978-3-030-15032-7_77
DOI: https://doi.org/10.1007/978-3-030-15032-7_77
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15031-0
Online ISBN: 978-3-030-15032-7
eBook Packages: Intelligent Technologies and Robotics (R0)