Abstract
A Security Operation Center (SOC) is a facility that collects and analyzes security data and monitors anomalous activity, providing defensive measures that protect enterprise and government systems from malicious intrusion. As cyber attacks grow more sophisticated and harmful, sharing cyber threat intelligence (CTI) between SOCs and other security departments has become a global trend. Through CTI analysis, security analysts can gain a comprehensive understanding of the features of diverse cyber attacks, issue early warnings, and respond quickly to potential attacks. The growing volume of CTI reports and the increasing frequency of CTI sharing create an urgent need for far higher analysis efficiency than a traditional SOC can offer. Facing the big data challenge and a limited supply of professional security analysts, the next generation SOC (NG-SOC) should place strong emphasis on processing security data such as CTI reports automatically and efficiently using data mining and machine learning techniques. This paper presents a practical and efficient approach for consolidating large quantities of CTI sources into high-quality data and enhancing the CTI analysis capability of NG-SOC. Specifically, we first propose a multi-class classification framework for CTI reports that combines two document embedding models with six machine learning classifiers, so that identical and similar threat reports are grouped together before they are analyzed. We collect 25092 CTI reports from open sources and label them by threat type and attack behavior. Experimental results show that three of the six classifiers achieve higher prediction accuracy, which makes the framework applicable to processing the massive volume of CTI reports efficiently for security analysts in NG-SOC and to giving early warning that helps affected users take proactive countermeasures to mitigate hidden costs or even avoid potential cyber attacks.
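One way to read the framework described in the abstract is as a grid in which every document embedding is tried with every classifier and the resulting accuracies are compared. The sketch below illustrates only that grid structure, under loud assumptions: the six-report corpus, the labels, the two hashed bag-of-words embeddings, and the two classifiers (nearest centroid and nearest neighbor) are all hypothetical stand-ins chosen to keep the example dependency-free. They are not the embedding models, classifiers, or data used in the paper.

```python
import math
import zlib
from itertools import product

# Hypothetical toy corpus of (report text, threat label); not the paper's data.
TRAIN = [
    ("ransomware encrypts files and demands bitcoin payment", "ransomware"),
    ("files encrypted ransom note demands payment for key", "ransomware"),
    ("phishing email with fake login page steals credentials", "phishing"),
    ("spoofed email link harvests credentials on fake page", "phishing"),
    ("botnet floods server with traffic causing denial of service", "ddos"),
    ("massive traffic flood from botnet takes service offline", "ddos"),
]
TEST = [
    ("attackers encrypted files and left a ransom note", "ransomware"),
    ("fake login page sent by email to steal credentials", "phishing"),
    ("server offline after traffic flood from botnet", "ddos"),
]
DIM = 64  # size of the toy embedding space (arbitrary assumption)


def _slot(token):
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(token.encode("utf-8")) % DIM


def embed_counts(text):
    """Stand-in embedding 1: hashed token counts."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[_slot(token)] += 1.0
    return vec


def embed_binary(text):
    """Stand-in embedding 2: hashed token presence (0 or 1)."""
    vec = [0.0] * DIM
    for token in set(text.lower().split()):
        vec[_slot(token)] = 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NearestCentroid:
    """Stand-in classifier 1: assign the label of the most similar class mean."""

    def fit(self, X, y):
        grouped = {}
        for vec, label in zip(X, y):
            grouped.setdefault(label, []).append(vec)
        self.centroids = {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }
        return self

    def predict(self, X):
        return [max(self.centroids, key=lambda lb: cosine(v, self.centroids[lb]))
                for v in X]


class NearestNeighbor:
    """Stand-in classifier 2: assign the label of the most similar training report."""

    def fit(self, X, y):
        self.samples = list(zip(X, y))
        return self

    def predict(self, X):
        return [max(self.samples, key=lambda s: cosine(v, s[0]))[1] for v in X]


EMBEDDINGS = {"counts": embed_counts, "binary": embed_binary}
CLASSIFIERS = {"centroid": NearestCentroid, "neighbor": NearestNeighbor}


def evaluate(embed, clf_cls):
    """Embed train and test reports, fit the classifier, return test accuracy."""
    X_train = [embed(text) for text, _ in TRAIN]
    y_train = [label for _, label in TRAIN]
    clf = clf_cls().fit(X_train, y_train)
    preds = clf.predict([embed(text) for text, _ in TEST])
    return sum(p == y for p, (_, y) in zip(preds, TEST)) / len(TEST)


def run_grid():
    """Try every (embedding, classifier) pair and collect its accuracy."""
    return {
        (e_name, c_name): evaluate(embed, clf_cls)
        for (e_name, embed), (c_name, clf_cls)
        in product(EMBEDDINGS.items(), CLASSIFIERS.items())
    }


if __name__ == "__main__":
    for pair, accuracy in sorted(run_grid().items()):
        print(pair, round(accuracy, 2))
```

With real components, each entry in `EMBEDDINGS` and `CLASSIFIERS` would be replaced by an actual document embedding model and a trained classifier, and the grid would grow to two by six combinations evaluated on held-out labeled reports.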
Acknowledgments
This research work is partially supported and funded by the SPIRIT Smart Nation Research Centre, School of Computer Science and Engineering, Nanyang Technological University (Account No: M4082416.020.706922).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, W., Lam, K.-Y. (2020). Automated Cyber Threat Intelligence Reports Classification for Early Warning of Cyber Attacks in Next Generation SOC. In: Zhou, J., Luo, X., Shen, Q., Xu, Z. (eds) Information and Communications Security. ICICS 2019. Lecture Notes in Computer Science, vol 11999. Springer, Cham. https://doi.org/10.1007/978-3-030-41579-2_9
DOI: https://doi.org/10.1007/978-3-030-41579-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41578-5
Online ISBN: 978-3-030-41579-2
eBook Packages: Computer Science (R0)