Automated Cyber Threat Intelligence Reports Classification for Early Warning of Cyber Attacks in Next Generation SOC

Yang, Wenzhuo; Lam, Kwok-Yan

doi:10.1007/978-3-030-41579-2_9


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11999)

Included in the following conference series:

International Conference on Information and Communications Security


Abstract

Serving as a facility to collect and analyze security data and to monitor anomalous activities, a Security Operation Center (SOC) provides defense measures that protect enterprise and government systems from malicious intrusion. As cyber attacks become increasingly sophisticated and harmful, sharing cyber threat intelligence (CTI) between SOCs and other security departments has become a global trend. Through CTI analysis, security analysts can gain a comprehensive understanding of the features of diverse cyber attacks, issue early warnings, and respond quickly to potential attacks. The growing number of CTI reports and increasingly frequent CTI sharing create an urgent need for much higher analysis capacity than traditional SOCs provide. Facing the big data challenge and a limited pool of professional security analysts, the next generation SOC (NG-SOC) should place strong emphasis on processing security data such as CTI reports automatically and efficiently through data mining and machine learning techniques. This paper presents a practical and efficient approach for turning large quantities of CTI sources into high-quality data and enhancing the CTI analysis ability of NG-SOC. Specifically, we first propose a multi-class classification framework for CTI reports that combines each of two document embedding models with six machine learning classifiers, so that identical and similar threat reports are grouped together before they are analyzed. We collect 25,092 CTI reports from open sources and label them according to their threat types and attack behaviors. Experimental results show that three of the classifiers achieve higher prediction accuracy, which makes the approach applicable to efficiently processing the massive volume of CTI reports for security analysts in NG-SOC and to giving early warnings that help related users take proactive countermeasures to mitigate hidden costs or even avoid potential cyber attacks.
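The abstract describes a grid in which every document embedding model is paired with every classifier and each pair is scored on labelled reports. The sketch below illustrates only that evaluation loop, under loud assumptions: the abstract does not name the two embedding models or the six classifiers, so two simple stand-in "embeddings" (normalised term counts and 0/1 term presence) and two stand-in classifiers (nearest centroid and 3-nearest-neighbour, both using cosine similarity) are used instead. The class and function names (`BagOfWordsEmbedder`, `NearestCentroid`, `KNearest`, `evaluate_grid`) and the toy reports and labels are hypothetical, not the authors' code or data.

```python
import math
from collections import Counter

# Minimal sketch under stated assumptions: toy stand-ins for the paper's
# (unnamed here) document embedding models and classifiers, used only to show
# the "every embedding x every classifier" evaluation grid.

def tokenize(text):
    return text.lower().split()

def unit(vec):
    """L2-normalise a vector; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(a, b):
    """Cosine similarity, assuming both inputs are already unit (or zero) vectors."""
    return sum(x * y for x, y in zip(a, b))

class BagOfWordsEmbedder:
    """Stand-in 'document embedding': normalised term counts (or 0/1 presence)."""
    def __init__(self, binary=False):
        self.binary = binary
    def fit(self, docs):
        self.vocab = {}
        for doc in docs:
            for tok in tokenize(doc):
                self.vocab.setdefault(tok, len(self.vocab))
        return self
    def transform(self, doc):
        vec = [0.0] * len(self.vocab)
        for tok, count in Counter(tokenize(doc)).items():
            if tok in self.vocab:  # words never seen in training are ignored
                vec[self.vocab[tok]] = 1.0 if self.binary else float(count)
        return unit(vec)

class NearestCentroid:
    """Predict the label whose normalised mean training vector is most similar."""
    def fit(self, X, y):
        groups = {}
        for vec, label in zip(X, y):
            groups.setdefault(label, []).append(vec)
        self.centroids = {lab: unit([sum(col) / len(vs) for col in zip(*vs)])
                          for lab, vs in groups.items()}
        return self
    def predict(self, vec):
        return max(self.centroids, key=lambda lab: cosine(vec, self.centroids[lab]))

class KNearest:
    """Majority vote among the k most cosine-similar training reports."""
    def __init__(self, k=3):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, vec):
        ranked = sorted(range(len(self.X)), key=lambda i: cosine(vec, self.X[i]), reverse=True)
        return Counter(self.y[i] for i in ranked[:self.k]).most_common(1)[0][0]

def evaluate_grid(train_docs, train_labels, test_docs, test_labels, embedders, classifiers):
    """Fit every (embedding, classifier) pair; return test accuracy per pair."""
    results = {}
    for emb_name, make_emb in embedders.items():
        emb = make_emb().fit(train_docs)
        X_train = [emb.transform(d) for d in train_docs]
        X_test = [emb.transform(d) for d in test_docs]
        for clf_name, make_clf in classifiers.items():
            clf = make_clf().fit(X_train, train_labels)
            correct = sum(clf.predict(x) == y for x, y in zip(X_test, test_labels))
            results[(emb_name, clf_name)] = correct / len(test_labels)
    return results

# Hypothetical toy "CTI reports" labelled by threat type (not the paper's dataset).
train_docs = [
    "ransomware encrypts files and demands bitcoin ransom",
    "new ransomware variant encrypts network shares",
    "phishing email lures users to fake login page",
    "credential phishing campaign spoofs bank login",
    "botnet launches ddos flood against web servers",
    "ddos attack floods dns servers with traffic",
]
train_labels = ["ransomware", "ransomware", "phishing", "phishing", "ddos", "ddos"]
test_docs = [
    "ransomware demands ransom after it encrypts files",
    "fake login page in phishing email",
    "udp flood ddos against servers",
]
test_labels = ["ransomware", "phishing", "ddos"]

EMBEDDERS = {"counts": BagOfWordsEmbedder, "binary": lambda: BagOfWordsEmbedder(binary=True)}
CLASSIFIERS = {"nearest-centroid": NearestCentroid, "3-nn": KNearest}
results = evaluate_grid(train_docs, train_labels, test_docs, test_labels, EMBEDDERS, CLASSIFIERS)
for (emb_name, clf_name), acc in sorted(results.items()):
    print(f"{emb_name:7s} {clf_name:17s} accuracy={acc:.2f}")
```

On this deliberately easy toy split, where each test report shares vocabulary only with its own class, all four pairs reach accuracy 1.00; that says nothing about real CTI reports. Extending the grid toward the paper's setting would mean adding learned document embedding models to `EMBEDDERS` and further classifiers to `CLASSIFIERS`, since each pair is fitted on the training split and scored on the held-out split in the same way.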



Acknowledgments

This research work is partially supported and funded by the SPIRIT Smart Nation Research Centre, School of Computer Science and Engineering, Nanyang Technological University (Account No: M4082416.020.706922).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
Wenzhuo Yang & Kwok-Yan Lam


Corresponding author

Correspondence to Kwok-Yan Lam.

Editor information

Editors and Affiliations

Singapore University of Technology and Design, Singapore, Singapore
Jianying Zhou
The Hong Kong Polytechnic University, Kowloon, Hong Kong
Xiapu Luo
Peking University, Beijing, China
Qingni Shen
Institute of Information Engineering, Beijing, China
Zhen Xu


About this paper

Cite this paper

Yang, W., Lam, KY. (2020). Automated Cyber Threat Intelligence Reports Classification for Early Warning of Cyber Attacks in Next Generation SOC. In: Zhou, J., Luo, X., Shen, Q., Xu, Z. (eds) Information and Communications Security. ICICS 2019. Lecture Notes in Computer Science, vol 11999. Springer, Cham. https://doi.org/10.1007/978-3-030-41579-2_9


DOI: https://doi.org/10.1007/978-3-030-41579-2_9
Published: 18 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41578-5
Online ISBN: 978-3-030-41579-2
eBook Packages: Computer Science, Computer Science (R0)
