Automated Cyber Threat Intelligence Reports Classification for Early Warning of Cyber Attacks in Next Generation SOC

  • Conference paper
Information and Communications Security (ICICS 2019)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11999)

Abstract

A Security Operation Center (SOC) is a facility that collects and analyzes security data, monitors anomalous activity, and provides defense measures to protect enterprise and government systems from malicious intrusion. As cyber attacks grow more sophisticated and harmful, sharing cyber threat intelligence (CTI) between SOCs and other security departments has become a global trend. By analyzing CTI, security analysts can gain a comprehensive understanding of the features of diverse cyber attacks, issue early warnings, and respond quickly to potential attacks. The growing volume of CTI reports and the increasing frequency of CTI sharing create an urgent need for far higher analysis efficiency than a traditional SOC can offer. Facing this big data challenge with limited professional security analyst resources, the next generation SOC (NG-SOC) should place strong emphasis on processing security data such as CTI reports automatically and efficiently through data mining and machine learning techniques. This paper presents a practical and efficient approach for turning large quantities of CTI sources into high-quality data and enhancing the CTI analysis ability of the NG-SOC. Specifically, we first propose a multi-class classification framework for CTI reports that combines two document embedding models with six machine learning classifiers, so that identical and similar threat reports are grouped together before they are analyzed. We collect 25092 CTI reports from open sources and label them by threat type and attack behavior. Experimental results show that three of the classifiers achieve higher prediction accuracy, which makes the framework suitable for helping security analysts in the NG-SOC process the massive volume of CTI reports efficiently and for giving early warnings that help affected users take proactive countermeasures to mitigate hidden costs or even avoid potential cyber attacks.
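The "embed each report, then classify it" idea in the abstract can be illustrated with a small sketch. The abstract does not name the two embedding models or the six classifiers, so everything below is an assumption chosen for brevity: a normalised bag-of-words vector stands in for a document embedding, a nearest-centroid rule stands in for a classifier, and the four training reports and their labels are made up. It is not the authors' pipeline.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text, vocab):
    """Stand-in embedding: L2-normalised term counts over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def train_centroids(docs, labels, vocab):
    """Average the embeddings of each class into one centroid per label."""
    sums = defaultdict(lambda: [0.0] * len(vocab))
    counts = Counter(labels)
    for doc, label in zip(docs, labels):
        for i, v in enumerate(embed(doc, vocab)):
            sums[label][i] += v
    return {lab: [v / counts[lab] for v in vec] for lab, vec in sums.items()}


def classify(text, centroids, vocab):
    """Return the label whose centroid has the largest dot product with the report."""
    vec = embed(text, vocab)
    return max(
        centroids,
        key=lambda lab: sum(a * b for a, b in zip(vec, centroids[lab])),
    )


# Toy, made-up reports and labels (hypothetical; not the paper's dataset).
train_docs = [
    "ransomware encrypts files and demands bitcoin payment",
    "new ransomware strain encrypts network shares",
    "phishing email lures users to a fake login page",
    "credential phishing campaign spoofs bank login",
]
train_labels = ["ransomware", "ransomware", "phishing", "phishing"]

vocab = sorted({tok for doc in train_docs for tok in tokenize(doc)})
centroids = train_centroids(train_docs, train_labels, vocab)

prediction = classify(
    "attackers sent phishing email with fake login form", centroids, vocab
)
print(prediction)
```

In a realistic setting the `embed` and `classify` steps would be replaced by a learned document embedding and a trained classifier, and accuracy would be measured on held-out labelled reports; the structure of the pipeline stays the same.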

Acknowledgments

This research work is partially supported and funded by the SPIRIT Smart Nation Research Centre, School of Computer Science and Engineering, Nanyang Technological University (Account No: M4082416.020.706922).

Author information

Corresponding author

Correspondence to Kwok-Yan Lam.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, W., Lam, KY. (2020). Automated Cyber Threat Intelligence Reports Classification for Early Warning of Cyber Attacks in Next Generation SOC. In: Zhou, J., Luo, X., Shen, Q., Xu, Z. (eds) Information and Communications Security. ICICS 2019. Lecture Notes in Computer Science, vol 11999. Springer, Cham. https://doi.org/10.1007/978-3-030-41579-2_9

  • DOI: https://doi.org/10.1007/978-3-030-41579-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41578-5

  • Online ISBN: 978-3-030-41579-2

  • eBook Packages: Computer Science, Computer Science (R0)