Cybersecurity Text Data Classification and Optimization for CTI Systems

  • Conference paper
Web, Artificial Intelligence and Network Applications (WAINA 2020)

Abstract

Cyber threat intelligence (CTI) systems help prioritize alerts so that security teams can focus on critical threats and use their resources more efficiently. One challenge in these systems is accurately classifying the data that enters and is processed by the system, which is critical to producing meaningful output. To tackle this problem, we study classification methods for text-based cybersecurity data: a multi-layer keyword filtering method and unsupervised learning methods based on doc2vec. We also examine how ensemble learning can improve the accuracy and efficiency of CTI systems. This research supports better prioritization in CTI systems, which in turn lets security teams use their resources more efficiently.
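
The abstract does not spell out how the keyword layers or the ensemble are built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "multi-layer" means a text must match at least one keyword in every layer, and that the ensemble is a simple majority vote. The keyword lists, the function names, and the `keyword_density` voter (a toy stand-in for a learned model such as one based on doc2vec) are all hypothetical.

```python
import re
from typing import Callable, List, Sequence, Set

# Hypothetical keyword layers, chosen only for illustration.
LAYER_GENERAL: Set[str] = {"malware", "ransomware", "phishing", "breach", "exploit"}
LAYER_SPECIFIC: Set[str] = {"cve", "patch", "payload", "botnet", "backdoor"}
LAYERS: Sequence[Set[str]] = (LAYER_GENERAL, LAYER_SPECIFIC)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_filter(text: str, layers: Sequence[Set[str]] = LAYERS) -> bool:
    """Accept a text only if every keyword layer matches at least one token."""
    tokens = set(tokenize(text))
    return all(tokens & layer for layer in layers)


def keyword_density(text: str, threshold: float = 0.2) -> bool:
    """Toy stand-in for a learned model: share of tokens found in any layer."""
    tokens = tokenize(text)
    if not tokens:
        return False
    vocabulary = set().union(*LAYERS)
    hits = sum(1 for token in tokens if token in vocabulary)
    return hits / len(tokens) >= threshold


def majority_vote(text: str, voters: Sequence[Callable[[str], bool]]) -> bool:
    """Label a text as relevant when more than half of the voters agree."""
    votes = [voter(text) for voter in voters]
    return sum(votes) * 2 > len(votes)


# Three assumed voters: the strict filter, the density check, and a
# looser filter that only requires the general layer to match.
VOTERS = [
    keyword_filter,
    keyword_density,
    lambda text: keyword_filter(text, layers=(LAYER_GENERAL,)),
]

if __name__ == "__main__":
    for sample in (
        "New ransomware exploits CVE-2020-0601, patch now",
        "Our team won the football match last night",
    ):
        print(sample, "->", majority_vote(sample, VOTERS))
```

In this sketch a strict filter can be outvoted by the other two voters, which is one simple way an ensemble can trade precision against recall; whether the paper combines its classifiers this way is not stated in the abstract.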

Acknowledgements

This research was supported by JSPS KAKENHI Grant Number JP16K00480.

Author information

Corresponding author

Correspondence to Koji Okamura.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Rodriguez, A., Okamura, K. (2020). Cybersecurity Text Data Classification and Optimization for CTI Systems. In: Barolli, L., Amato, F., Moscato, F., Enokido, T., Takizawa, M. (eds) Web, Artificial Intelligence and Network Applications. WAINA 2020. Advances in Intelligent Systems and Computing, vol 1150. Springer, Cham. https://doi.org/10.1007/978-3-030-44038-1_37