Cybersecurity Text Data Classification and Optimization for CTI Systems

Rodriguez, Ariel; Okamura, Koji

doi:10.1007/978-3-030-44038-1_37


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1150)

Included in the following conference series:

Workshops of the International Conference on Advanced Information Networking and Applications


Abstract

Cyber threat intelligence (CTI) systems help security teams prioritize alerts, focus on critical threats, and use their resources more efficiently. One challenge is accurately classifying the data that enters and is processed by the system, since classification accuracy is critical to producing meaningful output. To address this problem, we study text-based cybersecurity data classification using a multi-layer keyword filtering method and unsupervised learning methods based on doc2vec. We also examine how ensemble learning can improve the accuracy and efficiency of cyber threat intelligence systems. This research supports prioritization in cyber threat intelligence systems, which allows security teams to use their resources more efficiently.
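
To make the ideas named in the abstract concrete, the following is a minimal sketch of a layered keyword filter combined with other classifiers by majority vote. It is not the chapter's implementation: the keyword lists, the threshold, and every function name are hypothetical, and the two simple stand-in classifiers take the place of the doc2vec-based model, which is not reproduced here.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical keyword layers (assumption: the chapter's actual lists are not shown here).
LAYER_1 = {"vulnerability", "exploit", "malware", "ransomware", "phishing"}
LAYER_2 = {"cve", "patch", "attack", "breach", "payload", "botnet"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def keyword_filter(text: str) -> bool:
    """Layered filter: the text passes only if it matches both keyword layers."""
    tokens = set(tokenize(text))
    return bool(tokens & LAYER_1) and bool(tokens & LAYER_2)


def density_classifier(text: str, threshold: float = 0.15) -> bool:
    """Stand-in classifier: share of tokens that appear in either layer."""
    tokens = tokenize(text)
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in LAYER_1 or t in LAYER_2)
    return hits / len(tokens) >= threshold


def any_keyword_classifier(text: str) -> bool:
    """Stand-in classifier: permissive check for any keyword from any layer."""
    return bool(set(tokenize(text)) & (LAYER_1 | LAYER_2))


def majority_vote(text: str, classifiers: Sequence[Callable[[str], bool]]) -> bool:
    """Ensemble decision: label the text as relevant if most classifiers agree."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes[True] > votes[False]


CLASSIFIERS = [keyword_filter, density_classifier, any_keyword_classifier]

if __name__ == "__main__":
    samples = [
        "New ransomware exploit targets unpatched CVE in routers",
        "The patch notes for the game are out",
        "Great recipe for a weekend cake",
    ]
    for sample in samples:
        print(majority_vote(sample, CLASSIFIERS), "-", sample)
```

In this sketch the permissive classifier alone would flag the game patch notes, but the vote of the other two overrides it, which illustrates why an ensemble can reduce false positives from any single filter.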



Acknowledgements

This research was supported by JSPS KAKENHI Grant Number JP16K00480.

Author information

Authors and Affiliations

Kyushu University, Fukuoka, Japan
Ariel Rodriguez & Koji Okamura


Corresponding author

Correspondence to Koji Okamura.

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
Department of Electrical Engineering and Information Technology, University of Naples “Federico II”, Naples, Italy
Flora Amato
Department of Political Science, University of Campania Luigi Vanvitelli, Caserta, Italy
Francesco Moscato
Faculty of Business Administration, Rissho University, Tokyo, Japan
Tomoya Enokido
Department of Advanced Sciences, Hosei University, Tokyo, Japan
Makoto Takizawa


About this paper

Cite this paper

Rodriguez, A., Okamura, K. (2020). Cybersecurity Text Data Classification and Optimization for CTI Systems. In: Barolli, L., Amato, F., Moscato, F., Enokido, T., Takizawa, M. (eds) Web, Artificial Intelligence and Network Applications. WAINA 2020. Advances in Intelligent Systems and Computing, vol 1150. Springer, Cham. https://doi.org/10.1007/978-3-030-44038-1_37


DOI: https://doi.org/10.1007/978-3-030-44038-1_37
Published: 31 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44037-4
Online ISBN: 978-3-030-44038-1
eBook Packages: Intelligent Technologies and Robotics (R0)
