CySecAlert: An Alert Generation System for Cyber Security Events Using Open Source Intelligence Data

  • Conference paper
Information and Communications Security (ICICS 2021)

Abstract

Receiving relevant information on possible cyber threats, attacks, and data breaches in a timely manner is crucial for early response. The social media platform Twitter hosts an active cyber security community. Their activities are often monitored manually by security experts, such as Computer Emergency Response Teams (CERTs). We thus propose a Twitter-based alert generation system that issues alerts to a system operator as soon as new relevant cyber security-related topics emerge. Thereby, our system allows us to monitor user accounts with significantly less workload. Our system applies a supervised classifier, based on active learning, that detects tweets containing relevant information. The results indicate that uncertainty sampling can reduce the amount of manual relevance classification effort and enhance the classifier performance substantially compared to random sampling. Our approach reduces the number of accounts and tweets that are needed for the classifier training, thus making the tool easily and rapidly adaptable to the specific context while also supporting data minimization for Open Source Intelligence (OSINT). Relevant tweets are clustered by a greedy stream clustering algorithm in order to identify significant events. The proposed system is able to work near real-time within the required 15-min time frame and detects up to 93.8% of relevant events with a false alert rate of 14.81%.

Notes

  1. https://github.com/PEASEC/CySecAlert.

  2. Twitter4J Version 4.0.7 (twitter4j.org/en/index.html on 14.08.2020).

  3. Weka v3.8.4 (https://www.cs.waikato.ac.nz/ml/weka/ on 14.08.2020).

References

  1. Reuter, C., Kaufhold, M.A.: Fifteen years of social media in emergencies: a retrospective review and future directions for crisis informatics. J. Contingencies Crisis Manage. 26(1), 41–57 (2018)

  2. Husák, M., Jirsík, T., Yang, S.J.: SoK: contemporary issues and challenges to enable cyber situational awareness for network security. In: Proceedings of the 15th International Conference on Availability, Reliability and Security. ARES 2020. Association for Computing Machinery, New York, NY, USA (2020)

  3. Yang, W., Lam, K.Y.: Automated cyber threat intelligence reports classification for early warning of cyber attacks in next generation SOC. In: International Conference on Information and Communication Systems (ICICS), pp. 145–164 (2020)

  4. Mittal, S., Das, P.K., Mulwad, V., Joshi, A., Finin, T.: CyberTwitter: using Twitter to generate alerts for cybersecurity threats and vulnerabilities. In: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 860–867. IEEE (2016)

  5. Behzadan, V., Aguirre, C., Bose, A., Hsu, W.: Corpus and deep learning classifier for collection of cyber threat indicators in Twitter stream. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 5002–5007. IEEE (2018)

  6. Tundis, A., Ruppert, S., Mühlhäuser, M.: On the automated assessment of open-source cyber threat intelligence sources. In: Krzhizhanovskaya, V.V., et al. (eds.) ICCS 2020. LNCS, vol. 12138, pp. 453–467. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50417-5_34

  7. Alves, F., Andongabo, A., Gashi, I., Ferreira, P.M., Bessani, A.: Follow the blue bird: a study on threat data published on Twitter. In: Chen, L., Li, N., Liang, K., Schneider, S. (eds.) ESORICS 2020. LNCS, vol. 12308, pp. 217–236. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58951-6_11

  8. Koops, B.J., Hoepman, J.H., Leenes, R.: Open-source intelligence and privacy by design. Comput. Law Secur. Rev. 29(6), 676–688 (2013)

  9. Sabottke, C., Suciu, O., Dumitras, T.: Vulnerability disclosure in the age of social media: exploiting Twitter for predicting real-world exploits. In: 24th USENIX Security Symposium (USENIX Security 15), pp. 1041–1056 (2015)

  10. Atefeh, F., Khreich, W.: A survey of techniques for event detection in Twitter. Comput. Intell. 31(1), 132–164 (2015)

  11. Alves, F., Bettini, A., Ferreira, P.M., Bessani, A.: Processing tweets for cybersecurity threat awareness. arXiv preprint arXiv:1904.02072 (2019)

  12. Trabelsi, S., et al.: Mining social networks for software vulnerabilities monitoring. In: 2015 7th International Conference on New Technologies, Mobility and Security (NTMS), pp. 1–7. IEEE (2015)

  13. Hasan, M., Orgun, M.A., Schwitter, R.: A survey on real-time event detection from the Twitter data stream. J. Inf. Sci. 44(4), 443–463 (2018)

  14. Kaufhold, M.A., Bayer, M., Reuter, C.: Rapid relevance classification of social media posts in disasters and emergencies: A system and evaluation featuring active, incremental and online learning. Inf. Process. Manage. 57(1), 102132 (2020)

  15. Habdank, M., Rodehutskors, N., Koch, R.: Relevancy assessment of tweets using supervised learning techniques: mining emergency related tweets for automated relevancy classification. In: 2017 4th International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), pp. 1–8. IEEE (2017)

  16. Settles, B.: Active learning literature survey. University of Wisconsin (2010)

  17. Imran, M., Mitra, P., Srivastava, J.: Enabling rapid classification of social media communications during crises. Int. J. Inf. Syst. Crisis Response Manage. (IJISCRAM) 8(3), 1–17 (2016)

  18. Lewis, D.D., Catlett, J.: Heterogeneous uncertainty sampling for supervised learning. In: Machine Learning Proceedings 1994, pp. 148–156. Elsevier (1994)

  19. Allan, J., Lavrenko, V., Jin, H.: First story detection in TDT is hard. In: Proceedings of the Ninth International Conference on Information and Knowledge Management, pp. 374–381 (2000)

  20. Ritter, A., Wright, E., Casey, W., Mitchell, T.: Weakly supervised extraction of computer security events from Twitter. In: Proceedings of the 24th International Conference on World Wide Web, pp. 896–905 (2015)

  21. Concone, F., De Paola, A., Re, G.L., Morana, M.: Twitter analysis for real-time malware discovery. In: 2017 AEIT International Annual Conference, pp. 1–6. IEEE (2017)

  22. Dionisio, N., Alves, F., Ferreira, P.M., Bessani, A.: Cyberthreat detection from twitter using deep neural networks. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2019)

  23. Bose, A., Behzadan, V., Aguirre, C., Hsu, W.H.: A novel approach for detection and ranking of trendy and emerging cyber threat events in Twitter streams. In: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 871–878 (2019)

  24. Mayring, P.: Qualitative content analysis. Companion Qual. Res. 1(2004), 159–176 (2004)

  25. Sapienza, A., Ernala, S.K., Bessi, A., Lerman, K., Ferrara, E.: Discover: mining online chatter for emerging cyber threats. In: Companion Proceedings of the The Web Conference 2018, pp. 983–990 (2018)

  26. Le Sceller, Q., Karbab, E.B., Debbabi, M., Iqbal, F.: Sonar: automatic detection of cyber security events over the Twitter stream. In: Proceedings of the 12th International Conference on Availability, Reliability and Security (ARES), pp. 1–11 (2017)

  27. Lee, K.C., Hsieh, C.H., Wei, L.J., Mao, C.H., Dai, J.H., Kuang, Y.T.: Sec-buzzer: cyber security emerging topic mining with open threat intelligence retrieval and timeline event annotation. Soft. Comput. 21(11), 2883–2896 (2017)

  28. Dionísio, N., Alves, F., Ferreira, P.M., Bessani, A.: Towards end-to-end cyberthreat detection from twitter using multi-task learning. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020)

  29. Fang, Y., Gao, J., Liu, Z., Huang, C.: Detecting cyber threat event from twitter using IDCNN and BiLSTM. Appl. Sci. 10(17), 5922 (2020)

  30. Ji, T., Zhang, X., Self, N., Fu, K., Lu, C.T., Ramakrishnan, N.: Feature driven learning framework for cybersecurity event detection. In: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 196–203 (2019)

  31. Khandpur, R.P., Ji, T., Jan, S., Wang, G., Lu, C.T., Ramakrishnan, N.: Crowdsourcing cybersecurity: Cyber attack detection using social media. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1049–1057 (2017)

  32. Mittal, S., Joshi, A., Finin, T.: Cyber-all-intel: an AI for security related threat intelligence. arXiv preprint arXiv:1905.02895 (2019)

  33. Simran, K., Balakrishna, P., Vinayakumar, R., Soman, K.P.: Deep learning approach for enhanced cyber threat indicators in Twitter stream. In: Thampi, S.M., Martinez Perez, G., Ko, R., Rawat, D.B. (eds.) SSCC 2019. CCIS, vol. 1208, pp. 135–145. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-4825-3_11

  34. Bernard, J., Zeppelzauer, M., Lehmann, M., Müller, M., Sedlmair, M.: Towards user-centered active learning algorithms. In: Computer Graphics Forum, vol. 37, pp. 121–132. Wiley Online Library (2018)

  35. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

Acknowledgements

This work was supported by the German Federal Ministry for Education and Research (BMBF) in the projects CYWARN (13N15407) and KontiKat (13N14351), as well as by the BMBF and the Hessian Ministry of Higher Education, Research, Science and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE. We would like to thank the anonymous reviewers for their valuable and constructive comments.

Author information

Corresponding author

Correspondence to Thea Riebe.

Appendices

Appendix A Dataset

Table 4 provides the websites and blogs we used to retrieve 170 accounts of the leading cyber security experts on Twitter, from which we gathered the dataset of 350,061 English tweets (see Sect. 3.1).

Table 4. Sources for cyber security experts on Twitter

Appendix B Codebook

Table 5 presents the codebook [24] that was applied to annotate the tweets of the dataset (see Sect. 3.1) and gives an overview of the codes’ definitions.

Table 5. Codebook for tweet relevance classification.

Appendix C Classifier Comparison

Figure 3 depicts the results of the classifier comparison under active learning. Experiment details are discussed in Sect. 3.2.

Fig. 3. Performance comparison of Naive Bayes (red), kNN with \(k=50\) (blue) and Random Forest (brown) classifiers with uncertainty sampling, each based on its respective model, on datasets S1 (left) and S2 (right). Average over 5 executions using cross-validation. (Color figure online)
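
To illustrate the selection step that uncertainty sampling performs in a binary relevance setting, a minimal sketch is given here. It is not the implementation used in the system (which builds on Weka); the function name `uncertainty_sample`, the batch size, and the example scores are assumptions made for illustration only. The sketch assumes that the classifier outputs a probability of relevance per unlabeled tweet and picks the tweets whose probability lies closest to 0.5 for manual labeling.

```python
from typing import Sequence


def uncertainty_sample(probabilities: Sequence[float], batch_size: int) -> list[int]:
    """Return indices of the unlabeled items the classifier is least sure about.

    probabilities[i] is the predicted probability that item i is relevant
    (binary case), so uncertainty is highest when the value is near 0.5.
    """
    # Rank all items by their distance to the decision boundary at 0.5.
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical classifier outputs for five unlabeled tweets.
scores = [0.97, 0.52, 0.10, 0.45, 0.80]

# The two tweets that would be handed to the annotator next.
to_label = uncertainty_sample(scores, batch_size=2)
print(to_label)
```

Random sampling, the baseline mentioned in the abstract, would instead draw the indices uniformly at random, independent of the classifier's current state.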

Appendix D Alert Generation by Similarity Threshold

Table 6 depicts how recall and alert generation are impacted by the similarity threshold of the greedy clustering (see Sect. 3.3).

Table 6. Performance measures of greedy clustering-based generated alerts for different similarity thresholds and for alert count thresholds 3 and 5 for the datasets S1 and S2, respectively.
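
The interplay of the two thresholds can be sketched as follows. This is a simplified illustration under stated assumptions, not the system's implementation: tweets are represented as plain term-count vectors, similarity is cosine similarity against the summed vector of a cluster, and the names `greedy_cluster`, `similarity_threshold`, and `alert_count_threshold` as well as the example tweets are hypothetical. Each incoming tweet joins its most similar cluster if the similarity reaches the similarity threshold and otherwise opens a new cluster; an alert is recorded once a cluster reaches the alert count threshold.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(tweets, similarity_threshold, alert_count_threshold):
    """Assign each tweet, in stream order, to its most similar cluster."""
    clusters = []  # each cluster: {"centroid": Counter, "size": int}
    alerts = []    # indices of clusters that reached the alert count threshold
    for text in tweets:
        vec = Counter(text.lower().split())  # assumption: whitespace tokens
        best_idx, best_sim = None, 0.0
        for idx, cluster in enumerate(clusters):
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is not None and best_sim >= similarity_threshold:
            clusters[best_idx]["centroid"].update(vec)  # add term counts
            clusters[best_idx]["size"] += 1
        else:
            clusters.append({"centroid": vec, "size": 1})
            best_idx = len(clusters) - 1
        # Raise the alert exactly once, when the threshold is first reached.
        if clusters[best_idx]["size"] == alert_count_threshold:
            alerts.append(best_idx)
    return clusters, alerts


# Hypothetical stream of already relevance-filtered tweets.
stream = [
    "critical rce vulnerability in exampleserver",
    "exampleserver rce vulnerability exploited in the wild",
    "new phishing campaign targets banks",
    "patch released for exampleserver rce vulnerability",
]
clusters, alerts = greedy_cluster(stream, similarity_threshold=0.4,
                                  alert_count_threshold=3)
print([c["size"] for c in clusters], alerts)
```

In this sketch, a higher similarity threshold splits the stream into more and smaller clusters, so fewer clusters reach the alert count threshold; this is the kind of trade-off between recall and the number of generated alerts that Table 6 reports.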

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Riebe, T. et al. (2021). CySecAlert: An Alert Generation System for Cyber Security Events Using Open Source Intelligence Data. In: Gao, D., Li, Q., Guan, X., Liao, X. (eds) Information and Communications Security. ICICS 2021. Lecture Notes in Computer Science, vol 12918. Springer, Cham. https://doi.org/10.1007/978-3-030-86890-1_24

  • DOI: https://doi.org/10.1007/978-3-030-86890-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86889-5

  • Online ISBN: 978-3-030-86890-1

  • eBook Packages: Computer Science, Computer Science (R0)
