Cyber Security Researchers on Online Social Networks: From the Lens of the UK’s ACEs-CSR on Twitter

  • Conference paper
  • In: Security and Privacy in Social Networks and Big Data (SocialSec 2023)

Abstract

Much work in the literature has studied different types of cyber security related users and communities on OSNs, such as activists, hacktivists, hackers, and cyber criminals. A few studies have also covered non-expert users who discussed cyber security related topics; however, to the best of our knowledge, none has studied the activities of cyber security researchers on OSNs. This paper fills this gap with a data-driven analysis of the presence of the UK’s Academic Centres of Excellence in Cyber Security Research (ACEs-CSR) on Twitter. We created machine learning classifiers to identify cyber security and research related accounts. Then, starting from 19 seed accounts of the ACEs-CSR, we constructed a social network graph of 1,817 research-related accounts that were followers or friends of at least one ACE-CSR. We conducted a comprehensive analysis of the collected data: a structural analysis of the social graph; a topic modelling analysis to identify the main topics discussed publicly by researchers in the ACEs-CSR network; and a sentiment analysis of how researchers perceived the ACE-CSR programme and the ACEs-CSR. Our study revealed several findings: 1) graph-based analysis and community detection algorithms are useful for detecting sub-communities of researchers and for understanding how they are formed and what they represent; 2) topic modelling can identify topics discussed by cyber security researchers (e.g., cyber security incidents, vulnerabilities, threats, privacy, data protection laws, cryptography, research, education, cyber conflict, and politics); and 3) sentiment analysis showed a generally positive sentiment towards the ACE-CSR programme and the ACEs-CSR. Our work demonstrates the feasibility and usefulness of large-scale automated analyses of cyber security researchers on Twitter.

Notes

  1. IP addresses can sometimes carry location-related information. We considered such information less reliable and too complicated to process, so we decided to exclude it.

References

  1. Andreotta, M., et al.: Analyzing social media data: a mixed-methods framework combining computational and qualitative text analysis. Behav. Res. Methods 51(4), 1766–1781 (2019). https://doi.org/10.3758/s13428-019-01202-8

  2. Aslan, C.B., Li, S., Celebi, F.V., Tian, H.: The world of defacers: looking through the lens of their activities on Twitter. IEEE Access 8, 204132–204143 (2020). https://doi.org/10.1109/ACCESS.2020.3037015

  3. Aslan, B., Belen Sağlam, R., Li, S.: Automatic detection of cyber security related accounts on online social networks: Twitter as an example. In: Proceedings of the 9th International Conference on Social Media and Society, pp. 236–240. ACM (2018). https://doi.org/10.1145/3217804.3217919

  4. Bastian, M., Heymann, S., Jacomy, M.: Gephi: an open source software for exploring and manipulating networks. Proc. Int. AAAI Conf. Web Soc. Med. 3(1), 361–362 (2009). https://doi.org/10.1609/icwsm.v3i1.13937

  5. Blei, D.M.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012). https://doi.org/10.1145/2133806.2133826

  6. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003). https://www.jmlr.org/papers/v3/blei03a.html

  7. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Statist. Mech. Theory Exp. 2008(10), P10008:1–P10008:12 (2008). https://doi.org/10.1088/1742-5468/2008/10/p10008

  8. Bostock, M.: d3-hierarchy: 2D layout algorithms for visualizing hierarchical data (2022). https://github.com/d3/d3-hierarchy

  9. GeoNames: Cities (2022). http://www.geonames.org/

  10. Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. (PNAS) 99(12), 7821–7826 (2002). https://doi.org/10.1073/pnas.122653799

  11. Hipo: University domains (2022). github.com/Hipo/university-domains-list

  12. Hutto, C.J., Gilbert, E.: VADER: a parsimonious rule-based model for sentiment analysis of social media text. Proc. Int. AAAI Conf. Web Soc. Med. 8(1), 216–225 (2014). https://doi.org/10.1609/icwsm.v8i1.14550

  13. Jones, K., Nurse, J.R.C., Li, S.: Behind the mask: a computational study of Anonymous’ presence on Twitter. Proc. Int. AAAI Conf. Web Soc. Med. 14(1), 327–338 (2020). https://doi.org/10.1609/icwsm.v14i1.7303

  14. Jones, K., Nurse, J.R.C., Li, S.: Out of the shadows: analyzing anonymous’ Twitter resurgence during the 2020 black lives matter protests. Proc. Int. AAAI Conf. Web Soc. Med. 16(1), 417–428 (2022). https://doi.org/10.1609/icwsm.v16i1.19303

  15. Kigerl, A.: Profiling cybercriminals: topic model clustering of carding forum member comment histories. Soc. Sci. Comput. Rev. 36(5), 591–609 (2018). https://doi.org/10.1177/0894439317730296

  16. Lambiotte, R., Delvenne, J.C., Barahona, M.: Random walks, Markov processes and the multiscale modular organization of complex networks. IEEE Trans. Netw. Sci. Eng. 1(2), 76–90 (2014). https://doi.org/10.1109/tnse.2015.2391998

  17. Loria, S.: TextBlob: Simplified text processing (2022). https://textblob.readthedocs.io/en/dev/

  18. Mahaini, M.I., Li, S.: Detecting cyber security related Twitter accounts and different sub-groups: a multi-classifier approach. In: Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 599–606. ACM (2021). https://doi.org/10.1145/3487351.3492716

  19. Moscato, V., Sperlì, G.: A survey about community detection over on-line social and heterogeneous information networks. Knowl. Based Syst. 224, 107112:1–107112:13 (2021). https://doi.org/10.1016/j.knosys.2021.107112

  20. National Cyber Security Centre (NCSC), UK: Academic Centres of Excellence in Cyber Security Research (2019). https://www.ncsc.gov.uk/information/academic-centres-excellence-cyber-security-research

  21. Newman, M.E.J.: Equivalence between modularity optimization and maximum likelihood methods for community detection. Phys. Rev. E. 94(5), 052315:1–052315:8 (2016). https://doi.org/10.1103/PhysRevE.94.052315

  22. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E. 69(2), 026113:1–026113:15 (2004). https://doi.org/10.1103/PhysRevE.69.026113

  23. NLTK Team: NLTK: Natural language toolkit (2023). https://www.nltk.org/

  24. Nouh, M., Nurse, J.R.C.: Identifying key-players in online activist groups on the Facebook social network. In: Proceedings of the 2015 IEEE International Conference on Data Mining Workshop, pp. 969–978. IEEE (2015). https://doi.org/10.1109/icdmw.2015.88

  25. Pattnaik, N., Li, S., Nurse, J.R.C.: Perspectives of non-expert users on cyber security and privacy: an analysis of online discussions on Twitter. Comput. Secur. 125, 103008:1–103008:15 (2023). https://doi.org/10.1016/j.cose.2022.103008

  26. Pedregosa, F., et al.: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011). https://jmlr.org/papers/v12/pedregosa11a.html

  27. Řehůřek, R.: Gensim: Topic modelling for humans (2022). https://radimrehurek.com/gensim/index.html

  28. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA (2010). http://is.muni.cz/publication/884893/en

  29. Röder, M., Both, A., Hinneburg, A.: Exploring the space of topic coherence measures. In: Proceedings of the 8th ACM International Conference on Web Search and Data Mining, pp. 399–408. ACM (2015). https://doi.org/10.1145/2684822.2685324

  30. Saura, J.R., Palacios-Marqués, D., Ribeiro-Soriano, D.: Using data mining techniques to explore security issues in smart living environments in Twitter. Comput. Commun. 179, 285–295 (2021). https://doi.org/10.1016/j.comcom.2021.08.021

  31. Schubert, E., Sander, J., Ester, M., Kriegel, H.P., Xu, X.: DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Trans. Database Syst. 42(3), 19:1–19:21 (2017). https://doi.org/10.1145/3068335

  32. Sievert, C., Shirley, K.: LDAvis: a method for visualizing and interpreting topics. In: Proceedings of the 2014 Workshop on Interactive Language Learning, Visualization, and Interfaces, pp. 63–70. ACL (2014). https://doi.org/10.3115/v1/W14-3110

  33. Soni, K.: locationtagger (2022). https://pypi.org/project/locationtagger/

  34. Tavabi, N., Bartley, N., Abeliuk, A., Soni, S., Ferrara, E., Lerman, K.: Characterizing activity on the deep and dark web. In: Companion Proceedings of the 2019 World Wide Web Conference, pp. 206–213. ACM (2019). https://doi.org/10.1145/3308560.3316502

  35. Traag, V.A., Waltman, L., van Eck, N.J.: From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9(1), 5233:1–5233:12 (2019). https://doi.org/10.1038/s41598-019-41695-z

  36. We Are Social: DIGITAL 2023: What we learned. Special report, We Are Social Ltd (2023). https://wearesocial.com/uk/blog/2023/01/digital-2023/

Author information

Corresponding author

Correspondence to Mohamad Imad Mahaini.

Appendices

A Evaluating Baseline/Individual Classifiers’ Performance

Classifier Training: Before using the classifiers reported in [18], we re-validated their performance on our ACEs-CSR dataset (about 42,000 Twitter accounts), which differs from the datasets these classifiers were originally trained on. We used the same original labelled datasets and followed the same feature extraction steps as in [18]. We then selected the best-performing feature sets according to the reported results: C, L, PBC, and PBCL (see the original study for more details on the feature sets). We re-trained the classifiers using the same original models: Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), SVM with linear kernel (SVM-L), and SVM with RBF kernel (SVM-R). To see if we could get better results, we added two more models: Extra Trees (ET) and eXtreme Gradient Boosting (XGBoost). Training was again done using the Scikit-Learn library with 5-fold stratified cross-validation. The training results are shown in Table 1; we show only the best-performing feature sets.
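
As an illustration, the following is a minimal sketch of this re-training setup, assuming scikit-learn and the xgboost Python package; the synthetic X and y are stand-ins for a real feature matrix (e.g., the PBCL feature set) and account labels, whose construction follows [18].

    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold, cross_validate
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from xgboost import XGBClassifier  # requires the xgboost package

    # Stand-in for the real feature matrix and labels (extracted as in [18]).
    X, y = make_classification(n_samples=2000, n_features=50, random_state=42)

    models = {
        "DT": DecisionTreeClassifier(random_state=42),
        "RF": RandomForestClassifier(random_state=42),
        "LR": LogisticRegression(max_iter=1000),
        "SVM-L": SVC(kernel="linear"),
        "SVM-R": SVC(kernel="rbf"),
        "ET": ExtraTreesClassifier(random_state=42),
        "XGBoost": XGBClassifier(eval_metric="logloss"),
    }

    # 5-fold stratified cross-validation, as used for the re-training.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv,
                                scoring=["accuracy", "precision", "recall", "f1"])
        print(name, {k[5:]: round(v.mean(), 3)
                     for k, v in scores.items() if k.startswith("test_")})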

Our results were similar to the original ones for the first five models. The ET models performed similarly to the RF models, which was expected as the two methods are closely related; in some cases, the ET models performed slightly better. The XGBoost models performed well on the Baseline classification task with the PBCL feature set, achieving an F1-score of 91%, similar to the RF and ET models; in fact, with the PBCL feature set, XGBoost was slightly ahead of all the other models in terms of F1-score. To summarise, the RF and ET models performed well across all the classification tasks, and for both the Baseline and Individual classification tasks the PBCL feature set proved a good and stable choice.

Manual Evaluation: To evaluate the performance of the trained classifiers on the prediction dataset, we manually verified the results by selecting a subset of Twitter accounts for each classification task and labelling them by hand. We then compared the actual labels with the predicted labels to compute the confusion matrix, from which accuracy, precision, recall, and F1-score were calculated. The results of the manual verification are shown in Table 6. For the Baseline classifier, we randomly selected 1,154 samples; the F1-score was 90%, a 2% drop in performance compared to the original training/testing results reported in [18]. For the Individual classifier, we selected 1,003 samples, and the F1-score was 85%, a 5% drop. However, considering the significant difference in size between the original training dataset and our prediction dataset (2k vs. 42k accounts) and the relatively small performance drop, we can confidently say that both the Baseline and Individual classifiers are good enough for our case study.

Table 6. Re-validation results of the Baseline and Individual classifiers
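
For illustration, the sketch below shows how such a manual verification can be scored with scikit-learn's metrics; the two short label lists are hypothetical stand-ins for the 1,154 and 1,003 manually verified samples.

    from sklearn.metrics import (accuracy_score, confusion_matrix,
                                 precision_recall_fscore_support)

    # Hypothetical stand-ins: 1 = positive class (e.g. cyber security related).
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # manually assigned labels
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # labels predicted by the classifier

    print(confusion_matrix(y_true, y_pred))  # rows: actual, columns: predicted
    prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                       average="binary")
    print(f"Accuracy={accuracy_score(y_true, y_pred):.2f}  "
          f"Precision={prec:.2f}  Recall={rec:.2f}  F1={f1:.2f}")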

B Issue with TextBlob Sentiment Analyser

Below are some example tweets that the TextBlob sentiment analyser wrongly classified as negative, while the VADER sentiment analyser correctly classified them as positive (a minimal sketch comparing the two analysers follows the examples).

  • Our Academic Centre of Excellence in Cyber Security Research becomes active this week.

  • Academic Centre of Excellence in Cyber Security Research Open Day @ucl: @uclisec hosting an open day at the ACE center November 15th #infosec #CyberSecurity.

  • Congratulations to @UniKent @KingsCollegeLon and @cardiffuni who join @UniofOxford and 13 other UK universities as Academic Centres of Excellence in Cyber Security Research, announced recently by the National Cyber Security Centre @NCSC and @EPSRC.
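
The sketch below shows one way to reproduce such a comparison, assuming TextBlob [17] and the NLTK implementation of VADER [12, 23]; both analysers return a score in [-1, 1] whose sign indicates the sentiment.

    import nltk
    nltk.download("vader_lexicon", quiet=True)  # one-off lexicon download
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    from textblob import TextBlob

    tweet = ("Our Academic Centre of Excellence in Cyber Security Research "
             "becomes active this week.")

    # TextBlob: pattern-based polarity score in [-1, 1].
    print("TextBlob polarity:", TextBlob(tweet).sentiment.polarity)

    # VADER: rule-based compound score in [-1, 1].
    vader = SentimentIntensityAnalyzer()
    print("VADER compound:", vader.polarity_scores(tweet)["compound"])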

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mahaini, M.I., Li, S. (2023). Cyber Security Researchers on Online Social Networks: From the Lens of the UK’s ACEs-CSR on Twitter. In: Arief, B., Monreale, A., Sirivianos, M., Li, S. (eds) Security and Privacy in Social Networks and Big Data. SocialSec 2023. Lecture Notes in Computer Science, vol 14097. Springer, Singapore. https://doi.org/10.1007/978-981-99-5177-2_8

  • DOI: https://doi.org/10.1007/978-981-99-5177-2_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-5176-5

  • Online ISBN: 978-981-99-5177-2

  • eBook Packages: Computer Science; Computer Science (R0)
