Abstract
Much work in the literature has studied different types of cyber security related users and communities on OSNs, such as activists, hacktivists, hackers, and cyber criminals. A few studies have also covered non-expert users who discussed cyber security related topics; however, to the best of our knowledge, none has studied the activities of cyber security researchers on OSNs. This paper fills this gap through a data-driven analysis of the presence of the UK’s Academic Centres of Excellence in Cyber Security Research (ACEs-CSR) on Twitter. We created machine learning classifiers to identify cyber security and research related accounts. Then, starting from 19 seed accounts of the ACEs-CSR, we constructed a social network graph of 1,817 research-related accounts that were followers or friends of at least one ACE-CSR. We conducted a comprehensive analysis of the data we collected: a structural analysis of the social graph; a topic modelling analysis to identify the main topics discussed publicly by researchers in the ACEs-CSR network; and a sentiment analysis of how researchers perceived the ACE-CSR programme and the ACEs-CSR. Our study revealed several findings: 1) graph-based analysis and community detection algorithms are useful for detecting sub-communities of researchers and for understanding how they are formed and what they represent; 2) topic modelling can identify topics discussed by cyber security researchers (e.g., cyber security incidents, vulnerabilities, threats, privacy, data protection laws, cryptography, research, education, cyber conflict, and politics); and 3) sentiment analysis showed a generally positive sentiment towards the ACE-CSR programme and the ACEs-CSR. Our work demonstrates the feasibility and usefulness of large-scale automated analyses of cyber security researchers on Twitter.
Notes
1. IP addresses can sometimes carry location-related information. We considered such information less reliable and too complicated to process, so we decided to exclude it.
References
Andreotta, M., et al.: Analyzing social media data: a mixed-methods framework combining computational and qualitative text analysis. Behav. Res. Methods 51(4), 1766–1781 (2019). https://doi.org/10.3758/s13428-019-01202-8
Aslan, C.B., Li, S., Celebi, F.V., Tian, H.: The world of defacers: looking through the lens of their activities on Twitter. IEEE Access 8, 204132–204143 (2020). https://doi.org/10.1109/ACCESS.2020.3037015
Aslan, B., Belen Sağlam, R., Li, S.: Automatic detection of cyber security related accounts on online social networks: Twitter as an example. In: Proceedings of the 9th International Conference on Social Media and Society, pp. 236–240. ACM (2018). https://doi.org/10.1145/3217804.3217919
Bastian, M., Heymann, S., Jacomy, M.: Gephi: an open source software for exploring and manipulating networks. Proc. Int. AAAI Conf. Web Soc. Med. 3(1), 361–362 (2009). https://doi.org/10.1609/icwsm.v3i1.13937
Blei, D.M.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012). https://doi.org/10.1145/2133806.2133826
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003). https://www.jmlr.org/papers/v3/blei03a.html
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008(10), P10008:1–P10008:12 (2008). https://doi.org/10.1088/1742-5468/2008/10/p10008
Bostock, M.: d3-hierarchy: 2D layout algorithms for visualizing hierarchical data (2022). https://github.com/d3/d3-hierarchy
GeoNames: Cities (2022). http://www.geonames.org/
Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. (PNAS) 99(12), 7821–7826 (2002). https://doi.org/10.1073/pnas.122653799
Hipo: University domains (2022). https://github.com/Hipo/university-domains-list
Hutto, C.J., Gilbert, E.: VADER: a parsimonious rule-based model for sentiment analysis of social media text. Proc. Int. AAAI Conf. Web Soc. Med. 8(1), 216–225 (2014). https://doi.org/10.1609/icwsm.v8i1.14550
Jones, K., Nurse, J.R.C., Li, S.: Behind the mask: a computational study of Anonymous’ presence on Twitter. Proc. Int. AAAI Conf. Web Soc. Med. 14(1), 327–338 (2020). https://doi.org/10.1609/icwsm.v14i1.7303
Jones, K., Nurse, J.R.C., Li, S.: Out of the shadows: analyzing Anonymous’ Twitter resurgence during the 2020 Black Lives Matter protests. Proc. Int. AAAI Conf. Web Soc. Med. 16(1), 417–428 (2022). https://doi.org/10.1609/icwsm.v16i1.19303
Kigerl, A.: Profiling cybercriminals: topic model clustering of carding forum member comment histories. Soc. Sci. Comput. Rev. 36(5), 591–609 (2018). https://doi.org/10.1177/0894439317730296
Lambiotte, R., Delvenne, J.C., Barahona, M.: Random walks, Markov processes and the multiscale modular organization of complex networks. IEEE Trans. Netw. Sci. Eng. 1(2), 76–90 (2014). https://doi.org/10.1109/tnse.2015.2391998
Loria, S.: TextBlob: Simplified text processing (2022). https://textblob.readthedocs.io/en/dev/
Mahaini, M.I., Li, S.: Detecting cyber security related Twitter accounts and different sub-groups: a multi-classifier approach. In: Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 599–606. ACM (2021). https://doi.org/10.1145/3487351.3492716
Moscato, V., Sperlì, G.: A survey about community detection over on-line social and heterogeneous information networks. Knowl. Based Syst. 224, 107112:1–107112:13 (2021). https://doi.org/10.1016/j.knosys.2021.107112
National Cyber Security Centre (NCSC), UK: Academic Centres of Excellence in Cyber Security Research (2019). https://www.ncsc.gov.uk/information/academic-centres-excellence-cyber-security-research
Newman, M.E.J.: Equivalence between modularity optimization and maximum likelihood methods for community detection. Phys. Rev. E. 94(5), 052315:1–052315:8 (2016). https://doi.org/10.1103/PhysRevE.94.052315
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E. 69(2), 026113:1–026113:15 (2004). https://doi.org/10.1103/PhysRevE.69.026113
NLTK Team: NLTK: Natural language toolkit (2023). https://www.nltk.org/
Nouh, M., Nurse, J.R.C.: Identifying key-players in online activist groups on the Facebook social network. In: Proceedings of the 2015 IEEE International Conference on Data Mining Workshop, pp. 969–978. IEEE (2015). https://doi.org/10.1109/icdmw.2015.88
Pattnaik, N., Li, S., Nurse, J.R.C.: Perspectives of non-expert users on cyber security and privacy: an analysis of online discussions on Twitter. Comput. Secur. 125, 103008:1–103008:15 (2023). https://doi.org/10.1016/j.cose.2022.103008
Pedregosa, F., et al.: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011). https://jmlr.org/papers/v12/pedregosa11a.html
Řehůřek, R.: Gensim: Topic modelling for humans (2022). https://radimrehurek.com/gensim/index.html
Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA (2010). http://is.muni.cz/publication/884893/en
Röder, M., Both, A., Hinneburg, A.: Exploring the space of topic coherence measures. In: Proceedings of the 8th ACM International Conference on Web Search and Data Mining, pp. 399–408. ACM (2015). https://doi.org/10.1145/2684822.2685324
Saura, J.R., Palacios-Marqués, D., Ribeiro-Soriano, D.: Using data mining techniques to explore security issues in smart living environments in Twitter. Comput. Commun. 179, 285–295 (2021). https://doi.org/10.1016/j.comcom.2021.08.021
Schubert, E., Sander, J., Ester, M., Kriegel, H.P., Xu, X.: DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Trans. Database Syst. 42(3), 19:1–19:21 (2017). https://doi.org/10.1145/3068335
Sievert, C., Shirley, K.: LDAvis: a method for visualizing and interpreting topics. In: Proceedings of the 2014 Workshop on Interactive Language Learning, Visualization, and Interfaces, pp. 63–70. ACL (2014). https://doi.org/10.3115/v1/W14-3110
Soni, K.: locationtagger (2022). https://pypi.org/project/locationtagger/
Tavabi, N., Bartley, N., Abeliuk, A., Soni, S., Ferrara, E., Lerman, K.: Characterizing activity on the deep and dark web. In: Companion Proceedings of the 2019 World Wide Web Conference, pp. 206–213. ACM (2019). https://doi.org/10.1145/3308560.3316502
Traag, V.A., Waltman, L., van Eck, N.J.: From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9(1), 5233:1–5233:12 (2019). https://doi.org/10.1038/s41598-019-41695-z
We Are Social: DIGITAL 2023: What we learned. Special report, We Are Social Ltd (2023). https://wearesocial.com/uk/blog/2023/01/digital-2023/
Appendices
A Evaluating Baseline/Individual Classifiers Performance
Classifiers Training: before using the classifiers reported in [18], we re-validated their performance on our ACEs-CSR dataset (about 42,000 Twitter accounts), which differs from the datasets these classifiers were originally trained on. We used the same original labelled datasets and followed the same feature extraction steps as [18]. We then selected the best-performing feature sets according to the reported results: C, L, PBC, and PBCL (see the original study for more details on the feature sets). We re-trained the classifiers using the same original models: Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), SVM with a linear kernel (SVM-L), and SVM with an RBF kernel (SVM-R). To see if we could obtain better results, we added two more models: Extra Trees (ET) and eXtreme Gradient Boosting (XGBoost). Training was done using the Scikit-Learn library with 5-fold stratified cross-validation. The training results are shown in Table 1; we report only the best-performing feature sets.
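The re-training procedure described above can be sketched as follows. This is a minimal illustration, not the original pipeline: the feature matrix `X` and labels `y` are synthetic placeholders standing in for the PBCL features extracted as in [18], and hyperparameters are scikit-learn defaults.

```python
# Sketch of 5-fold stratified cross-validation over the candidate models.
# X and y are placeholder data; the real features come from the pipeline in [18].
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for the extracted PBCL feature set.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

models = {
    "DT": DecisionTreeClassifier(random_state=42),
    "RF": RandomForestClassifier(random_state=42),
    "ET": ExtraTreesClassifier(random_state=42),
    "LR": LogisticRegression(max_iter=1000),
    "SVM-L": SVC(kernel="linear"),
    "SVM-R": SVC(kernel="rbf"),
    # XGBoost (xgboost.XGBClassifier) would be added here as the seventh model.
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```

Stratified folds preserve the class ratio in each split, which matters when the labelled dataset is imbalanced.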
Our results were similar to the original ones for the first five models. The ET models performed similarly to the RF models, which was expected as the two methods are closely related; in some cases, the ET models performed slightly better. The XGBoost models performed well on the Baseline classification task with the PBCL feature set, achieving an F1-score of 91%, similar to the RF and ET models; in terms of F1-score, XGBoost was slightly ahead of all the other models with the PBCL feature set. In summary, the RF and ET models performed well across all the classification tasks, and for both the Baseline and Individual classification tasks the PBCL feature set proved a good and stable choice.
Manual Evaluation: to evaluate the performance of the trained classifiers on the prediction dataset, we manually verified the results by selecting a subset of Twitter accounts for each classification task and labelling them by hand. We then compared the actual labels with the predicted labels to compute the confusion matrix, from which accuracy, F1-score, precision, and recall were calculated. The results of the manual verification are shown in Table 6. For the Baseline classifier, we randomly selected 1,154 samples; the F1-score was 90%, a 2% drop compared to the original training/testing results reported in [18]. For the Individual classifier, we selected 1,003 samples; the F1-score was 85%, a 5% drop. Considering the significant difference in size between the original training dataset and our prediction dataset (2k vs. 42k accounts) and the relatively small performance drop, we are confident that both the Baseline and Individual classifiers are good enough for our case study.
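The metric computation in the manual-verification step can be sketched as below. The labels are illustrative toy values, not the actual verification samples; the real evaluation used the manually labelled subsets described above.

```python
# Sketch of computing the confusion matrix and derived metrics with
# scikit-learn. y_true/y_pred are hypothetical labels, not real data.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Hypothetical manual labels vs. classifier predictions for a tiny sample.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted
acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")

print("Confusion matrix:\n", cm)
print(f"Accuracy={acc:.2f} Precision={prec:.2f} Recall={rec:.2f} F1={f1:.2f}")
```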
B Issue with TextBlob Sentiment Analyser
Below are some example tweets that the TextBlob sentiment analyser wrongly classified as negative, while the VADER sentiment analyser correctly classified them as positive.
- Our Academic Centre of Excellence in Cyber Security Research becomes active this week.
- Academic Centre of Excellence in Cyber Security Research Open Day @ucl: @uclisec hosting an open day at the ACE center November 15th #infosec #CyberSecurity.
- Congratulations to @UniKent @KingsCollegeLon and @cardiffuni who join @UniofOxford and 13 other UK universities as Academic Centres of Excellence in Cyber Security Research, announced recently by the National Cyber Security Centre @NCSC and @EPSRC.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Mahaini, M.I., Li, S. (2023). Cyber Security Researchers on Online Social Networks: From the Lens of the UK’s ACEs-CSR on Twitter. In: Arief, B., Monreale, A., Sirivianos, M., Li, S. (eds) Security and Privacy in Social Networks and Big Data. SocialSec 2023. Lecture Notes in Computer Science, vol 14097. Springer, Singapore. https://doi.org/10.1007/978-981-99-5177-2_8
Print ISBN: 978-981-99-5176-5
Online ISBN: 978-981-99-5177-2