Abstract
The number of malicious bots is increasing rapidly with the growing popularity of social media. We evaluate the importance of 19 commonly used features for Twitter bot detection. Our goal is to propose a minimal set of user-specific features for developing scalable Twitter bot detection systems. To identify the most important features, we apply three model inspection methods: Permutation Importance (PI), SHapley Additive exPlanations (SHAP), and Local Interpretable Model-agnostic Explanations (LIME). We find that the numbers of followers, friends, and favourites, together with the rates of tweeting, making friends, and liking Tweets, are the most important user-specific features for Twitter bot detection. We apply the Wilcoxon signed rank test to compare the performance of models trained on all features, on the important features only, and on the features not found to be important in our evaluation. We observe no significant difference between the performance of the models trained on all features and that of the models trained on the important features. In contrast, the models trained on the features our evaluation found unimportant perform significantly worse. We demonstrate that the above six features are sufficient to identify Twitter bots.
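As an illustration of the first inspection method named in the abstract, the following is a minimal sketch of permutation importance using only the Python standard library. It is not the authors' implementation: the data are synthetic, and `toy_model`, `accuracy`, and the two-column feature layout are hypothetical names chosen for this example. The idea shown is the general one: shuffle a single feature column, re-score the model, and treat the drop in accuracy as that feature's importance.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other columns stay untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data (an assumption for illustration, not the paper's dataset):
# feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_model(row):
    """Hypothetical classifier that looks only at feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, X, y)
print(importances)
```

In this toy setup the noise feature receives an importance of exactly zero, because the classifier never reads it, while shuffling the informative feature lowers accuracy substantially. A feature-selection study of the kind described above would apply the same procedure to a trained classifier and real account features.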
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tabassum, F., Mubarak, S., Liu, L., Du, J.T. (2023). How Many Features Do We Need to Identify Bots on Twitter? In: Sserwanga, I., et al. Information for a Better World: Normality, Virtuality, Physicality, Inclusivity. iConference 2023. Lecture Notes in Computer Science, vol 13971. Springer, Cham. https://doi.org/10.1007/978-3-031-28035-1_22
DOI: https://doi.org/10.1007/978-3-031-28035-1_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28034-4
Online ISBN: 978-3-031-28035-1
eBook Packages: Computer Science, Computer Science (R0)