How Many Features Do We Need to Identify Bots on Twitter?

Tabassum, Fatima; Mubarak, Sameera; Liu, Lin; Du, Jia Tina

doi:10.1007/978-3-031-28035-1_22

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13971)

Included in the following conference series:

International Conference on Information

Abstract

The number of malicious bots is increasing rapidly with the growing popularity of social media. We evaluate the importance of 19 commonly used features for Twitter bot detection. Our goal is to propose a minimal set of user-specific features for developing scalable Twitter bot detection systems. To identify the most important features, we apply three model inspection methods: Permutation Importance (PI), SHapley Additive exPlanations (SHAP), and Local Interpretable Model-agnostic Explanations (LIME). We find that six user-specific features are the most important for Twitter bot detection: the numbers of followers, friends, and favourites, and the rates of tweeting, making friends, and liking Tweets. We apply the Wilcoxon signed-rank test to compare the performance of models trained on all features, on the important features only, and on the features not found to be important in our evaluation. We observe no significant difference between the performance of the models trained on all features and that of the models trained on the important features. In contrast, the models trained on the features our evaluation found unimportant perform statistically significantly worse. We demonstrate that these six features are sufficient to identify Twitter bots.
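To make the idea behind Permutation Importance concrete, the following is a minimal sketch in plain Python; it is not the authors' pipeline. Everything in it is an assumption for illustration: the data are synthetic, the feature names `tweet_rate` and `noise` are hypothetical, and a simple nearest-centroid classifier stands in for whatever models the paper uses. The sketch shuffles one feature column at a time on held-out data and records the mean drop in accuracy.

```python
import random

def make_accounts(n, rng):
    """Synthetic, made-up data: one informative feature and one noise feature."""
    rows, labels = [], []
    for _ in range(n):
        is_bot = rng.random() < 0.5
        # Hypothetical 'tweet_rate': bots are drawn from a higher range than humans.
        tweet_rate = rng.gauss(8.0, 1.0) if is_bot else rng.gauss(2.0, 1.0)
        noise = rng.gauss(0.0, 1.0)  # unrelated to the label
        rows.append([tweet_rate, noise])
        labels.append(1 if is_bot else 0)
    return rows, labels

def fit_centroids(rows, labels):
    """Nearest-centroid classifier (a stand-in model): per-class feature means."""
    centroids = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda cls: dist(centroids[cls]))

def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(centroids, rows, labels, n_repeats, rng):
    """Mean accuracy drop after shuffling each feature column in turn."""
    baseline = accuracy(centroids, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(centroids, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

rng = random.Random(0)
train_X, train_y = make_accounts(400, rng)
test_X, test_y = make_accounts(400, rng)
model = fit_centroids(train_X, train_y)
importances = permutation_importance(model, test_X, test_y, n_repeats=10, rng=rng)
for name, score in zip(["tweet_rate", "noise"], importances):
    print(f"{name}: mean accuracy drop = {score:.3f}")
```

In this toy setting, shuffling the informative feature should reduce accuracy substantially, while shuffling the noise feature should leave it nearly unchanged. A feature whose permutation barely affects performance is a candidate for removal, which is the kind of evidence the evaluation above relies on.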

Author information

Authors and Affiliations

STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
Fatima Tabassum, Sameera Mubarak, Lin Liu & Jia Tina Du

Corresponding author

Correspondence to Fatima Tabassum.

Editor information

Editors and Affiliations

iSchool Organization, Berlin, Germany
Isaac Sserwanga
Victoria University of Wellington, Wellington, New Zealand
Anne Goulding
University of Missouri, Chicago, IL, USA
Heather Moulaison-Sandy
University of South Australia, Adelaide, SA, Australia
Jia Tina Du
University of Porto, Porto, Portugal
António Lucas Soares
Monash University, Clayton, VIC, Australia
Viviane Hessami
University of Tennessee at Knoxville, Knoxville, TN, USA
Rebecca D. Frank

About this paper

Cite this paper

Tabassum, F., Mubarak, S., Liu, L., Du, J.T. (2023). How Many Features Do We Need to Identify Bots on Twitter?. In: Sserwanga, I., et al. Information for a Better World: Normality, Virtuality, Physicality, Inclusivity. iConference 2023. Lecture Notes in Computer Science, vol 13971. Springer, Cham. https://doi.org/10.1007/978-3-031-28035-1_22

DOI: https://doi.org/10.1007/978-3-031-28035-1_22
Published: 10 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28034-4
Online ISBN: 978-3-031-28035-1
eBook Packages: Computer Science, Computer Science (R0)
