How Many Features Do We Need to Identify Bots on Twitter?

  • Conference paper
Information for a Better World: Normality, Virtuality, Physicality, Inclusivity (iConference 2023)

Abstract

The number of malicious bots is increasing rapidly with the growing popularity of social media. We evaluate the importance of 19 commonly used features for Twitter bot detection. Our goal is to propose a minimal set of user-specific features for developing scalable Twitter bot detection systems. To identify the most important features, we apply three model inspection methods: Permutation Importance (PI), SHapley Additive exPlanations (SHAP), and Local Interpretable Model-agnostic Explanations (LIME). We find that the numbers of followers, friends, and favourites, and the rates of tweeting, making friends, and liking Tweets are the most important user-specific features for Twitter bot detection. We apply the Wilcoxon signed-rank test to compare the performance of models trained on all features, on the important features only, and on the features our evaluation did not find important. We observe no significant differences between the performance of the models trained on all features and the models trained on the important features. In contrast, the models trained on the features our evaluation found unimportant perform significantly worse. We demonstrate that the above six features are sufficient to identify Twitter bots.
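
The two statistical ideas in the abstract, permutation importance and the Wilcoxon signed-rank test, can be illustrated with a minimal standard-library sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two toy features (one standing in for an informative feature such as follower count, one pure noise), the fixed threshold rule used in place of a trained classifier, and the made-up per-fold scores. A real pipeline would more likely rely on an established library implementation.

```python
import random
from itertools import product


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Return baseline accuracy and the mean accuracy drop per shuffled feature."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def wilcoxon_signed_rank(a, b):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank for tied values
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)
    w = min(w_plus, total - w_plus)
    # Enumerate all sign assignments to get the exact null distribution.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w:
            extreme += 1
    return w, extreme / 2 ** len(ranks)


# Toy data (hypothetical): feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_predict(row):
    """Stand-in for a trained classifier; it only looks at feature 0."""
    return int(row[0] > 0.5)


baseline, importances = permutation_importance(toy_predict, X, y)

# Made-up per-fold scores (percent) for two hypothetical models, not paper results.
scores_all_features = [90, 88, 91, 87, 89, 92, 90, 86]
scores_weak_features = [80, 81, 79, 78, 83, 82, 77, 75]
w_stat, p_value = wilcoxon_signed_rank(scores_all_features, scores_weak_features)

print("baseline accuracy:", baseline)
print("importances:", importances)
print("Wilcoxon W:", w_stat, "p-value:", p_value)
```

Shuffling the noise feature leaves accuracy unchanged, so its importance is zero, while shuffling the informative feature costs a large share of the accuracy. The exact enumeration in `wilcoxon_signed_rank` is only practical for a small number of paired scores; larger samples would normally use a normal approximation instead.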

Notes

  1. https://github.com/Microsoft/LightGBM/

  2. https://xgboost.ai/

  3. https://github.com/slundberg/shap

  4. https://github.com/marcotcr/lime

Author information

Corresponding author

Correspondence to Fatima Tabassum.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tabassum, F., Mubarak, S., Liu, L., Du, J.T. (2023). How Many Features Do We Need to Identify Bots on Twitter?. In: Sserwanga, I., et al. Information for a Better World: Normality, Virtuality, Physicality, Inclusivity. iConference 2023. Lecture Notes in Computer Science, vol 13971. Springer, Cham. https://doi.org/10.1007/978-3-031-28035-1_22

  • DOI: https://doi.org/10.1007/978-3-031-28035-1_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28034-4

  • Online ISBN: 978-3-031-28035-1

  • eBook Packages: Computer Science, Computer Science (R0)
