
Malicious Bot Detection in Online Social Networks: Arming Handcrafted Features with Deep Learning

  • Conference paper

Social Informatics (SocInfo 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12467)

Abstract

Online social networks (OSNs) have long suffered from various types of malicious bots (e.g., spammers, fake followers, social bots, and content polluters). Recent studies show that such bots have also been actively involved in spreading hate speech and disseminating misinformation. Over the years, researchers have proposed multiple approaches to identify certain types of bots and lower their impact on OSNs. However, these approaches mostly rely on handcrafted features to capture the characteristics of malicious users, or on deep learning methods that work only under certain conditions (e.g., dense retweet/sharing behavior). To overcome the limitations of prior work, in this paper we propose a novel framework that incorporates both handcrafted features and features automatically learned by deep learning methods from various perspectives, and automatically balances the two to make the final prediction toward detecting malicious bots. In particular, we (i) combine 15 publicly available Twitter user datasets and categorize the accounts into two groups (i.e., legitimate accounts and malicious bot accounts); and (ii) propose a deep learning framework that jointly learns the various features and detects malicious accounts. Our experimental results show that our proposed model outperforms 7 state-of-the-art methods, achieving 0.901 accuracy. Our ablation study shows that all types of features positively contribute to the model's performance.


Notes

  1. We use the terms user and account interchangeably.

  2. https://bit.ly/39mGlnm and https://bit.ly/3hlpt38.

  3. https://help.twitter.com/en/rules-and-policies/twitter-rules.

  4. https://botometer.iuni.iu.edu/bot-repository/datasets.html.

  5. http://liwc.wpengine.com/.

References

  1. Adewole, K.S., Anuar, N.B., Kamsin, A., Varathan, K.D., Razak, S.A.: Malicious accounts: dark of the social networks. J. Netw. Comput. Appl. 79, 41–67 (2017)

  2. Albadi, N., Kurdi, M., Mishra, S.: Hateful people or hateful bots? Detection and characterization of bots spreading religious hatred in Arabic social media. In: CSCW (2019)

  3. Alfifi, M., Caverlee, J.: Badly evolved? Exploring long-surviving suspicious users on Twitter. In: Ciampaglia, G.L., Mashhadi, A., Yasseri, T. (eds.) SocInfo 2017. LNCS, vol. 10539, pp. 218–233. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67217-5_14

  4. Alfifi, M., Kaghazgaran, P., Caverlee, J., Morstatter, F.: Measuring the impact of ISIS social media strategy. In: MIS2 (2018)

  5. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)

  6. Beskow, D.M., Carley, K.M.: Bot conversations are different: leveraging network metrics for bot detection in Twitter. In: ASONAM (2018)

  7. Bhat, S.Y., Abulaish, M.: Community-based features for identifying spammers in online social networks. In: ASONAM (2013)

  8. Campello, R.J.G.B., Moulavi, D., Sander, J.: Density-based clustering based on hierarchical density estimates. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) PAKDD 2013. LNCS (LNAI), vol. 7819, pp. 160–172. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37456-2_14

  9. Cer, D., et al.: Universal sentence encoder for English. In: EMNLP (2018)

  10. Chavoshi, N., Hamooni, H., Mueen, A.: Temporal patterns in bot activities. In: WWW (2017)

  11. Chavoshi, N., Mueen, A.: Model bots, not humans on social media. In: ASONAM (2018)

  12. Conroy, N.J., Rubin, V.L., Chen, Y.: Automatic deception detection: methods for finding fake news. In: Proceedings of the 78th ASIS&T Annual Meeting (2015)

  13. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: Fame for sale: efficient detection of fake Twitter followers. Decis. Support Syst. 80, 56–71 (2015)

  14. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. In: WWW (2017)

  15. Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: $FAKE: evidence of spam and bot activity in stock microblogs on Twitter. In: ICWSM (2018)

  16. Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter. ACM Trans. Web (TWEB) 13(2), 1–27 (2019)

  17. Cresci, S., Petrocchi, M., Spognardi, A., Tognazzi, S.: Better safe than sorry: an adversarial approach to improve social bot detection. In: WebSci (2019)

  18. Davis, C.A., Varol, O., Ferrara, E., Flammini, A., Menczer, F.: BotOrNot: a system to evaluate social bots. In: WWW (2016)

  19. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL (2019)

  20. Dhingra, B., Zhou, Z., Fitzpatrick, D., Muehl, M., Cohen, W.: Tweet2vec: character-based distributed representations for social media. In: ACL (2016)

  21. Ferrara, E.: Measuring social spam and the effect of bots on information diffusion in social media. In: Lehmann, S., Ahn, Y.-Y. (eds.) Complex Spreading Phenomena in Social Systems. CSS, pp. 229–255. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77332-2_13

  22. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: ICML (2016)

  23. Gilani, Z., Farahbakhsh, R., Tyson, G., Wang, L., Crowcroft, J.: Of bots and humans (on Twitter). In: ASONAM (2017)

  24. Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. In: ACL (2018)

  25. Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP (2014)

  26. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: ICLR (2014)

  27. Ko, R.: Social media is full of bots spreading Covid-19 anxiety. Don't fall for it (2020). https://www.sciencealert.com/bots-are-causing-anxiety-by-spreading-coronavirus-misinformation

  28. Kudugunta, S., Ferrara, E.: Deep neural networks for bot detection. Inf. Sci. 467, 312–322 (2018)

  29. Lee, K., Eoff, B.D., Caverlee, J.: Seven months with the devils: a long-term study of content polluters on Twitter. In: ICWSM (2011)

  30. Ma, J., Gao, W., Wei, Z., Lu, Y., Wong, K.F.: Detect rumors using time series of social context information on microblogging websites. In: CIKM (2015)

  31. Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: ICML (2013)

  32. Mazza, M., Cresci, S., Avvenuti, M., Quattrociocchi, W., Tesconi, M.: RTbust: exploiting temporal patterns for botnet detection on Twitter. In: WebSci (2019)

  33. Miller, Z., Dickinson, B., Deitrick, W., Hu, W., Wang, A.H.: Twitter spammer detection using data stream clustering. Inf. Sci. 260, 64–73 (2014)

  34. Morstatter, F., Wu, L., Nazer, T.H., Carley, K.M., Liu, H.: A new approach to bot detection: striking the balance between precision and recall. In: ASONAM (2016)

  35. Ruths, D.: The misinformation machine. Science 363(6425), 348 (2019)

  36. Shao, C., Ciampaglia, G.L., Varol, O., Yang, K.C., Flammini, A., Menczer, F.: The spread of low-credibility content by social bots. Nat. Commun. 9(1), 4787 (2018)

  37. Subrahmanian, V., et al.: The DARPA Twitter bot challenge. Computer 49(6), 38–46 (2016)

  38. Varol, O., Ferrara, E., Davis, C.A., Menczer, F., Flammini, A.: Online human-bot interactions: detection, estimation, and characterization. In: ICWSM (2017)

  39. Wang, Z., Oates, T.: Encoding time series as images for visual inspection and classification using tiled convolutional neural networks. In: AAAI-W (2015)

  40. Yang, C., Harkreader, R., Gu, G.: Empirical evaluation and new design for fighting evolving Twitter spammers. IEEE Trans. Inf. Forens. Secur. 8(8), 1280–1293 (2013)

  41. Yang, K.C., Varol, O., Davis, C.A., Ferrara, E., Flammini, A., Menczer, F.: Arming the public with artificial intelligence to counter social bots. Hum. Behav. Emerg. Technol. 1(1), 48–61 (2019)

  42. Yang, K.C., Varol, O., Hui, P.M., Menczer, F.: Scalable and generalizable social bot detection through data selection. In: AAAI (2020)

  43. Young, L.Y.: The effect of moderator bots on abusive language use. In: ICPRAI (2018)


Acknowledgements

This work was supported in part by NSF grant CNS-1755536, AWS Cloud Credits for Research, and Google Cloud.

Author information


Corresponding authors

Correspondence to Guanyi Mou or Kyumin Lee .


A Appendix

A.1 Account Status

As we keep the original account information of Lee'11, Cresci'15, and Cresci'17, we checked the current status of those malicious bots, as shown in Table 6. Overall, 68.3% of the malicious bots are still alive on Twitter, and some have survived for more than ten years. This indicates that there is substantial room to improve Twitter's current bot detection system.

A.2 Source Dataset Details

We list the original user types that each dataset contains as follows:

Lee’11 [29]: content polluters, and legitimate users

Cresci’15 [13]: various kinds of fake accounts

Cresci’17 [14]: traditional & social spambots, and legitimate users

Cresci’18 [15, 16]: stock related bots, and legitimate users

RTBust’19 [32]: retweet bots, and legitimate users

Gilani’17 [23]: bots, and legitimate users

Varol’17 [38]: bots, and legitimate users

Midterm’18 [42]: political bots, and legitimate users

Botwiki'19 [42]: self-identified bots

Political’19 [41]: political bots

Pronbots’19 [41]: bots advertising scam sites

Vendor’19 [41]: fake followers

Verified’19 [41]: verified legitimate users

Celebrity’19 [41]: celebrity accounts (legitimate)

Botometer’19 [41]: bots and legitimate users

We grouped legitimate users, verified accounts, and celebrity accounts as legitimate, and treated all other account types as malicious bots.

Table 6. Recent status of malicious accounts.

A.3 Detailed Baseline Descriptions

Lee'11 [29]. The authors proposed handcrafted features extracted from user profiles, posted content, and changes in each account's following/follower lists over time. We built their best-performing Random Forest model without the network features.
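As a minimal sketch of this style of baseline (not the authors' exact pipeline or feature set), a Random Forest can be trained directly on a handcrafted profile-feature matrix; the three features and all values below are synthetic and purely illustrative:

```python
# Hedged sketch: Random Forest over toy handcrafted profile features.
# Feature columns (illustrative only): [followers, following, tweets_per_day].
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Synthetic accounts: legitimate users vs. malicious bots
legit = rng.normal(loc=[500, 300, 5], scale=[200, 100, 2], size=(100, 3))
bots = rng.normal(loc=[50, 2000, 40], scale=[30, 500, 10], size=(100, 3))

X = np.vstack([legit, bots])
y = np.array([0] * 100 + [1] * 100)  # 0 = legitimate, 1 = malicious bot

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[60, 1900, 38]])[0])  # account near the bot cluster
```

The real baseline uses many more features (profile, content, and temporal follow-behavior features); the value of the Random Forest here is that it handles heterogeneous feature scales without normalization.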

Kim'14 [25]. This is a convolutional text-classification architecture that achieved performance comparable to state-of-the-art models, with hyper-parameters that are stable across domains. We applied this architecture to the tweets posted by each user to classify accounts.
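The core operation of a Kim-style text CNN can be sketched in plain NumPy: slide a filter over windows of word embeddings, apply ReLU, and take the max over time. Dimensions are illustrative and the weights below are random (untrained), so this shows only the mechanics, not a trained classifier:

```python
# Sketch of one filter in a Kim-style text CNN:
# 1-D convolution over word-embedding windows + max-over-time pooling.
import numpy as np

rng = np.random.default_rng(0)
seq_len, emb_dim, filter_width = 10, 8, 3

sentence = rng.normal(size=(seq_len, emb_dim))   # one embedded tweet
filt = rng.normal(size=(filter_width, emb_dim))  # one convolutional filter
bias = 0.1

# Valid convolution over time, followed by ReLU
feature_map = np.array([
    np.maximum(0.0, np.sum(sentence[t:t + filter_width] * filt) + bias)
    for t in range(seq_len - filter_width + 1)
])

pooled = feature_map.max()  # max-over-time pooling -> one scalar per filter
print(feature_map.shape, pooled)
```

In the full architecture, many such filters of several widths produce one pooled scalar each; the concatenated vector feeds a softmax layer.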

Tweet2Vec'16 [20]. Tweet2Vec was proposed as a general-purpose tweet-embedding framework, trained with neural networks on a hashtag-prediction subtask. It generates domain-specific feature representations of tweets. We constructed a bot detection model following the proposed architecture, in which the embedding layer is followed by fully connected layers.

Chavoshi'18 [11]. The authors proposed mapping pairs of posting timestamps into 2D images to better exploit each account's temporal posting behavior. Convolutional neural networks can then be applied to the downstream bot detection task.
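One simple way to realize this idea (a sketch under assumptions, not the paper's exact encoding) is to bin consecutive inter-tweet gap pairs into a 2D histogram, yielding an "image" a CNN could classify; the timestamps and bin count below are synthetic:

```python
# Hedged sketch: encode an account's temporal posting behavior as a 2D image
# by histogramming consecutive inter-tweet gap pairs (gap_i, gap_{i+1}).
import numpy as np

rng = np.random.default_rng(1)
timestamps = np.sort(rng.uniform(0, 3600, size=200))  # seconds, one account

gaps = np.diff(timestamps)                      # inter-tweet intervals
pairs = np.column_stack([gaps[:-1], gaps[1:]])  # consecutive gap pairs

image, _, _ = np.histogram2d(pairs[:, 0], pairs[:, 1], bins=16)
image = image / image.max()                     # normalize to [0, 1]
print(image.shape)  # one 16x16 "image" summarizing temporal behavior
```

Automated accounts with rigid posting schedules concentrate their mass in a few bins, while human accounts produce more diffuse patterns, which is what a downstream CNN can pick up on.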

Kudugunta'19 [28]. This framework uses an LSTM to learn content features and then combines them with several handcrafted features.

RTBust'19 [32]. RTBust uses temporal retweet/tweet patterns for bot detection. It captures the information in tweet/retweet sequences and extracts features using a variational autoencoder (VAE) [26]. The feature embeddings generated by the encoder are then fed into HDBSCAN [8], an unsupervised clustering method, and outliers are treated as malicious bots.
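The outliers-as-bots step can be sketched as follows. This is not RTBust itself: the embeddings below are synthetic rather than VAE outputs, and scikit-learn's DBSCAN stands in for HDBSCAN (both label density outliers as noise, `-1`):

```python
# Hedged sketch: cluster account embeddings and flag noise points as bots.
# DBSCAN is used here as a stand-in for HDBSCAN; embeddings are synthetic.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(7)

# A dense cluster of "normal" accounts plus a few far-away outliers
normal = rng.normal(loc=0.0, scale=0.3, size=(100, 2))
outliers = np.array([[5.0, 5.0], [-6.0, 4.0], [6.0, -5.0]])
embeddings = np.vstack([normal, outliers])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(embeddings)
is_bot = labels == -1  # noise points treated as malicious bots
print(is_bot[-3:])     # the three injected outliers are flagged
```

HDBSCAN's advantage over this stand-in is that it adapts to clusters of varying density without a fixed `eps`, which matters when legitimate accounts do not form a single uniform cluster.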

Yang'19 [42]. A Random Forest built on the authors' proposed features.

Rights and permissions


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Mou, G., Lee, K. (2020). Malicious Bot Detection in Online Social Networks: Arming Handcrafted Features with Deep Learning. In: Aref, S., et al. Social Informatics. SocInfo 2020. Lecture Notes in Computer Science(), vol 12467. Springer, Cham. https://doi.org/10.1007/978-3-030-60975-7_17


  • DOI: https://doi.org/10.1007/978-3-030-60975-7_17


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60974-0

  • Online ISBN: 978-3-030-60975-7

