Abstract
Online social networks (OSNs) have long suffered from various types of malicious bots (e.g., spammers, fake followers, social bots, and content polluters). Recent studies show that these bots have also been actively involved in spreading hate speech and disseminating misinformation. Over the years, researchers have proposed multiple approaches to identify some of these bot types and lower their impact on OSNs. However, prior strategies mostly rely on handcrafted features to capture the characteristics of malicious users, or on deep learning approaches that only work well under certain conditions (e.g., dense retweeting/sharing behavior). To overcome these limitations, in this paper we propose a novel framework that incorporates both handcrafted features and features automatically learned by deep learning methods from various perspectives, and automatically balances them when making the final prediction toward detecting malicious bots. In particular, we (i) combine 15 publicly available Twitter user datasets and categorize the accounts into two groups (i.e., legitimate accounts and malicious bot accounts); and (ii) propose a deep learning framework that jointly learns the various features and detects malicious accounts. Our experimental results show that our proposed model outperforms 7 state-of-the-art methods, achieving 0.901 accuracy. Our ablation study shows that all types of our features positively contribute to the model's performance.
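For intuition only, the following is a minimal PyTorch sketch of this high-level idea (not the authors' exact architecture): a handcrafted feature vector and a deep-learned feature vector are projected into a shared space, and a learned gate balances the two views before the final prediction. All layer names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedFusionClassifier(nn.Module):
    """Illustrative sketch: fuse handcrafted and deep-learned features with a gate."""

    def __init__(self, handcrafted_dim: int, learned_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.proj_hand = nn.Linear(handcrafted_dim, hidden_dim)   # handcrafted view
        self.proj_deep = nn.Linear(learned_dim, hidden_dim)       # deep-learned view
        self.gate = nn.Sequential(                                # balances the two views
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Sigmoid(),
        )
        self.out = nn.Linear(hidden_dim, 2)                       # legitimate vs. malicious bot

    def forward(self, x_hand, x_deep):
        h = torch.relu(self.proj_hand(x_hand))
        d = torch.relu(self.proj_deep(x_deep))
        g = self.gate(torch.cat([h, d], dim=-1))                  # per-dimension weight in [0, 1]
        fused = g * h + (1 - g) * d
        return self.out(fused)

# Example: a batch of 4 accounts with 20 handcrafted and 128 learned features.
model = GatedFusionClassifier(handcrafted_dim=20, learned_dim=128)
logits = model(torch.randn(4, 20), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 2])
```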
Notes
- 1.
We use the terms user and account interchangeably.
References
Adewole, K.S., Anuar, N.B., Kamsin, A., Varathan, K.D., Razak, S.A.: Malicious accounts: dark of the social networks. J. Netw. Comput. Appl. 79, 41–67 (2017)
Albadi, N., Kurdi, M., Mishra, S.: Hateful people or hateful bots? Detection and characterization of bots spreading religious hatred in Arabic social media. CSCW (2019)
Alfifi, M., Caverlee, J.: Badly evolved? Exploring long-surviving suspicious users on twitter. In: Ciampaglia, G.L., Mashhadi, A., Yasseri, T. (eds.) SocInfo 2017. LNCS, vol. 10539, pp. 218–233. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67217-5_14
Alfifi, M., Kaghazgaran, P., Caverlee, J., Morstatter, F.: Measuring the impact of ISIS social media strategy. In: MIS2 (2018)
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
Beskow, D.M., Carley, K.M.: Bot conversations are different: leveraging network metrics for bot detection in twitter. In: ASONAM (2018)
Bhat, S.Y., Abulaish, M.: Community-based features for identifying spammers in online social networks. In: ASONAM (2013)
Campello, R.J.G.B., Moulavi, D., Sander, J.: Density-based clustering based on hierarchical density estimates. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) PAKDD 2013. LNCS (LNAI), vol. 7819, pp. 160–172. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37456-2_14
Cer, D., et al.: Universal sentence encoder for English. In: EMNLP (2018)
Chavoshi, N., Hamooni, H., Mueen, A.: Temporal patterns in bot activities. In: WWW (2017)
Chavoshi, N., Mueen, A.: Model bots, not humans on social media. In: ASONAM (2018)
Conroy, N.J., Rubin, V.L., Chen, Y.: Automatic deception detection: methods for finding fake news. In: Proceedings of the 78th ASIS&T Annual Meeting (2015)
Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: Fame for sale: efficient detection of fake twitter followers. Decis. Support Syst. 80, 56–71 (2015)
Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. In: WWW (2017)
Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: $FAKE: evidence of spam and bot activity in stock microblogs on Twitter. In: ICWSM (2018)
Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on twitter. ACM Trans. Web (TWEB) 13(2), 1–27 (2019)
Cresci, S., Petrocchi, M., Spognardi, A., Tognazzi, S.: Better safe than sorry: an adversarial approach to improve social bot detection. In: WebSci (2019)
Davis, C.A., Varol, O., Ferrara, E., Flammini, A., Menczer, F.: BotOrNot: a system to evaluate social bots. In: WWW (2016)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL (2019)
Dhingra, B., Zhou, Z., Fitzpatrick, D., Muehl, M., Cohen, W.: Tweet2vec: character-based distributed representations for social media. In: ACL (2016)
Ferrara, E.: Measuring social spam and the effect of bots on information diffusion in social media. In: Lehmann, S., Ahn, Y.-Y. (eds.) Complex Spreading Phenomena in Social Systems. CSS, pp. 229–255. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77332-2_13
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: ICML (2016)
Gilani, Z., Farahbakhsh, R., Tyson, G., Wang, L., Crowcroft, J.: Of bots and humans (on twitter). In: ASONAM (2017)
Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. In: ACL (2018)
Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP (2014)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: ICLR (2014)
Ko, R.: Social media is full of bots spreading Covid-19 anxiety. Don’t fall for it (2020). https://www.sciencealert.com/bots-are-causing-anxiety-by-spreading-coronavirus-misinformation
Kudugunta, S., Ferrara, E.: Deep neural networks for bot detection. Inf. Sci. 467, 312–322 (2018)
Lee, K., Eoff, B.D., Caverlee, J.: Seven months with the devils: a long-term study of content polluters on twitter. In: ICWSM (2011)
Ma, J., Gao, W., Wei, Z., Lu, Y., Wong, K.F.: Detect rumors using time series of social context information on microblogging websites. In: CIKM (2015)
Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: ICML (2013)
Mazza, M., Cresci, S., Avvenuti, M., Quattrociocchi, W., Tesconi, M.: RTbust: exploiting temporal patterns for botnet detection on twitter. In: WEBSCI (2019)
Miller, Z., Dickinson, B., Deitrick, W., Hu, W., Wang, A.H.: Twitter spammer detection using data stream clustering. Inf. Sci. 260, 64–73 (2014)
Morstatter, F., Wu, L., Nazer, T.H., Carley, K.M., Liu, H.: A new approach to bot detection: striking the balance between precision and recall. In: ASONAM (2016)
Ruths, D.: The misinformation machine. Science 363(6425), 348 (2019)
Shao, C., Ciampaglia, G.L., Varol, O., Yang, K.C., Flammini, A., Menczer, F.: The spread of low-credibility content by social bots. Nat. Commun. 9(1), 4787 (2018)
Subrahmanian, V., et al.: The DARPA Twitter bot challenge. Computer 49(6), 38–46 (2016)
Varol, O., Ferrara, E., Davis, C.A., Menczer, F., Flammini, A.: Online human-bot interactions: detection, estimation, and characterization. In: ICWSM (2017)
Wang, Z., Oates, T.: Encoding time series as images for visual inspection and classification using tiled convolutional neural networks. In: AAAI-W (2015)
Yang, C., Harkreader, R., Gu, G.: Empirical evaluation and new design for fighting evolving twitter spammers. IEEE Trans. Inf. Forens. Secur. 8(8), 1280–1293 (2013)
Yang, K.C., Varol, O., Davis, C.A., Ferrara, E., Flammini, A., Menczer, F.: Arming the public with artificial intelligence to counter social bots. Hum. Behav. Emerg. Technol. 1(1), 48–61 (2019)
Yang, K.C., Varol, O., Hui, P.M., Menczer, F.: Scalable and generalizable social bot detection through data selection. In: AAAI (2020)
Young, L.Y.: The effect of moderator bots on abusive language use. In: ICPRAI (2018)
Acknowledgements
This work was supported in part by NSF grant CNS-1755536, AWS Cloud Credits for Research, and Google Cloud.
A Appendix
A.1 Account Status
Since we retained the original account information from Lee’11, Cresci’15, and Cresci’17, we checked the current status of those malicious bots, as shown in Table 6. Overall, 68.3% of the malicious bots are still alive on Twitter, and some of them have survived for more than ten years. This fact indicates that there is substantial room to improve Twitter’s current bot detection system.
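As a rough illustration of this kind of status check (not the exact procedure used for Table 6), one could look up each stored account ID and count how many still resolve as active; `lookup_account_status` below is a hypothetical helper standing in for a real Twitter API call.

```python
from typing import Callable, Iterable

def alive_ratio(account_ids: Iterable[str],
                lookup_account_status: Callable[[str], str]) -> float:
    """Return the fraction of accounts whose status is still 'active'.

    `lookup_account_status` is a hypothetical callback that queries the
    platform (e.g., via the Twitter API) and returns one of
    'active', 'suspended', or 'deleted'.
    """
    statuses = [lookup_account_status(uid) for uid in account_ids]
    alive = sum(1 for s in statuses if s == "active")
    return alive / len(statuses) if statuses else 0.0

# Toy usage with a stubbed lookup; a real run would query the Twitter API.
stub = {"1": "active", "2": "suspended", "3": "active"}
print(alive_ratio(stub.keys(), lambda uid: stub[uid]))  # ~0.667
```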
A.2 Source Dataset Details
We list the original user types that each dataset contains as follows:
Lee’11 [29]: content polluters, and legitimate users
Cresci’15 [13]: various kinds of fake accounts
Cresci’17 [14]: traditional & social spambots, and legitimate users
Cresci’18 [15, 16]: stock related bots, and legitimate users
RTBust’19 [32]: retweet bots, and legitimate users
Gilani’17 [23]: bots, and legitimate users
Varol’17 [38]: bots, and legitimate users
Midterm’18 [42]: political bots, and legitimate users
Botwiki’19 [42]: social bots
Political’19 [41]: political bots
Pronbots’19 [41]: bots advertising scam sites
Vendor’19 [41]: fake followers
Verified’19 [41]: verified legitimate users
Celebrity’19 [41]: celebrity accounts (legitimate)
Botometer’19 [41]: bots and legitimate users
We grouped legitimate users, verified accounts, and celebrity accounts as legitimate accounts, and all other types of accounts as malicious bots.
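A minimal pandas sketch of this relabeling step, assuming each source dataset has already been loaded into a DataFrame with a `user_type` column (column names and type strings are illustrative):

```python
import pandas as pd

LEGITIMATE_TYPES = {"legitimate user", "verified account", "celebrity account"}

def binarize_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Map each account's original user type to 'legitimate' or 'malicious bot'."""
    df = df.copy()
    df["label"] = df["user_type"].apply(
        lambda t: "legitimate" if t in LEGITIMATE_TYPES else "malicious bot"
    )
    return df

# Toy example standing in for one of the 15 source datasets.
sample = pd.DataFrame({
    "user_id": [101, 102, 103],
    "user_type": ["content polluter", "legitimate user", "fake follower"],
})
print(binarize_labels(sample)[["user_id", "label"]])
```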
A.3 Detailed Baseline Descriptions
Lee’11 [29]. The authors proposed handcrafted features extracted from user profiles, posted content, and changes in the following/follower lists over time. We built their best-performing Random Forest model, without the network features.
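For illustration, a minimal scikit-learn version of such a baseline might look as follows, assuming `X` holds the handcrafted features and `y` the binary labels (the random data below merely stands in for real feature extraction):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder handcrafted features (e.g., follower count, posting rate, ...).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```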
Kim’14 [25]. This is a convolutional text classification architecture that achieves performance comparable to state-of-the-art models, with hyper-parameters that are stable across different domains. We applied this architecture to the tweets posted by each user in order to classify the accounts.
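A compact PyTorch sketch of a Kim-style text CNN applied to a user's tweets represented as token IDs; the vocabulary size, filter sizes, and dimensions are illustrative choices rather than the original hyper-parameters:

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, num_filters=100,
                 kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)         # (batch, emb_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))        # (batch, num_classes)

logits = TextCNN()(torch.randint(1, 10000, (8, 120)))   # 8 users, 120 tokens each
print(logits.shape)                                     # torch.Size([8, 2])
```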
Tweet2Vec’16 [20]. Tweet2Vec is a general-purpose tweet embedding framework trained with neural networks on a hashtag prediction subtask, producing domain-specific feature representations of tweets. Following the proposed architecture, we constructed a bot detection model in which the embedding layer is followed by fully connected layers.
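In the same spirit, a short sketch of placing fully connected layers on top of precomputed tweet embeddings, assuming the embeddings have already been produced by a character-level encoder such as Tweet2Vec (the encoder itself is not reproduced here):

```python
import torch
import torch.nn as nn

# Classification head placed on top of precomputed tweet embeddings.
head = nn.Sequential(
    nn.Linear(256, 64),   # 256-dim embeddings: an assumed size
    nn.ReLU(),
    nn.Linear(64, 2),     # legitimate vs. malicious bot
)
tweet_embeddings = torch.randn(8, 256)   # stand-in for Tweet2Vec outputs
print(head(tweet_embeddings).shape)      # torch.Size([8, 2])
```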
Chavoshi’18 [11]. The authors proposed mapping pairs of posting timestamps into 2D images to better exploit each account’s temporal posting behavior; convolutional neural networks are then applied to the downstream bot detection task.
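A minimal numpy sketch of one way to realize this idea: consecutive inter-posting gaps are paired and binned into a 2D log-scale histogram that a CNN could consume (the bin count and normalization are illustrative, not the paper's exact construction):

```python
import numpy as np

def timestamps_to_image(timestamps, bins=32):
    """Bin consecutive inter-posting gaps (dt_i, dt_{i+1}) into a 2D histogram."""
    ts = np.sort(np.asarray(timestamps, dtype=float))
    gaps = np.log1p(np.diff(ts))                 # compress the heavy-tailed gaps
    x, y = gaps[:-1], gaps[1:]                   # pairs of consecutive gaps
    img, _, _ = np.histogram2d(x, y, bins=bins)
    return img / max(img.max(), 1.0)             # normalize to [0, 1]

# Toy example: 200 posting times (in seconds) for one account.
rng = np.random.default_rng(0)
times = np.cumsum(rng.exponential(scale=300.0, size=200))
print(timestamps_to_image(times).shape)          # (32, 32)
```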
Kudugunta’19 [28]. This is a framework that uses an LSTM to learn content features and then combines them with several handcrafted features.
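A hedged PyTorch sketch of this idea, with an LSTM over tweet tokens whose final hidden state is concatenated with handcrafted account features before classification; all dimensions are assumptions for illustration:

```python
import torch
import torch.nn as nn

class LSTMPlusHandcrafted(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden_dim=64, hand_dim=10):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim + hand_dim, 2)

    def forward(self, token_ids, handcrafted):
        _, (h_n, _) = self.lstm(self.emb(token_ids))   # h_n: (1, batch, hidden_dim)
        content = h_n[-1]                              # final hidden state per account
        return self.fc(torch.cat([content, handcrafted], dim=1))

model = LSTMPlusHandcrafted()
logits = model(torch.randint(1, 10000, (4, 50)), torch.randn(4, 10))
print(logits.shape)  # torch.Size([4, 2])
```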
RTBust’19 [32]. RTBust is a framework that uses temporal retweet/tweet patterns for bot detection. It captures the information in tweet/retweet sequences and extracts features with a variational autoencoder (VAE) [26]. The feature embeddings produced by the encoder are then fed into HDBSCAN [8], an unsupervised clustering method, and outliers are treated as malicious bots.
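A rough sketch of the clustering step only, assuming `embeddings` are latent vectors already produced by a VAE encoder over retweet/tweet sequences (the VAE itself is omitted); it uses the `hdbscan` package, in which points labeled -1 fall outside every dense cluster:

```python
import numpy as np
import hdbscan  # pip install hdbscan

# Stand-in for VAE-encoded retweet/tweet sequence embeddings, one row per account.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 16))

clusterer = hdbscan.HDBSCAN(min_cluster_size=10)
labels = clusterer.fit_predict(embeddings)

# Accounts that do not fall into any dense cluster (label -1) are flagged as bots.
suspected_bots = np.where(labels == -1)[0]
print(f"{len(suspected_bots)} accounts flagged as outliers")
```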
Yang’19 [42]. A Random Forest model built on the various features proposed by the authors.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Mou, G., Lee, K. (2020). Malicious Bot Detection in Online Social Networks: Arming Handcrafted Features with Deep Learning. In: Aref, S., et al. Social Informatics. SocInfo 2020. Lecture Notes in Computer Science(), vol 12467. Springer, Cham. https://doi.org/10.1007/978-3-030-60975-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60974-0
Online ISBN: 978-3-030-60975-7