Abstract
Online social networks (OSNs) have long suffered from various types of malicious bots (e.g., spammers, fake followers, social bots, and content polluters). Recent studies show that these bots have also been actively involved in spreading hate speech and disseminating misinformation. Over the years, researchers have proposed multiple approaches to identify some of these bot types and lower their impact on OSNs. However, prior strategies mostly rely on handcrafted features to capture the characteristics of malicious users, or on deep learning approaches that only work well under certain conditions (e.g., dense retweeting/sharing behavior). To overcome these limitations, in this paper we propose a novel framework that incorporates both handcrafted features and features automatically learned by deep learning methods from various perspectives, and automatically balances them when making the final prediction toward detecting malicious bots. In particular, we (i) combine 15 publicly available Twitter user datasets and categorize the accounts into two groups (i.e., legitimate accounts and malicious bot accounts); and (ii) propose a deep learning framework that jointly learns the various features and detects malicious accounts. Our experimental results show that our proposed model outperforms 7 state-of-the-art methods, achieving 0.901 accuracy. Our ablation study shows that all types of our features positively contribute to the model's performance.
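For intuition only, the following is a minimal PyTorch sketch of this high-level idea (not the authors' exact architecture): a handcrafted feature vector and a deep-learned feature vector are projected into a shared space, and a learned gate balances the two views before the final prediction. All layer names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedFusionClassifier(nn.Module):
    """Illustrative sketch: fuse handcrafted and deep-learned features with a gate."""

    def __init__(self, handcrafted_dim: int, learned_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.proj_hand = nn.Linear(handcrafted_dim, hidden_dim)   # handcrafted view
        self.proj_deep = nn.Linear(learned_dim, hidden_dim)       # deep-learned view
        self.gate = nn.Sequential(                                # balances the two views
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Sigmoid(),
        )
        self.out = nn.Linear(hidden_dim, 2)                       # legitimate vs. malicious bot

    def forward(self, x_hand, x_deep):
        h = torch.relu(self.proj_hand(x_hand))
        d = torch.relu(self.proj_deep(x_deep))
        g = self.gate(torch.cat([h, d], dim=-1))                  # per-dimension weight in [0, 1]
        fused = g * h + (1 - g) * d
        return self.out(fused)

# Example: a batch of 4 accounts with 20 handcrafted and 128 learned features.
model = GatedFusionClassifier(handcrafted_dim=20, learned_dim=128)
logits = model(torch.randn(4, 20), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 2])
```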
Notes
- 1.
We use the terms user and account interchangeably.
References
Adewole, K.S., Anuar, N.B., Kamsin, A., Varathan, K.D., Razak, S.A.: Malicious accounts: dark of the social networks. J. Netw. Comput. Appl. 79, 41–67 (2017)
Albadi, N., Kurdi, M., Mishra, S.: Hateful people or hateful bots? Detection and characterization of bots spreading religious hatred in Arabic social media. CSCW (2019)
Alfifi, M., Caverlee, J.: Badly evolved? Exploring long-surviving suspicious users on twitter. In: Ciampaglia, G.L., Mashhadi, A., Yasseri, T. (eds.) SocInfo 2017. LNCS, vol. 10539, pp. 218–233. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67217-5_14
Alfifi, M., Kaghazgaran, P., Caverlee, J., Morstatter, F.: Measuring the impact of ISIS social media strategy. In: MIS2 (2018)
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
Beskow, D.M., Carley, K.M.: Bot conversations are different: leveraging network metrics for bot detection in twitter. In: ASONAM (2018)
Bhat, S.Y., Abulaish, M.: Community-based features for identifying spammers in online social networks. In: ASONAM (2013)
Campello, R.J.G.B., Moulavi, D., Sander, J.: Density-based clustering based on hierarchical density estimates. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) PAKDD 2013. LNCS (LNAI), vol. 7819, pp. 160–172. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37456-2_14
Cer, D., et al.: Universal sentence encoder for English. In: EMNLP (2018)
Chavoshi, N., Hamooni, H., Mueen, A.: Temporal patterns in bot activities. In: WWW (2017)
Chavoshi, N., Mueen, A.: Model bots, not humans on social media. In: ASONAM (2018)
Conroy, N.J., Rubin, V.L., Chen, Y.: Automatic deception detection: methods for finding fake news. In: Proceedings of the 78th ASIS&T Annual Meeting (2015)
Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: Fame for sale: efficient detection of fake twitter followers. Decis. Support Syst. 80, 56–71 (2015)
Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. In: WWW (2017)
Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: $FAKE: evidence of spam and bot activity in stock microblogs on Twitter. In: ICWSM (2018)
Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on twitter. ACM Trans. Web (TWEB) 13(2), 1–27 (2019)
Cresci, S., Petrocchi, M., Spognardi, A., Tognazzi, S.: Better safe than sorry: an adversarial approach to improve social bot detection. In: WebSci (2019)
Davis, C.A., Varol, O., Ferrara, E., Flammini, A., Menczer, F.: BotOrNot: a system to evaluate social bots. In: WWW (2016)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL (2019)
Dhingra, B., Zhou, Z., Fitzpatrick, D., Muehl, M., Cohen, W.: Tweet2vec: character-based distributed representations for social media. In: ACL (2016)
Ferrara, E.: Measuring social spam and the effect of bots on information diffusion in social media. In: Lehmann, S., Ahn, Y.-Y. (eds.) Complex Spreading Phenomena in Social Systems. CSS, pp. 229–255. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77332-2_13
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: ICML (2016)
Gilani, Z., Farahbakhsh, R., Tyson, G., Wang, L., Crowcroft, J.: Of bots and humans (on twitter). In: ASONAM (2017)
Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. In: ACL (2018)
Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP (2014)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: ICLR (2014)
Ko, R.: Social media is full of bots spreading Covid-19 anxiety. Don’t fall for it (2020). https://www.sciencealert.com/bots-are-causing-anxiety-by-spreading-coronavirus-misinformation
Kudugunta, S., Ferrara, E.: Deep neural networks for bot detection. Inf. Sci. 467, 312–322 (2018)
Lee, K., Eoff, B.D., Caverlee, J.: Seven months with the devils: a long-term study of content polluters on twitter. In: ICWSM (2011)
Ma, J., Gao, W., Wei, Z., Lu, Y., Wong, K.F.: Detect rumors using time series of social context information on microblogging websites. In: CIKM (2015)
Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: ICML (2013)
Mazza, M., Cresci, S., Avvenuti, M., Quattrociocchi, W., Tesconi, M.: RTbust: exploiting temporal patterns for botnet detection on twitter. In: WEBSCI (2019)
Miller, Z., Dickinson, B., Deitrick, W., Hu, W., Wang, A.H.: Twitter spammer detection using data stream clustering. Inf. Sci. 260, 64–73 (2014)
Morstatter, F., Wu, L., Nazer, T.H., Carley, K.M., Liu, H.: A new approach to bot detection: striking the balance between precision and recall. In: ASONAM (2016)
Ruths, D.: The misinformation machine. Science 363(6425), 348 (2019)
Shao, C., Ciampaglia, G.L., Varol, O., Yang, K.C., Flammini, A., Menczer, F.: The spread of low-credibility content by social bots. Nat. Commun. 9(1), 4787 (2018)
Subrahmanian, V., et al.: The DARPA Twitter bot challenge. Computer 49(6), 38–46 (2016)
Varol, O., Ferrara, E., Davis, C.A., Menczer, F., Flammini, A.: Online human-bot interactions: detection, estimation, and characterization. In: ICWSM (2017)
Wang, Z., Oates, T.: Encoding time series as images for visual inspection and classification using tiled convolutional neural networks. In: AAAI-W (2015)
Yang, C., Harkreader, R., Gu, G.: Empirical evaluation and new design for fighting evolving twitter spammers. IEEE Trans. Inf. Forens. Secur. 8(8), 1280–1293 (2013)
Yang, K.C., Varol, O., Davis, C.A., Ferrara, E., Flammini, A., Menczer, F.: Arming the public with artificial intelligence to counter social bots. Hum. Behav. Emerg. Technol. 1(1), 48–61 (2019)
Yang, K.C., Varol, O., Hui, P.M., Menczer, F.: Scalable and generalizable social bot detection through data selection. In: AAAI (2020)
Young, L.Y.: The effect of moderator bots on abusive language use. In: ICPRAI (2018)
Acknowledgements
This work was supported in part by NSF grant CNS-1755536, AWS Cloud Credits for Research, and Google Cloud.
A Appendix
A.1 Account Status
Since we retained the original account information from Lee’11, Cresci’15, and Cresci’17, we checked the current status of those malicious bots, as shown in Table 6. Overall, 68.3% of the malicious bots are still alive on Twitter, and some of them have survived for more than ten years. This fact indicates that there is substantial room to improve Twitter’s current bot detection system.
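As a rough illustration of this kind of status check (not the exact procedure used for Table 6), one could look up each stored account ID and count how many still resolve as active; `lookup_account_status` below is a hypothetical helper standing in for a real Twitter API call.

```python
from typing import Callable, Iterable

def alive_ratio(account_ids: Iterable[str],
                lookup_account_status: Callable[[str], str]) -> float:
    """Return the fraction of accounts whose status is still 'active'.

    `lookup_account_status` is a hypothetical callback that queries the
    platform (e.g., via the Twitter API) and returns one of
    'active', 'suspended', or 'deleted'.
    """
    statuses = [lookup_account_status(uid) for uid in account_ids]
    alive = sum(1 for s in statuses if s == "active")
    return alive / len(statuses) if statuses else 0.0

# Toy usage with a stubbed lookup; a real run would query the Twitter API.
stub = {"1": "active", "2": "suspended", "3": "active"}
print(alive_ratio(stub.keys(), lambda uid: stub[uid]))  # ~0.667
```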
A.2 Source Dataset Details
We list the original user types that each dataset contains as follows:
Lee’11 [29]: content polluters, and legitimate users
Cresci’15 [13]: various kinds of fake accounts
Cresci’17 [14]: traditional & social spambots, and legitimate users
Cresci’18 [15, 16]: stock related bots, and legitimate users
RTBust’19 [32]: retweet bots, and legitimate users
Gilani’17 [23]: bots, and legitimate users
Varol’17 [38]: bots, and legitimate users
Midterm’18 [42]: political bots, and legitimate users
Botwiki’19 [42]: social bots
Political’19 [41]: political bots
Pronbots’19 [41]: bots advertising scam sites
Vendor’19 [41]: fake followers
Verified’19 [41]: verified legitimate users
Celebrity’19 [41]: celebrity accounts (legitimate)
Botometer’19 [41]: bots and legitimate users
We grouped legitimate users, verified accounts, and celebrity accounts as legitimate accounts, and all other types of accounts as malicious bots.
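A minimal pandas sketch of this relabeling step, assuming each source dataset has already been loaded into a DataFrame with a `user_type` column (column names and type strings are illustrative):

```python
import pandas as pd

LEGITIMATE_TYPES = {"legitimate user", "verified account", "celebrity account"}

def binarize_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Map each account's original user type to 'legitimate' or 'malicious bot'."""
    df = df.copy()
    df["label"] = df["user_type"].apply(
        lambda t: "legitimate" if t in LEGITIMATE_TYPES else "malicious bot"
    )
    return df

# Toy example standing in for one of the 15 source datasets.
sample = pd.DataFrame({
    "user_id": [101, 102, 103],
    "user_type": ["content polluter", "legitimate user", "fake follower"],
})
print(binarize_labels(sample)[["user_id", "label"]])
```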
A.3 Detailed Baseline Descriptions
Lee’11 [29]. The authors proposed handcrafted features extracted from user profiles, posted content, and changes in the following/follower lists over time. We built their best-performing Random Forest model, without the network features.
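For illustration, a minimal scikit-learn version of such a baseline might look as follows, assuming `X` holds the handcrafted features and `y` the binary labels (the random data below merely stands in for real feature extraction):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder handcrafted features (e.g., follower count, posting rate, ...).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```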
Kim’14 [25]. This is a convolutional text classification architecture that achieves performance comparable to state-of-the-art models, with hyper-parameters that are stable across different domains. We applied this architecture to the tweets posted by each user in order to classify the accounts.
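A compact PyTorch sketch of a Kim-style text CNN applied to a user's tweets represented as token IDs; the vocabulary size, filter sizes, and dimensions are illustrative choices rather than the original hyper-parameters:

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, num_filters=100,
                 kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)         # (batch, emb_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))        # (batch, num_classes)

logits = TextCNN()(torch.randint(1, 10000, (8, 120)))   # 8 users, 120 tokens each
print(logits.shape)                                     # torch.Size([8, 2])
```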
Tweet2Vec’16 [20]. Tweet2Vec is a general-purpose tweet embedding framework trained with neural networks on a hashtag prediction subtask, producing domain-specific feature representations of tweets. Following the proposed architecture, we constructed a bot detection model in which the embedding layer is followed by fully connected layers.
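In the same spirit, a short sketch of placing fully connected layers on top of precomputed tweet embeddings, assuming the embeddings have already been produced by a character-level encoder such as Tweet2Vec (the encoder itself is not reproduced here):

```python
import torch
import torch.nn as nn

# Classification head placed on top of precomputed tweet embeddings.
head = nn.Sequential(
    nn.Linear(256, 64),   # 256-dim embeddings: an assumed size
    nn.ReLU(),
    nn.Linear(64, 2),     # legitimate vs. malicious bot
)
tweet_embeddings = torch.randn(8, 256)   # stand-in for Tweet2Vec outputs
print(head(tweet_embeddings).shape)      # torch.Size([8, 2])
```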
Chavoshi’18 [11]. The authors proposed mapping pairs of posting timestamps into 2D images to better exploit each account’s temporal posting behavior; convolutional neural networks are then applied to the downstream bot detection task.
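A minimal numpy sketch of one way to realize this idea: consecutive inter-posting gaps are paired and binned into a 2D log-scale histogram that a CNN could consume (the bin count and normalization are illustrative, not the paper's exact construction):

```python
import numpy as np

def timestamps_to_image(timestamps, bins=32):
    """Bin consecutive inter-posting gaps (dt_i, dt_{i+1}) into a 2D histogram."""
    ts = np.sort(np.asarray(timestamps, dtype=float))
    gaps = np.log1p(np.diff(ts))                 # compress the heavy-tailed gaps
    x, y = gaps[:-1], gaps[1:]                   # pairs of consecutive gaps
    img, _, _ = np.histogram2d(x, y, bins=bins)
    return img / max(img.max(), 1.0)             # normalize to [0, 1]

# Toy example: 200 posting times (in seconds) for one account.
rng = np.random.default_rng(0)
times = np.cumsum(rng.exponential(scale=300.0, size=200))
print(timestamps_to_image(times).shape)          # (32, 32)
```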
Kudugunta’19 [28]. This is a framework that uses an LSTM to learn content features and then combines them with several handcrafted features.
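A hedged PyTorch sketch of this idea, with an LSTM over tweet tokens whose final hidden state is concatenated with handcrafted account features before classification; all dimensions are assumptions for illustration:

```python
import torch
import torch.nn as nn

class LSTMPlusHandcrafted(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden_dim=64, hand_dim=10):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim + hand_dim, 2)

    def forward(self, token_ids, handcrafted):
        _, (h_n, _) = self.lstm(self.emb(token_ids))   # h_n: (1, batch, hidden_dim)
        content = h_n[-1]                              # final hidden state per account
        return self.fc(torch.cat([content, handcrafted], dim=1))

model = LSTMPlusHandcrafted()
logits = model(torch.randint(1, 10000, (4, 50)), torch.randn(4, 10))
print(logits.shape)  # torch.Size([4, 2])
```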
RTBust’19 [32]. RTBust is a framework that uses temporal retweet/tweet patterns for bot detection. It captures the information in tweet/retweet sequences and extracts features with a variational autoencoder (VAE) [26]. The feature embeddings produced by the encoder are then fed into HDBSCAN [8], an unsupervised clustering method, and outliers are treated as malicious bots.
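A rough sketch of the clustering step only, assuming `embeddings` are latent vectors already produced by a VAE encoder over retweet/tweet sequences (the VAE itself is omitted); it uses the `hdbscan` package, in which points labeled -1 fall outside every dense cluster:

```python
import numpy as np
import hdbscan  # pip install hdbscan

# Stand-in for VAE-encoded retweet/tweet sequence embeddings, one row per account.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 16))

clusterer = hdbscan.HDBSCAN(min_cluster_size=10)
labels = clusterer.fit_predict(embeddings)

# Accounts that do not fall into any dense cluster (label -1) are flagged as bots.
suspected_bots = np.where(labels == -1)[0]
print(f"{len(suspected_bots)} accounts flagged as outliers")
```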
Yang’19 [42]. A Random Forest model built on the various features proposed by the authors.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Mou, G., Lee, K. (2020). Malicious Bot Detection in Online Social Networks: Arming Handcrafted Features with Deep Learning. In: Aref, S., et al. Social Informatics. SocInfo 2020. Lecture Notes in Computer Science(), vol 12467. Springer, Cham. https://doi.org/10.1007/978-3-030-60975-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60974-0
Online ISBN: 978-3-030-60975-7