Towards Generating Spam Queries for Retrieving Spam Accounts in Large-Scale Twitter Data

Washha, Mahdi; Qaroush, Aziz; Mezghani, Manel; Sedes, Florence

doi:10.1007/978-3-319-93375-7_18


Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 321)

Included in the following conference series:

International Conference on Enterprise Information Systems

Abstract

Twitter, as a top microblogging site, has become a valuable source of up-to-date, real-time information for a wide range of social-based research and applications. Intuitively, acceptable performance in such research and applications depends mainly on working with information of adequate quality. However, Twitter has turned out to be a fertile environment for publishing noisy information in different forms. Consequently, maintaining high quality is a serious challenge, requiring great effort from Twitter’s administrators and researchers to address information quality issues. Social spam is a common type of noisy information, created and circulated by ill-intentioned users, so-called social spammers. More precisely, they misuse all possible services provided by Twitter to propagate their spam content, causing large-scale information pollution in Twitter’s network. As Twitter’s anti-spam mechanism is neither fully effective against nor immune to the spam problem, extensive research has been dedicated to developing methods that detect and filter out spam accounts and tweets. However, these methods do not scale to large-scale Twitter data. Indeed, they require, as a mandatory step, additional information from Twitter’s servers, which is limited to a small number of requests per 15-minute time window; this is the main barrier to making these methods effective, since handling large-scale Twitter data would take months. Instead of inspecting every account in a given large-scale Twitter data set sequentially or randomly, in this paper we explore the applicability of the information retrieval (IR) concept to retrieve a subset of accounts with a high probability of being spam. Specifically, we introduce the design of an unsupervised method that partially processes a large-scale collection of tweets to generate spam queries related to accounts’ attributes. Then, the spam queries are issued to retrieve and rank the most likely spam accounts in the given large-scale set of Twitter accounts. Our experimental evaluation shows the efficiency of generating spam queries from different attributes to retrieve spam accounts in terms of precision, recall, and normalized discounted cumulative gain at different ranks.
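
As an illustration of the evaluation measures named in the abstract, the following minimal sketch computes precision, recall, and normalized discounted cumulative gain (NDCG) at a rank cutoff k for a ranked list of retrieved accounts. It is not the authors' evaluation code. It assumes binary relevance (an account is either spam or not), and the function names and the toy account identifiers are hypothetical.

```python
import math


def precision_at_k(ranked, spam, k):
    """Fraction of the top-k retrieved accounts that are labeled spam."""
    return sum(1 for account in ranked[:k] if account in spam) / k


def recall_at_k(ranked, spam, k):
    """Fraction of all labeled spam accounts that appear in the top k."""
    if not spam:
        return 0.0
    return sum(1 for account in ranked[:k] if account in spam) / len(spam)


def ndcg_at_k(ranked, spam, k):
    """NDCG at rank k with binary gains (assumption: spam = 1, other = 0)."""
    # Discounted gain of the actual ranking; position i (0-based) is
    # discounted by log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, account in enumerate(ranked[:k])
        if account in spam
    )
    # Ideal ranking places every spam account first.
    ideal_hits = min(k, len(spam))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy data (hypothetical): four retrieved accounts, two of them spam.
ranked_accounts = ["a", "b", "c", "d"]
spam_accounts = {"a", "c"}

for k in (2, 4):
    print(
        k,
        precision_at_k(ranked_accounts, spam_accounts, k),
        recall_at_k(ranked_accounts, spam_accounts, k),
        round(ndcg_at_k(ranked_accounts, spam_accounts, k), 4),
    )
```

Reporting the three measures at several cutoffs, as the abstract describes, shows whether spam accounts are concentrated near the top of the ranking rather than merely present somewhere in it.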

The work described in this paper is an extended version of the work published in [1].

References

1. Washha, M., Qaroush, A., Mezghani, M., Sèdes, F.: Information quality in social networks: predicting spammy naming patterns for retrieving Twitter spam accounts. In: Proceedings of the 19th International Conference on Enterprise Information Systems, ICEIS 2017, Porto, Portugal, 26–29 April 2017, vol. 2, pp. 610–622. SciTePress (2017)
2. Benevenuto, F., Magno, G., Rodrigues, T., Almeida, V.: Detecting spammers on Twitter. In: Collaboration, Electronic messaging, Anti-abuse and Spam Conference (CEAS), p. 12 (2010)
3. Wang, A.H.: Don’t follow me: spam detection in Twitter. In: Proceedings of the 2010 International Conference on Security and Cryptography (SECRYPT), pp. 1–10, July 2010
4. Lee, K., Caverlee, J., Webb, S.: Uncovering social spammers: social honeypots + machine learning. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, pp. 435–442. ACM, New York (2010)
5. McCord, M., Chuah, M.: Spam detection on Twitter using traditional classifiers. In: Calero, J.M.A., Yang, L.T., Mármol, F.G., García Villalba, L.J., Li, A.X., Wang, Y. (eds.) ATC 2011. LNCS, vol. 6906, pp. 175–186. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23496-5_13
6. Stringhini, G., Kruegel, C., Vigna, G.: Detecting spammers on social networks. In: Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC 2010, pp. 1–9. ACM, New York (2010)
7. Yang, C., Harkreader, R.C., Gu, G.: Die free or live hard? Empirical evaluation and new design for fighting evolving Twitter spammers. In: Sommer, R., Balzarotti, D., Maier, G. (eds.) RAID 2011. LNCS, vol. 6961, pp. 318–337. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23644-0_17
8. Amleshwaram, A.A., Reddy, N., Yadav, S., Gu, G., Yang, C.: CATS: characterizing automation of Twitter spammers. In: 2013 Fifth International Conference on Communication Systems and Networks (COMSNETS), pp. 1–10. IEEE (2013)
9. Cao, C., Caverlee, J.: Detecting spam URLs in social media via behavioral analysis. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.) ECIR 2015. LNCS, vol. 9022, pp. 703–714. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16354-3_77
10. Chu, Z., Widjaja, I., Wang, H.: Detecting social spam campaigns on Twitter. In: Bao, F., Samarati, P., Zhou, J. (eds.) ACNS 2012. LNCS, vol. 7341, pp. 455–472. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31284-7_27
11. Meda, C., Bisio, F., Gastaldo, P., Zunino, R.: A machine learning approach for Twitter spammers detection. In: 2014 International Carnahan Conference on Security Technology (ICCST), pp. 1–6. IEEE (2014)
12. Santos, I., Miñambres-Marcos, I., Laorden, C., Galán-García, P., Santamaría-Ibirika, A., Bringas, P.G.: Twitter content-based spam filtering. In: Herrero, Á., et al. (eds.) SOCO 2013-CISIS 2013-ICEUTE 2013. AISC, vol. 239, pp. 449–458. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-01854-6_46
13. Martinez-Romo, J., Araujo, L.: Detecting malicious tweets in trending topics using a statistical analysis of language. Expert Syst. Appl. 40(8), 2992–3000 (2013)
14. Kaplan, A.M., Haenlein, M.: The early bird catches the news: nine things you should know about micro-blogging. Bus. Horiz. 54(2), 105–113 (2011)
15. Agarwal, N., Yiliyasi, Y.: Information quality challenges in social media. In: International Conference on Information Quality (ICIQ), pp. 234–248 (2010)
16. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
17. Twitter: The Twitter rules (2016). https://support.twitter.com/articles/18311. Accessed 1 Mar 2016
18. Washha, M., Qaroush, A., Mezghani, M., Sedes, F.: Information quality in social networks: a collaborative method for detecting spam tweets in trending topics. In: Benferhat, S., Tabia, K., Ali, M. (eds.) IEA/AIE 2017. LNCS (LNAI), vol. 10351, pp. 211–223. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60045-1_24
19. Washha, M., Shilleh, D., Ghawadrah, Y., Jazi, R., Sèdes, F.: Information quality in online social networks: a fast unsupervised social spam detection method for trending topics. In: Proceedings of the 19th International Conference on Enterprise Information Systems, ICEIS 2017, Porto, Portugal, 26–29 April 2017, vol. 2, pp. 663–675. SciTePress (2017)
20. Yang, C., Harkreader, R., Zhang, J., Shin, S., Gu, G.: Analyzing spammers’ social networks for fun and profit: a case study of cyber criminal ecosystem on Twitter. In: Proceedings of the 21st International Conference on World Wide Web, WWW 2012, pp. 71–80. ACM, New York (2012)
21. Chu, Z., Gianvecchio, S., Wang, H., Jajodia, S.: Detecting automation of twitter accounts: are you a human, bot, or cyborg? IEEE Trans. Dependable Secure Comput. 9(6), 811–824 (2012)
22. Hu, X., Tang, J., Liu, H.: Online social spammer detection. In: AAAI, pp. 59–65 (2014)
23. Hu, X., Tang, J., Zhang, Y., Liu, H.: Social spammer detection in microblogging. In: IJCAI, vol. 13, pp. 2633–2639. Citeseer (2013)
24. Washha, M., Qaroush, A., Sèdes, F.: Leveraging time for spammers detection on Twitter. In: Proceedings of the 8th International Conference on Management of Digital EcoSystems, pp. 109–116. ACM (2016)
25. Yang, J., Leskovec, J.: Overlapping community detection at scale: a nonnegative matrix factorization approach. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, WSDM 2013, pp. 587–596. ACM, New York (2013)
26. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
27. He, B.: Probability Ranking Principle, pp. 2168–2169. Springer, Boston (2009)
28. Chen, C., Zhang, J., Xie, Y., Xiang, Y., Zhou, W., Hassan, M.M., AlElaiwi, A., Alrubaian, M.: A performance evaluation of machine learning-based streaming spam tweets detection. IEEE Trans. Comput. Soc. Syst. 2(3), 65–76 (2015)


Author information

Authors and Affiliations

IRIT - Paul Sabatier University, Toulouse, France
Mahdi Washha, Manel Mezghani & Florence Sedes
Department of Electrical and Computer Engineering, Birzeit University, Ramallah, Palestine
Aziz Qaroush

Corresponding author

Correspondence to Mahdi Washha.

Editor information

Editors and Affiliations

MODESTE/ESEO, Angers, France
Slimane Hammoudi
Warsaw University of Technology, Warsaw, Poland
Michał Śmiałek
MODESTE/ESEO, Angers, France
Olivier Camp
INSTICC, Polytechnic Institute of Setúbal, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Washha, M., Qaroush, A., Mezghani, M., Sedes, F. (2018). Towards Generating Spam Queries for Retrieving Spam Accounts in Large-Scale Twitter Data. In: Hammoudi, S., Śmiałek, M., Camp, O., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2017. Lecture Notes in Business Information Processing, vol 321. Springer, Cham. https://doi.org/10.1007/978-3-319-93375-7_18


DOI: https://doi.org/10.1007/978-3-319-93375-7_18
Published: 16 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93374-0
Online ISBN: 978-3-319-93375-7
eBook Packages: Computer Science, Computer Science (R0)
