Towards Generating Spam Queries for Retrieving Spam Accounts in Large-Scale Twitter Data

  • Conference paper
Enterprise Information Systems (ICEIS 2017)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 321)

Abstract

Twitter, as a top microblogging site, has become a valuable source of up-to-date, real-time information for a wide range of social research and applications. Intuitively, acceptable performance in that research and those applications depends on working with information of adequate quality. However, Twitter has turned out to be a fertile environment for publishing noisy information in different forms. Consequently, maintaining high information quality is a serious challenge that requires great effort from Twitter's administrators and from researchers. Social spam is a common type of noisy information, created and circulated by ill-intentioned users known as social spammers. More precisely, they misuse every service Twitter provides to propagate their spam content, causing large-scale information pollution in Twitter's network. Because Twitter's anti-spam mechanism is neither effective against nor immune to the spam problem, a great deal of research has been dedicated to developing methods that detect and filter out spam accounts and tweets. However, these methods do not scale to large-scale Twitter data. As a mandatory step, they need additional information from Twitter's servers, and access to it is limited to a small number of requests per 15-minute time window. This limit is the main barrier to making these methods efficient, since handling large-scale Twitter data would take months. Instead of inspecting every account in a given large-scale Twitter data set sequentially or at random, in this paper we explore the applicability of information retrieval (IR) concepts to retrieve a subset of accounts that have a high probability of being spam. Specifically, we introduce the design of an unsupervised method that processes part of a large collection of tweets to generate spam queries related to account attributes. The spam queries are then issued to retrieve and rank the accounts most likely to be spam in the given large-scale collection of Twitter accounts. Our experimental evaluation shows the efficiency of generating spam queries from different attributes to retrieve spam accounts, in terms of precision, recall, and normalized discounted cumulative gain at different ranks.
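
The abstract names precision, recall, and normalized discounted cumulative gain (NDCG) at different ranks as its evaluation measures. As a minimal sketch of how such rank-based measures are commonly computed, assuming binary relevance (an account is either spam or not) and the usual log2 discount, one could write the following. The function names and the toy account identifiers are hypothetical and are not taken from the paper, which may define the measures differently.

```python
import math

def precision_at_k(ranked, spam, k):
    """Fraction of the top-k retrieved accounts that are spam."""
    hits = sum(1 for account in ranked[:k] if account in spam)
    return hits / k

def recall_at_k(ranked, spam, k):
    """Fraction of all known spam accounts found in the top k."""
    if not spam:
        return 0.0
    hits = sum(1 for account in ranked[:k] if account in spam)
    return hits / len(spam)

def ndcg_at_k(ranked, spam, k):
    """NDCG with binary gains: 1 for a spam account, 0 otherwise."""
    # Discounted gain of the actual ranking (rank i is 0-based here).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, account in enumerate(ranked[:k]) if account in spam)
    # Ideal ranking places every spam account first.
    ideal_hits = min(k, len(spam))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy data (hypothetical): a ranked result list and the true spam set.
ranked_accounts = ["a", "b", "c", "d", "e"]
spam_accounts = {"a", "c", "f"}

for k in (1, 3, 5):
    print(k,
          precision_at_k(ranked_accounts, spam_accounts, k),
          recall_at_k(ranked_accounts, spam_accounts, k),
          ndcg_at_k(ranked_accounts, spam_accounts, k))
```

Reporting the three measures at several cut-offs shows both how pure the top of the ranking is (precision, NDCG) and how much of the spam population a limited inspection budget would reach (recall).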

The work described in this paper is an extended version of the work presented in [1].

Notes

  1. https://dev.twitter.com/rest/public.

  2. http://web.itu.edu.tr/sgunduz/courses/mikroisl/ascii.html.

References

  1. Washha, M., Qaroush, A., Mezghani, M., Sèdes, F.: Information quality in social networks: predicting spammy naming patterns for retrieving Twitter spam accounts. In: Proceedings of the 19th International Conference on Enterprise Information Systems, ICEIS 2017, Porto, Portugal, 26–29 April 2017, vol. 2, pp. 610–622. SciTePress (2017)

  2. Benevenuto, F., Magno, G., Rodrigues, T., Almeida, V.: Detecting spammers on Twitter. In: Collaboration, Electronic messaging, Anti-abuse and Spam Conference (CEAS), p. 12 (2010)

  3. Wang, A.H.: Don’t follow me: spam detection in Twitter. In: Proceedings of the 2010 International Conference on Security and Cryptography (SECRYPT), pp. 1–10, July 2010

  4. Lee, K., Caverlee, J., Webb, S.: Uncovering social spammers: social honeypots + machine learning. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, pp. 435–442. ACM, New York (2010)

  5. McCord, M., Chuah, M.: Spam detection on Twitter using traditional classifiers. In: Calero, J.M.A., Yang, L.T., Mármol, F.G., García Villalba, L.J., Li, A.X., Wang, Y. (eds.) ATC 2011. LNCS, vol. 6906, pp. 175–186. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23496-5_13

  6. Stringhini, G., Kruegel, C., Vigna, G.: Detecting spammers on social networks. In: Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC 2010, pp. 1–9. ACM, New York (2010)

  7. Yang, C., Harkreader, R.C., Gu, G.: Die free or live hard? Empirical evaluation and new design for fighting evolving Twitter spammers. In: Sommer, R., Balzarotti, D., Maier, G. (eds.) RAID 2011. LNCS, vol. 6961, pp. 318–337. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23644-0_17

  8. Amleshwaram, A.A., Reddy, N., Yadav, S., Gu, G., Yang, C.: CATS: characterizing automation of Twitter spammers. In: 2013 Fifth International Conference on Communication Systems and Networks (COMSNETS), pp. 1–10. IEEE (2013)

  9. Cao, C., Caverlee, J.: Detecting spam URLs in social media via behavioral analysis. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.) ECIR 2015. LNCS, vol. 9022, pp. 703–714. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16354-3_77

  10. Chu, Z., Widjaja, I., Wang, H.: Detecting social spam campaigns on Twitter. In: Bao, F., Samarati, P., Zhou, J. (eds.) ACNS 2012. LNCS, vol. 7341, pp. 455–472. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31284-7_27

  11. Meda, C., Bisio, F., Gastaldo, P., Zunino, R.: A machine learning approach for Twitter spammers detection. In: 2014 International Carnahan Conference on Security Technology (ICCST), pp. 1–6. IEEE (2014)

  12. Santos, I., Miñambres-Marcos, I., Laorden, C., Galán-García, P., Santamaría-Ibirika, A., Bringas, P.G.: Twitter content-based spam filtering. In: Herrero, Á., et al. (eds.) SOCO 2013-CISIS 2013-ICEUTE 2013. AISC, vol. 239, pp. 449–458. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-01854-6_46

  13. Martinez-Romo, J., Araujo, L.: Detecting malicious tweets in trending topics using a statistical analysis of language. Expert Syst. Appl. 40(8), 2992–3000 (2013)

  14. Kaplan, A.M., Haenlein, M.: The early bird catches the news: nine things you should know about micro-blogging. Bus. Horiz. 54(2), 105–113 (2011)

  15. Agarwal, N., Yiliyasi, Y.: Information quality challenges in social media. In: International Conference on Information Quality (ICIQ), pp. 234–248 (2010)

  16. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)

  17. Twitter: The Twitter rules (2016). https://support.twitter.com/articles/18311. Accessed 1 Mar 2016

  18. Washha, M., Qaroush, A., Mezghani, M., Sèdes, F.: Information quality in social networks: a collaborative method for detecting spam tweets in trending topics. In: Benferhat, S., Tabia, K., Ali, M. (eds.) IEA/AIE 2017. LNCS (LNAI), vol. 10351, pp. 211–223. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60045-1_24

  19. Washha, M., Shilleh, D., Ghawadrah, Y., Jazi, R., Sèdes, F.: Information quality in online social networks: a fast unsupervised social spam detection method for trending topics. In: Proceedings of the 19th International Conference on Enterprise Information Systems, ICEIS 2017, Porto, Portugal, 26–29 April 2017, vol. 2, pp. 663–675. SciTePress (2017)

  20. Yang, C., Harkreader, R., Zhang, J., Shin, S., Gu, G.: Analyzing spammers’ social networks for fun and profit: a case study of cyber criminal ecosystem on Twitter. In: Proceedings of the 21st International Conference on World Wide Web, WWW 2012, pp. 71–80. ACM, New York (2012)

  21. Chu, Z., Gianvecchio, S., Wang, H., Jajodia, S.: Detecting automation of Twitter accounts: are you a human, bot, or cyborg? IEEE Trans. Dependable Secure Comput. 9(6), 811–824 (2012)

  22. Hu, X., Tang, J., Liu, H.: Online social spammer detection. In: AAAI, pp. 59–65 (2014)

  23. Hu, X., Tang, J., Zhang, Y., Liu, H.: Social spammer detection in microblogging. In: IJCAI, vol. 13, pp. 2633–2639. Citeseer (2013)

  24. Washha, M., Qaroush, A., Sèdes, F.: Leveraging time for spammers detection on Twitter. In: Proceedings of the 8th International Conference on Management of Digital EcoSystems, pp. 109–116. ACM (2016)

  25. Yang, J., Leskovec, J.: Overlapping community detection at scale: a nonnegative matrix factorization approach. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, WSDM 2013, pp. 587–596. ACM, New York (2013)

  26. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)

  27. He, B.: Probability Ranking Principle, pp. 2168–2169. Springer, Boston (2009)

  28. Chen, C., Zhang, J., Xie, Y., Xiang, Y., Zhou, W., Hassan, M.M., AlElaiwi, A., Alrubaian, M.: A performance evaluation of machine learning-based streaming spam tweets detection. IEEE Trans. Comput. Soc. Syst. 2(3), 65–76 (2015)

Corresponding author

Correspondence to Mahdi Washha.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Washha, M., Qaroush, A., Mezghani, M., Sèdes, F. (2018). Towards Generating Spam Queries for Retrieving Spam Accounts in Large-Scale Twitter Data. In: Hammoudi, S., Śmiałek, M., Camp, O., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2017. Lecture Notes in Business Information Processing, vol 321. Springer, Cham. https://doi.org/10.1007/978-3-319-93375-7_18

  • DOI: https://doi.org/10.1007/978-3-319-93375-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93374-0

  • Online ISBN: 978-3-319-93375-7

  • eBook Packages: Computer Science, Computer Science (R0)
