Fuzzy String Matching Algorithm for Spam Detection in Twitter

  • Conference paper
Security and Privacy (ISEA-ISAP 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 939)

Abstract

In recent times, one of the most popular Internet activities around the world is visiting online social websites. The number of users of these social networks, and the time those users spend on them, is increasing exponentially. Moreover, users tend to trust the data present on these networks. In the wrong hands, this trust can easily be exploited to spread spam. Spam messages harass users, waste their time, and can fool them into clicking on malicious links. Spam affects many different types of electronic communication, including instant messaging, email, and social networks. But because of their open nature, huge user base, and reliance on users for data, social networks are the worst hit. To detect spam in social networks, it is desirable to find new unsupervised techniques, which avoid the training cost that supervised techniques require.

In this article, we present an unsupervised, distributed, and decentralized technique to detect and remove spam from social networks. The technique uses a fuzzy-based method that can detect spam even from a single message stream. To handle the huge volume of data in these networks, we implement our technique on the MapReduce platform.
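
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that spam shows up as clusters of near-identical messages, uses Python's `difflib.SequenceMatcher` ratio as a stand-in fuzzy similarity measure, and imitates the MapReduce map, shuffle, and reduce steps with in-memory grouping. The function names, the first-token grouping key, the 0.8 threshold, and the minimum cluster size of 3 are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and mask URLs and mentions, so that spam
    variants differing only in link or target user look alike."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    return re.sub(r"\s+", " ", text).strip()


def map_phase(messages):
    """Map step (assumed design): emit (key, (index, normalized text)).
    The key is the first token, a crude hypothetical partitioning rule."""
    for index, text in enumerate(messages):
        norm = normalize(text)
        key = norm.split(" ", 1)[0] if norm else ""
        yield key, (index, norm)


def reduce_phase(records, threshold=0.8, min_cluster=3):
    """Reduce step: greedily cluster records whose similarity to a
    cluster's first member is at least `threshold`, then return the
    indices of messages in clusters of at least `min_cluster` items."""
    clusters = []
    for index, norm in records:
        for cluster in clusters:
            representative = cluster[0][1]
            if SequenceMatcher(None, representative, norm).ratio() >= threshold:
                cluster.append((index, norm))
                break
        else:
            clusters.append([(index, norm)])
    return [i for c in clusters if len(c) >= min_cluster for i, _ in c]


def detect_spam(messages, threshold=0.8, min_cluster=3):
    """Run map, group by key (the shuffle), and reduce, all in memory."""
    groups = defaultdict(list)
    for key, record in map_phase(messages):
        groups[key].append(record)
    flagged = []
    for records in groups.values():
        flagged.extend(reduce_phase(records, threshold, min_cluster))
    return sorted(flagged)


messages = [
    "Win a FREE phone now http://a.example/1 @alice",
    "win a free phone now http://b.example/2 @bob",
    "Win a free phone today http://c.example/3 @carol",
    "Lunch with the team was great",
    "Watching the match tonight",
]
print(detect_spam(messages))  # prints [0, 1, 2]
```

Because no labeled training data is involved, the sketch is unsupervised in the sense the abstract describes. Its first-token key would miss near-duplicates that differ in their opening word, so a real deployment would need a more robust partitioning scheme.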

Author information

Corresponding author

Correspondence to Alok Kumar.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kumar, A., Singh, M., Pais, A.R. (2019). Fuzzy String Matching Algorithm for Spam Detection in Twitter. In: Nandi, S., Jinwala, D., Singh, V., Laxmi, V., Gaur, M., Faruki, P. (eds) Security and Privacy. ISEA-ISAP 2019. Communications in Computer and Information Science, vol 939. Springer, Singapore. https://doi.org/10.1007/978-981-13-7561-3_21

  • DOI: https://doi.org/10.1007/978-981-13-7561-3_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-7560-6

  • Online ISBN: 978-981-13-7561-3

  • eBook Packages: Computer Science, Computer Science (R0)