Fuzzy String Matching Algorithm for Spam Detection in Twitter

Kumar, Alok; Singh, Maninder; Pais, Alwyn Roshan

doi:10.1007/978-981-13-7561-3_21

Part of the book series: Communications in Computer and Information Science (CCIS, volume 939)

Included in the following conference series:

International Conference on Security & Privacy

Abstract

In recent times, one of the most popular Internet activities around the world has been visiting online social networking websites. The number of users, and the time they spend on these social networks, are increasing exponentially. Moreover, users tend to trust the data present on these networks. In the wrong hands, this trust can easily be exploited to spread spam. Users can easily be harassed by spam messages, which waste time and can trick them into clicking on malicious links. Spam affects many types of electronic communication, including instant messaging, email and social networks. But because of their open nature, huge user base and reliance on users for data, social networks are the worst hit. To detect spam in social networks, it is desirable to find new unsupervised techniques, which avoid the training cost required by supervised techniques.

In this article, we present an unsupervised, distributed and decentralized technique to detect and remove spam from social networks. The technique uses a fuzzy-based method that can detect spam even from a single message stream. To handle the huge volume of data in these networks, we implement the technique on the MapReduce platform.
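
The abstract does not describe the algorithm itself, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that spam in a single message stream shows up as clusters of near-duplicate messages, measures similarity with the standard-library `difflib.SequenceMatcher`, and imitates the map and reduce steps of MapReduce in memory. The function names, the first-word bucket key, the 0.8 threshold and the minimum cluster size of 3 are all hypothetical choices made for the example.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def normalize(text):
    """Lowercase and mask URLs/mentions so templated messages look alike."""
    text = URL_RE.sub("<url>", text.lower())
    text = MENTION_RE.sub("<user>", text)
    return " ".join(text.split())


def map_message(message):
    """Map step: emit (bucket key, normalized text).

    The first word is used as a coarse bucket key. This is an assumption
    made for illustration, not something stated in the abstract.
    """
    norm = normalize(message)
    key = norm.split(" ", 1)[0] if norm else ""
    return key, norm


def reduce_bucket(messages, threshold=0.8, min_cluster=3):
    """Reduce step: greedily cluster fuzzy near-duplicates in one bucket.

    A message joins the first cluster whose representative (its first
    member) has a similarity ratio >= threshold; otherwise it starts a
    new cluster. Only clusters with >= min_cluster members are returned.
    """
    clusters = []
    for msg in messages:
        for cluster in clusters:
            ratio = SequenceMatcher(None, cluster[0], msg, autojunk=False).ratio()
            if ratio >= threshold:
                cluster.append(msg)
                break
        else:
            clusters.append([msg])
    return [c for c in clusters if len(c) >= min_cluster]


def detect_spam(stream, threshold=0.8, min_cluster=3):
    """Run the map and reduce steps in memory over one message stream."""
    buckets = defaultdict(list)
    for message in stream:
        key, norm = map_message(message)
        buckets[key].append(norm)
    flagged = []
    for msgs in buckets.values():
        flagged.extend(reduce_bucket(msgs, threshold, min_cluster))
    return flagged


if __name__ == "__main__":
    stream = [
        "Win a FREE phone now http://a.example/1 @bob",
        "win a free phone now http://b.example/2 @alice",
        "Win a free phone today http://c.example/3 @carol",
        "Lunch was great today",
        "win tickets to the concert by retweeting",
    ]
    for cluster in detect_spam(stream):
        print(len(cluster), "similar messages:", cluster[0])
```

Because each bucket is reduced independently, the same two functions could in principle be handed to a real MapReduce framework, where the bucket key decides which reducer sees which messages. The approach needs no labelled training data, which matches the unsupervised setting the abstract argues for.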

Author information

Authors and Affiliations

Information Security Research Lab, Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, Karnataka, India
Alok Kumar & Alwyn Roshan Pais
Department of Computer Science and Engineering, Thapar University, Patiala, Punjab, India
Maninder Singh

Corresponding author

Correspondence to Alok Kumar.

Editor information

Editors and Affiliations

Indian Institute of Technology Guwahati, Guwahati, India
Sukumar Nandi
Indian Institute of Technology Jammu, Jammu, India
Devesh Jinwala
Indian Institute of Technology Bombay, Mumbai, India
Virendra Singh
Malaviya National Institute of Technology, Jaipur, India
Vijay Laxmi
Indian Institute of Technology Jammu, Jammu, Jammu and Kashmir, India
Manoj Singh Gaur
Department of Technical Education, Government of Gujarat, Rajkot, India
Parvez Faruki

Cite this paper

Kumar, A., Singh, M., Pais, A.R. (2019). Fuzzy String Matching Algorithm for Spam Detection in Twitter. In: Nandi, S., Jinwala, D., Singh, V., Laxmi, V., Gaur, M., Faruki, P. (eds) Security and Privacy. ISEA-ISAP 2019. Communications in Computer and Information Science, vol 939. Springer, Singapore. https://doi.org/10.1007/978-981-13-7561-3_21

DOI: https://doi.org/10.1007/978-981-13-7561-3_21
Published: 30 April 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7560-6
Online ISBN: 978-981-13-7561-3
eBook Packages: Computer Science, Computer Science (R0)
