Abstract
Social spam is a large and complex problem that affects social networking sites in several ways. It includes posts, reviews, or blogs containing product promotions and contests, adult content, and general spam. Social media websites such as Twitter have also been found to act as distributors of pornographic content, even though this is against their own stated policies. In this paper, we review the case of Twitter and find that spammers contributing pornographic content follow legitimate Twitter users and send URLs that link users to pornographic sites. We conduct a behavioral analysis of such spammers using graph-based as well as content-based information, fetched using simple text operators, to study their characteristics. In the present study, about 74,000 tweets containing pornographic adult content, posted by around 18,000 users, were collected and analyzed. The analysis shows that users posting pornographic content fulfill the characteristics of spammers as stated in the rules and guidelines of Twitter. The illegitimate use of social media for spreading social spam is growing at a fast pace, with the network companies turning a blind eye toward the problem. Clearly, there is a pressing need for an effective solution that removes such objectionable and slanderous content from social networking websites, in order to promote and protect public decency and the welfare of children and adults. Such a solution is also essential to improve the experience of genuine social media users and to protect them from harm to their public identity on the World Wide Web. Further, this paper classifies pornographic spammers and genuine users using machine learning techniques. Experimental results show that a Random Forest classifier is able to predict pornographic spammers with a reasonably high accuracy of 91.96%.
To the best of our knowledge, this is the first attempt to analyze and categorize the behavior of pornographic users on Twitter as spammers. Previous work has addressed the identification of spammers in general but has not specifically targeted pornographic spammers.
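As a rough illustration of what combining graph-based and content-based information can look like, the sketch below derives a small feature vector for one account. It is not the feature set used in this study: the `Account` record, its field names, the follower-to-following ratio, and the per-tweet counts of URLs, mentions, and hashtags are all assumptions chosen for illustration, and the regular expressions stand in for the "simple text operators" mentioned above.

```python
import re
from dataclasses import dataclass, field

# Simple text operators (assumed patterns, not taken from the paper).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


@dataclass
class Account:
    """Hypothetical record for one Twitter account; field names are assumptions."""
    followers: int
    following: int
    tweets: list = field(default_factory=list)


def content_features(tweets):
    """Average number of URLs, mentions, and hashtags per tweet."""
    n = len(tweets) or 1  # avoid division by zero for an empty timeline
    return {
        "urls_per_tweet": sum(len(URL_RE.findall(t)) for t in tweets) / n,
        "mentions_per_tweet": sum(len(MENTION_RE.findall(t)) for t in tweets) / n,
        "hashtags_per_tweet": sum(len(HASHTAG_RE.findall(t)) for t in tweets) / n,
    }


def graph_features(account):
    """Follower-to-following ratio; low values suggest aggressive following."""
    return {"follower_ratio": account.followers / max(account.following, 1)}


def feature_vector(account):
    """Merge graph-based and content-based features into one dictionary."""
    return {**graph_features(account), **content_features(account.tweets)}


# Example with made-up data: an account that follows many users and posts links.
example = Account(
    followers=10,
    following=1000,
    tweets=["check http://example.com @alice #promo", "hello world"],
)
features = feature_vector(example)
```

Feature vectors of this kind, labeled as spammer or genuine user, are the sort of input a Random Forest classifier would be trained on; which features actually separate the two classes is an empirical question addressed by the analysis in the paper.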















Acknowledgments
We would like to acknowledge the contribution of Mr. Agnit Mukhopadhyay, Undergraduate Student (Aerospace Engineering Department), PEC University of Technology, India, for his critical review of the manuscript.
Cite this article
Singh, M., Bansal, D. & Sofat, S. Behavioral analysis and classification of spammers distributing pornographic content in social media. Soc. Netw. Anal. Min. 6, 41 (2016). https://doi.org/10.1007/s13278-016-0350-0
DOI: https://doi.org/10.1007/s13278-016-0350-0