Novel Comment Spam Filtering Method on Youtube: Sentiment Analysis and Personality Recognition

Ezpeleta, Enaitz; Garitano, Iñaki; Arenaza-Nuño, Ignacio; Hidalgo, José María Gómez; Zurutuza, Urko

doi:10.1007/978-3-319-74433-9_21

Enaitz Ezpeleta (ORCID: orcid.org/0000-0003-4121-8869),
Iñaki Garitano,
Ignacio Arenaza-Nuño,
José María Gómez Hidalgo &
Urko Zurutuza

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10544)

Included in the following conference series:

International Conference on Web Engineering

Abstract

The deeply entrenched use of Online Social Networks (OSNs), where millions of users unwittingly share all kinds of personal data, offers attackers a very attractive channel. OSNs make it possible to send spam messages through different channels (wall posts, comments, private messages). In this paper we propose a novel spam filtering method focused on social media spam. We aim to demonstrate that analyzing the content of the texts with sentiment analysis and personality recognition techniques improves spam filtering results. We add these features to the OSN spam data, both independently and jointly, and then compare Bayesian spam filters with and without the new features in terms of the number of false positives and accuracy. The results of the top ten filtering classifiers improve: the number of false positives is also reduced (26.69% on average), and accuracy reaches 82.55%.
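
The comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy comments, the tiny polarity lexicon, the `__POS__`/`__NEG__`/`__NEU__` pseudo-tokens, and the hand-written multinomial Naive Bayes are hypothetical stand-ins, not the datasets, sentiment or personality tools, or classifiers used in the paper. It only shows the general shape of the experiment: train a Bayesian filter on the text alone, train it again with a sentiment feature appended, and report accuracy and false positives for both.

```python
import math
from collections import Counter

# Hypothetical toy polarity lexicon (an assumption, not the paper's sentiment tool).
POSITIVE = {"love", "great", "amazing", "free", "win"}
NEGATIVE = {"hate", "boring", "bad", "worst"}


def tokenize(text):
    return text.lower().split()


def polarity_token(text):
    """Map a comment to one coarse sentiment pseudo-token."""
    words = tokenize(text)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "__POS__"
    if score < 0:
        return "__NEG__"
    return "__NEU__"


def features(text, with_sentiment):
    """Bag of words, optionally extended with the sentiment pseudo-token."""
    tokens = tokenize(text)
    if with_sentiment:
        tokens.append(polarity_token(text))
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        def log_posterior(c):
            lp = math.log(self.prior[c] / self.n)
            denom = self.total[c] + len(self.vocab)
            for t in tokens:
                if t in self.vocab:  # tokens unseen in training are ignored
                    lp += math.log((self.counts[c][t] + 1) / denom)
            return lp

        return max(self.classes, key=log_posterior)


def evaluate(train, test, with_sentiment):
    """Train on `train`, then report accuracy and false positives on `test`."""
    model = NaiveBayes().fit(
        [features(text, with_sentiment) for text, _ in train],
        [label for _, label in train],
    )
    preds = [model.predict(features(text, with_sentiment)) for text, _ in test]
    correct = sum(p == y for p, (_, y) in zip(preds, test))
    # A false positive is a legitimate comment ("ham") flagged as spam.
    false_pos = sum(p == "spam" and y == "ham" for p, (_, y) in zip(preds, test))
    return {"accuracy": correct / len(test), "false_positives": false_pos}


# Invented toy comments, for illustration only.
TRAIN = [
    ("win a free phone click here", "spam"),
    ("subscribe to my channel for free gifts", "spam"),
    ("check out my channel please", "spam"),
    ("i love this song", "ham"),
    ("worst video ever so boring", "ham"),
    ("this song is great", "ham"),
]
TEST = [
    ("free gifts click here", "spam"),
    ("love this great song", "ham"),
]

baseline = evaluate(TRAIN, TEST, with_sentiment=False)
extended = evaluate(TRAIN, TEST, with_sentiment=True)
print("text only:       ", baseline)
print("text + sentiment:", extended)
```

A personality feature could be appended in the same way as the sentiment pseudo-token, and both could be added together to mirror the "independently and jointly" setup. On data this small the two configurations are not expected to differ; the sketch shows the evaluation procedure, not the reported gains.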

Acknowledgments

This work was developed by the intelligent systems for industrial systems group, supported by the Department of Education, Language Policy and Culture of the Basque Government. It was partially funded by the Basque Department of Education, Language Policy and Culture under the project SocialSPAM (PI 2014 1 102).

We thank Mattias Östmar for the valuable tools he developed and published, and Jon Kâgström, founder of uClassify (https://www.uclassify.com), for the opportunity to use its API for research purposes.

Author information

Authors and Affiliations

Electronics and Computing Department, Mondragon University, Goiru Kalea, 2, 20500, Arrasate-Mondragón, Spain
Enaitz Ezpeleta, Iñaki Garitano, Ignacio Arenaza-Nuño & Urko Zurutuza
Pragsis Technologies, Manuel Tovar, 43-53, Fuencarral, 28034, Madrid, Spain
José María Gómez Hidalgo

Corresponding author

Correspondence to Enaitz Ezpeleta.

Editor information

Editors and Affiliations

Universidad de Alicante, Alicante, Spain
Irene Garrigós
Institute of Software Technology and Interactive Systems, TU Wien, Vienna, Austria
Manuel Wimmer

Cite this paper

Ezpeleta, E., Garitano, I., Arenaza-Nuño, I., Hidalgo, J.M.G., Zurutuza, U. (2018). Novel Comment Spam Filtering Method on Youtube: Sentiment Analysis and Personality Recognition. In: Garrigós, I., Wimmer, M. (eds) Current Trends in Web Engineering. ICWE 2017. Lecture Notes in Computer Science, vol 10544. Springer, Cham. https://doi.org/10.1007/978-3-319-74433-9_21

DOI: https://doi.org/10.1007/978-3-319-74433-9_21
Published: 22 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74432-2
Online ISBN: 978-3-319-74433-9
eBook Packages: Computer Science, Computer Science (R0)