Statistical Comparison of Opinion Spam Detectors in Social Media with Imbalanced Datasets

El-Alfy, El-Sayed M.; Al-Azani, Sadam

doi:10.1007/978-981-13-5826-5_12

Part of the book series: Communications in Computer and Information Science (CCIS, volume 969)

Included in the following conference series:

International Symposium on Security in Computing and Communication

Abstract

Sentiment analysis is a growing research area that analyzes people’s opinions toward a specific target using posts shared on social media. However, spammers can inject false opinions to sway sentiment-based decisions; for example, low-quality products or policies can be promoted over others. Identifying and removing spam posts on social media is therefore a crucial data-cleaning step for text mining tasks, including sentiment analysis. A problem inherent to spam detection is class imbalance. In this paper, we explore the impact of the imbalance ratio on the performance of Twitter spam detection using several single and ensemble classifiers. Besides ensemble-based learning (Bagging and Random Forest), we apply the SMOTE oversampling technique to improve detection performance, especially for classifiers that are sensitive to imbalanced datasets.
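The abstract names SMOTE and the imbalance ratio without showing how either is computed. The following is a minimal sketch of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation. The function names, the majority-to-minority definition of the imbalance ratio, the toy feature vectors, and the "spam"/"ham" labels are all assumptions made for illustration.

```python
import math
import random


def imbalance_ratio(labels, minority_label):
    """Majority-to-minority count ratio (one common definition; assumed here)."""
    minority = sum(1 for y in labels if y == minority_label)
    majority = len(labels) - minority
    return majority / minority


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points on segments joining a minority sample
    to one of its k nearest minority neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 4 minority ("spam") and 16 majority ("ham") samples.
labels = ["spam"] * 4 + ["ham"] * 16
spam_features = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]

print(imbalance_ratio(labels, "spam"))  # 16 / 4 = 4.0
balanced_extra = smote(spam_features, n_synthetic=12, k=3)
print(len(balanced_extra))  # 12 synthetic minority points
```

In practice, oversampling of this kind is applied to the training split only, so that synthetic points do not leak into the evaluation data.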

Notes

1. http://nsclab.org/nsclab/resources/

Author information

Authors and Affiliations

College of Computer Sciences and Engineering, King Fahd University of Petroleum and Minerals, Dhahran, 31261, Saudi Arabia
El-Sayed M. El-Alfy & Sadam Al-Azani

Corresponding author

Correspondence to El-Sayed M. El-Alfy.

Editor information

Editors and Affiliations

Indian Institute of Information Technology and Management, Kerala, India
Sabu M. Thampi
Department of Computer Science, Missouri University of Science and Technology, Rolla, MO, USA
Sanjay Madria
Guangzhou University, Guangzhou, China
Guojun Wang
Howard University, Washington, DC, USA
Danda B. Rawat
University of the West of Scotland, Paisley, Glasgow, UK
Jose M. Alcaraz Calero

About this paper

Cite this paper

El-Alfy, ES.M., Al-Azani, S. (2019). Statistical Comparison of Opinion Spam Detectors in Social Media with Imbalanced Datasets. In: Thampi, S., Madria, S., Wang, G., Rawat, D., Alcaraz Calero, J. (eds) Security in Computing and Communications. SSCC 2018. Communications in Computer and Information Science, vol 969. Springer, Singapore. https://doi.org/10.1007/978-981-13-5826-5_12

DOI: https://doi.org/10.1007/978-981-13-5826-5_12
Published: 24 January 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5825-8
Online ISBN: 978-981-13-5826-5
eBook Packages: Computer Science, Computer Science (R0)