Machine and Deep Learning Algorithms for Twitter Spam Detection

  • Conference paper
Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2019 (AISI 2019)

Abstract

Twitter allows users to send short text-based messages of up to 280 characters, called “tweets”. Twitter's popularity attracts spammers, who spread malicious software through URLs attached to tweets, and Twitter spam has become a critical problem. Spam refers to a variety of prohibited behaviours that violate the Twitter rules. In this paper, different machine and deep learning algorithms are used to detect whether a tweet is spam or not. The performance of six machine learning algorithms, namely Random Forest (RF), Naive Bayes (NB), Bayesian Network (BN), Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Multi-Layer Perceptron (MLP), and of one deep learning algorithm, Recurrent Neural Network (RNN), is evaluated. Different test options are used, namely cross-validation and percentage split. Results show that RF gives the best result, with the lowest error rate and the highest classification accuracy across the different test options, compared with all the other algorithms.
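The two test options named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 66% training fraction, the 10 folds, and the majority-class baseline are all assumptions chosen for illustration, and the baseline merely stands in for any of the classifiers listed above.

```python
import random
from collections import Counter


def percentage_split(n_samples, train_fraction=0.66, seed=0):
    """Single hold-out split: return (train_idx, test_idx).

    The 0.66 default is an assumed value, not one taken from the paper.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = int(round(n_samples * train_fraction))
    return idx[:cut], idx[cut:]


def k_fold_splits(n_samples, k=10, seed=0):
    """k-fold cross-validation: yield (train_idx, test_idx) k times.

    Every sample appears in exactly one test fold.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def error_rate(y_true, y_pred):
    """Complement of accuracy."""
    return 1.0 - accuracy(y_true, y_pred)


def fit_majority(_features, labels):
    """Hypothetical baseline: always predict the most common training label."""
    label = Counter(labels).most_common(1)[0][0]
    return lambda _x: label


def evaluate(fit, features, labels, splits):
    """Mean test accuracy of `fit` over the given (train, test) splits."""
    scores = []
    for train, test in splits:
        model = fit([features[i] for i in train], [labels[i] for i in train])
        preds = [model(features[i]) for i in test]
        scores.append(accuracy([labels[i] for i in test], preds))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: 1 = spam, 0 = non-spam (made up for illustration).
    X = list(range(100))
    y = [1] * 80 + [0] * 20
    print("10-fold CV accuracy:", evaluate(fit_majority, X, y, k_fold_splits(len(X))))
    print("hold-out accuracy:  ", evaluate(fit_majority, X, y, [percentage_split(len(X))]))
```

Cross-validation averages over several test folds and so uses every sample for testing once, while a percentage split evaluates on a single held-out portion. Comparing classifiers under both, as the abstract describes, guards against a ranking that depends on one particular partition.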

Notes

  1. http://nsclab.org/nsclab/resource.

Author information

Corresponding author

Correspondence to Dabiah A. Alboaneen.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Alsaffar, D. et al. (2020). Machine and Deep Learning Algorithms for Twitter Spam Detection. In: Hassanien, A., Shaalan, K., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2019. AISI 2019. Advances in Intelligent Systems and Computing, vol 1058. Springer, Cham. https://doi.org/10.1007/978-3-030-31129-2_44
