Spam Review Detection Using Ensemble Machine Learning

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10935)

Abstract

The importance of consumer reviews has grown significantly with the increasing inclination towards e-commerce. Potential consumers genuinely seek the opinions of other consumers who have already used the products they intend to purchase. Businesses likewise find it worthwhile to ascertain public opinion regarding the quality of their products and services. However, consumer reviews have accumulated over time to such an extent that reading all of them and judging their genuineness has become a highly challenging task. Managing reviews is therefore crucial, since spammers can manipulate them to unfairly demote or promote a product. This paper proposes an algorithm for detecting fake reviews. Since the proposed work concentrates only on review text, n-gram (unigram + bigram) features are used. A supervised learning technique is used for review filtering. The proposed algorithm combines multiple learning algorithms for better predictive performance. The obtained results indicate that, even with simple features such as n-grams, an ensemble can significantly boost the performance of the algorithm.
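The two ingredients named in the abstract, unigram + bigram features and a combination of several classifiers, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the base learners are combined, so hard majority voting is assumed here, tokenisation is plain whitespace splitting, and the three rule-based "classifiers" are hypothetical stand-ins for trained models.

```python
from collections import Counter
from typing import Callable, List

Features = Counter
Classifier = Callable[[Features], str]


def ngram_features(text: str) -> Features:
    """Count unigrams and bigrams in a lower-cased, whitespace-tokenised review."""
    tokens = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


def majority_vote(classifiers: List[Classifier], features: Features) -> str:
    """Return the label predicted by the largest number of base classifiers."""
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained base learners (assumptions, not from the paper).
def superlative_rule(f: Features) -> str:
    return "spam" if f["best"] + f["amazing"] >= 2 else "genuine"


def bigram_rule(f: Features) -> str:
    return "spam" if f["must buy"] > 0 else "genuine"


def length_rule(f: Features) -> str:
    # Counter.total() needs Python 3.10+, so sum the counts directly.
    return "spam" if sum(f.values()) < 12 else "genuine"


if __name__ == "__main__":
    review = "Best phone ever amazing must buy"
    feats = ngram_features(review)
    ensemble = [superlative_rule, bigram_rule, length_rule]
    print(majority_vote(ensemble, feats))
```

Using an odd number of base classifiers with two labels avoids ties in the vote. In practice each rule would be replaced by a model trained on labelled reviews, and other combination schemes such as stacking or bagging could be substituted for the vote.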



Author information

Corresponding author

Correspondence to Ayushi Jain.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Mani, S., Kumari, S., Jain, A., Kumar, P. (2018). Spam Review Detection Using Ensemble Machine Learning. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science (LNAI), vol 10935. Springer, Cham. https://doi.org/10.1007/978-3-319-96133-0_15

  • DOI: https://doi.org/10.1007/978-3-319-96133-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96132-3

  • Online ISBN: 978-3-319-96133-0

  • eBook Packages: Computer Science (R0)
