Spam Review Detection Using Ensemble Machine Learning

Mani, Shwet; Kumari, Sneha; Jain, Ayushi; Kumar, Prabhat

doi:10.1007/978-3-319-96133-0_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10935)

Included in the following conference series:

International Conference on Machine Learning and Data Mining in Pattern Recognition

Abstract

The importance of consumer reviews has grown significantly with the increasing shift towards e-commerce. Potential consumers actively seek the opinions of other consumers who have already used the products they are considering purchasing. Businesses, in turn, rely on reviews to gauge public opinion about the quality of their products and services. However, the volume of consumer reviews has grown to such an extent that reading all of them and judging their genuineness has become a highly challenging task. Managing reviews is therefore crucial, since spammers can manipulate them to unfairly promote or demote a product. This paper proposes an algorithm for detecting fake reviews. Because the proposed work concentrates only on review text, n-gram (unigram + bigram) features are used. A supervised learning technique is used to filter reviews. The proposed algorithm combines multiple learning algorithms for better predictive performance. The results indicate that, using only simple features such as n-grams, an ensemble can significantly boost the efficiency of the algorithm.
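
To make the two ingredients named in the abstract concrete, the following is a minimal sketch in plain Python of unigram + bigram feature extraction and of combining several classifiers by majority vote. It is not the authors' implementation: the abstract does not say which base learners or which combination rule the paper uses, so majority voting is an assumption, and the three rule-based functions below are hypothetical stand-ins for trained supervised models. The names ngram_features, majority_vote and BASE_CLASSIFIERS are likewise invented for illustration.

```python
from collections import Counter


def ngram_features(text, n_values=(1, 2)):
    """Count unigrams and bigrams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        # Slide a window of n tokens across the review.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical stand-ins for trained base learners. Each one maps an
# n-gram count vector to a label; real learners would be fitted on
# labelled reviews instead of using hand-written rules.
def flags_superlatives(features):
    return "spam" if features["best"] + features["amazing"] >= 2 else "genuine"


def flags_pushy_bigram(features):
    return "spam" if features["must buy"] > 0 else "genuine"


def flags_repetition(features):
    return "spam" if max(features.values(), default=0) >= 3 else "genuine"


BASE_CLASSIFIERS = [flags_superlatives, flags_pushy_bigram, flags_repetition]


def majority_vote(classifiers, features):
    """Return the label chosen by most classifiers.

    With an odd number of binary classifiers there can be no tie.
    """
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    review = "best phone ever must buy best price"
    print(majority_vote(BASE_CLASSIFIERS, ngram_features(review)))
```

The point of the sketch is the division of labour: every base classifier sees the same n-gram counts, and the ensemble only aggregates their labels, so a single weak or mistaken learner can be outvoted by the others.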

Author information

Authors and Affiliations

Computer Science and Engineering Department, National Institute of Technology Patna, Patna, India
Shwet Mani, Sneha Kumari, Ayushi Jain & Prabhat Kumar

Corresponding author

Correspondence to Ayushi Jain.

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Mani, S., Kumari, S., Jain, A., Kumar, P. (2018). Spam Review Detection Using Ensemble Machine Learning. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol 10935. Springer, Cham. https://doi.org/10.1007/978-3-319-96133-0_15

DOI: https://doi.org/10.1007/978-3-319-96133-0_15
Published: 08 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96132-3
Online ISBN: 978-3-319-96133-0
eBook Packages: Computer Science, Computer Science (R0)
