Abstract
With the advent of social networking sites, opinion-mining applications have attracted the interest of online communities on review sites, where users consult reviews to inform their purchase decisions. However, because of the increasing trend of posting spam (fake) reviews to promote target products or defame competitors' brands, opinion spam detection and classification has emerged as a hot issue in opinion mining and sentiment analysis. We investigate opinion spam detection using different combinations of entities, features, and their sentiment scores. We enrich the feature set of a baseline spam detection method with spam detection features (Opinion Spam, Opinion Spammer, Item Spam). Using a dataset of reviews from the Amazon site and sentences labeled for spam detection, we evaluate the role of spamicity-related features in detecting and classifying spam (fake) clues and distinguishing them from genuine reviews. For this purpose, we introduce a rule-based feature weighting scheme and propose a method for tagging review sentences as spam or non-spam. Experimental results show that spam-related features improve spam detection in review sentences posted on product review sites. Adding a revised feature weighting scheme increased accuracy from 93 to 96%. Furthermore, a hybrid set of features is shown to improve opinion spam detection, yielding better precision, recall, and F-measure values. This work shows that combining spam-related features with a rule-based weighting scheme can improve the performance of even a baseline spam detection method. This improvement can benefit opinion spam detection systems, given the growing interest of individuals and companies in separating fake (spam) from genuine (non-spam) product reviews.
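The abstract reports its results as accuracy, precision, recall, and F-measure. As a reference point for how these figures are usually computed on a binary spam/non-spam task, here is a minimal sketch assuming spam is encoded as 1 and treated as the positive class. The function name `spam_metrics` and the toy labels are illustrative only and are not taken from the paper's experiments.

```python
from typing import Dict, Sequence


def spam_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure with spam (1) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam tagged as spam
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # genuine tagged as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that was missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # genuine tagged as genuine
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only (not the paper's data).
    print(spam_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

With these toy labels (2 true positives, 1 false positive, 1 false negative, 1 true negative), accuracy is 3/5 and precision, recall, and F-measure are each 2/3. Library implementations such as scikit-learn's `precision_recall_fscore_support` compute the same quantities; the hand-written version only makes the definitions explicit.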
The outcome of this work provides insight into spam-related features and feature weighting and will assist in developing more advanced opinion spam detection applications. In the field of opinion spam detection, previous state-of-the-art studies used fewer spamicity-related features and a less efficient feature weighting scheme. By contrast, we provide a revised feature selection and a revised feature weighting scheme with a normalized spamicity score computation technique. Our contribution is therefore novel to the field, as it provides a significant improvement over the compared methods.
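The abstract does not spell out the rules, weights, or normalization behind the spamicity score, so the following is only a hypothetical sketch of how a rule-based weighted, normalized spamicity score could drive sentence tagging. The feature keys reuse the three feature groups named in the abstract (Opinion Spam, Opinion Spammer, Item Spam), but the weights in `WEIGHTS`, the 0.5 threshold, the choice of min-max scaling, and the helper names are all assumptions, not the authors' method.

```python
from typing import Dict, List

# Hypothetical per-group weights (assumption, not the paper's rules or values).
WEIGHTS: Dict[str, float] = {
    "opinion_spam": 0.5,
    "opinion_spammer": 0.3,
    "item_spam": 0.2,
}


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def spamicity_scores(
    rows: List[Dict[str, float]], weights: Dict[str, float] = WEIGHTS
) -> List[float]:
    """Min-max normalize each feature column, then take a weighted average per sentence."""
    if not rows:
        return []
    columns = {name: min_max([row[name] for row in rows]) for name in weights}
    total = sum(weights.values())
    return [
        sum(weights[name] * columns[name][i] for name in weights) / total
        for i in range(len(rows))
    ]


def tag_sentences(
    rows: List[Dict[str, float]],
    weights: Dict[str, float] = WEIGHTS,
    threshold: float = 0.5,  # hypothetical cut-off
) -> List[str]:
    """Label each sentence 'spam' if its normalized score reaches the threshold."""
    return [
        "spam" if score >= threshold else "non-spam"
        for score in spamicity_scores(rows, weights)
    ]


if __name__ == "__main__":
    # Toy raw feature values for three review sentences (illustrative only).
    sentences = [
        {"opinion_spam": 4.0, "opinion_spammer": 2.0, "item_spam": 1.0},
        {"opinion_spam": 0.0, "opinion_spammer": 0.0, "item_spam": 0.0},
        {"opinion_spam": 2.0, "opinion_spammer": 1.0, "item_spam": 3.0},
    ]
    print(tag_sentences(sentences))
```

With the toy rows, the scores are roughly 0.87, 0.0, and 0.6, so the first and third sentences are tagged spam. Min-max scaling puts features measured on different scales into [0, 1] before weighting, and dividing by the sum of the weights keeps the combined score in [0, 1], so a single threshold can separate spam from non-spam. Because the scaling uses the batch minimum and maximum, a sentence's score can shift when the batch changes; a deployed system would fix those bounds on training data.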




Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
About this article
Cite this article
Asghar, M.Z., Ullah, A., Ahmad, S. et al. Opinion spam detection framework using hybrid classification scheme. Soft Comput 24, 3475–3498 (2020). https://doi.org/10.1007/s00500-019-04107-y
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00500-019-04107-y