DOI: 10.1145/2818567.2818589
research-article

Identification of Fake Reviews Using New Set of Lexical and Syntactic Features

Published: 25 September 2015

Abstract

The services and products of e-commerce portals are heavily reviewed by their users. These reviews provide useful insights into the quality and usage of the products. Because reviews carry this much weight, they can be faked to give false opinions about products and thereby mislead users. In this paper we propose a new set of lexical and syntactic features and apply supervised algorithms to classify a gold-standard fake reviews dataset. We focus on writing style, including the types of punctuation marks and Part-of-Speech (POS) tags, which are helpful for detecting review spam. The final results give a promising accuracy of 91.51% for detecting fake reviews.
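
As a rough illustration of what writing-style features of this kind might look like, the sketch below computes a few lexical measures and per-mark punctuation counts for a single review. The feature names, the choice of punctuation marks, and the example review are all assumptions made for illustration; the abstract does not list the paper's actual feature set. POS-based features are left out because they would need an external tagger.

```python
import string
from collections import Counter

# Hypothetical selection of punctuation marks; the abstract does not
# specify which marks the paper's feature set actually tracks.
PUNCTUATION_MARKS = "!?.,;:"


def lexical_features(review: str) -> dict:
    """Return a few simple writing-style features for one review."""
    # Split on whitespace, then strip surrounding punctuation from each token.
    words = [t.strip(string.punctuation).lower() for t in review.split()]
    words = [w for w in words if w]
    n_words = len(words) or 1  # avoid division by zero on empty input

    features = {
        "n_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(set(words)) / n_words,
    }

    # One count per punctuation mark, taken over the raw review text.
    char_counts = Counter(review)
    for mark in PUNCTUATION_MARKS:
        features[f"count_{mark}"] = char_counts[mark]
    return features


# Made-up review text, used only to exercise the function.
example = lexical_features(
    "Great hotel! Great staff, great view. Would I return? Yes!"
)
```

Each review then becomes a fixed-length numeric record, which is the form a supervised classifier would typically expect as input alongside a fake/genuine label.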

Cited By

  • (2019) Multi-Label Fake News Detection using Multi-layered Supervised Learning. Proceedings of the 2019 11th International Conference on Computer and Automation Engineering, pages 73-77. DOI: 10.1145/3313991.3314008. Online publication date: 23-Feb-2019
  • (2016) Finding of Review Spam through "Corleone, Review Genre, Writing Style and Review Text Detail Features". Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, pages 1-6. DOI: 10.1145/2905055.2905081. Online publication date: 4-Mar-2016

Published In

ICCCT '15: Proceedings of the Sixth International Conference on Computer and Communication Technology 2015
September 2015
481 pages
ISBN:9781450335522
DOI:10.1145/2818567

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Opinion mining
  2. lexical and syntactic features
  3. reviews spam
  4. supervised learning algorithms

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICCCT '15

Acceptance Rates

Overall Acceptance Rate 33 of 124 submissions, 27%
