A deceptive detection model based on topic, sentiment, and sentence structure information

Applied Intelligence

Abstract

Deceptive reviews are common on the Web, and detecting them has an important impact on products, services, and even business policies. To filter out deceptive reviews more accurately, this paper presents a new model, the Sentence Joint Topic Sentiment Model (SJTSM), which builds on the Latent Dirichlet Allocation (LDA) model and incorporates the sentence structure of reviews and the sentiment labels of words to extract review features. The proposed model employs a Gibbs sampling algorithm to estimate the maximum likelihood parameters and takes the topic-sentiment distribution vector as the review features. A multiple-classifier voting system, which takes the extracted review feature vector as its input, is then designed to classify reviews as deceptive or truthful. Comparative experiments on several public datasets against other existing LDA-based methods show that the new classification system based on SJTSM achieves more satisfactory results on deceptive review detection.
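The voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the individual classifiers or the voting rule, so everything below is an assumption made for illustration: the three toy classifiers (nearest centroid, 1-NN, 3-NN), the hard majority vote, the function names, and the four-dimensional feature vectors standing in for topic-sentiment distributions are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import dist

# Hypothetical training data: each vector stands in for a review's
# topic-sentiment distribution (entries sum to 1). Values are invented.
train_X = [
    [0.70, 0.10, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.65, 0.15, 0.10, 0.10],
    [0.10, 0.10, 0.20, 0.60],
    [0.10, 0.20, 0.10, 0.60],
    [0.15, 0.10, 0.15, 0.60],
]
train_y = ["deceptive"] * 3 + ["truthful"] * 3


def nearest_centroid(X, y):
    """Build a classifier that predicts the label of the closest class mean."""
    groups = {}
    for vec, label in zip(X, y):
        groups.setdefault(label, []).append(vec)
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }
    return lambda x: min(centroids, key=lambda label: dist(x, centroids[label]))


def k_nearest(X, y, k):
    """Build a k-nearest-neighbour classifier using Euclidean distance."""
    def predict(x):
        neighbours = sorted(zip(X, y), key=lambda pair: dist(x, pair[0]))[:k]
        return Counter(label for _, label in neighbours).most_common(1)[0][0]
    return predict


def majority_vote(classifiers, x):
    """Hard voting: return the label predicted by most classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# An odd number of voters avoids ties in the two-class case.
classifiers = [
    nearest_centroid(train_X, train_y),
    k_nearest(train_X, train_y, k=1),
    k_nearest(train_X, train_y, k=3),
]

print(majority_vote(classifiers, [0.55, 0.25, 0.10, 0.10]))  # deceptive
```

In this sketch each classifier sees the same feature vector and casts one vote; a weighted or probability-averaging vote would be an equally plausible reading of "voting system" and would only change `majority_vote`.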

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 61272194), by the Humanities and Social Sciences Research Project of the Ministry of Education of China (No. 18YJA740015), and by the Chongqing Key Laboratory of Software Theory and Technology, Chongqing, China.

Author information

Corresponding authors

Correspondence to Ping Han or Zhengyu Zhu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Du, X., Zhu, R., Zhao, F. et al. A deceptive detection model based on topic, sentiment, and sentence structure information. Appl Intell 50, 3868–3881 (2020). https://doi.org/10.1007/s10489-020-01779-0

  • DOI: https://doi.org/10.1007/s10489-020-01779-0
