Hybrid attribute based sentiment classification of online reviews for consumer intelligence

Applied Intelligence

Abstract

Rich online consumer reviews (OCR) can be mined for valuable insights that benefit both brands and future buyers. Recently, aspect-based sentiment classification has shown excellent results for fine-grained sentiment analysis of OCR. However, only a few studies so far both derive sentiment explicitly from syntactic features and capture implicit contextual word relations for the task of aspect-based sentiment classification. In this paper, we propose a novel method, Hybrid Attribute Based Sentiment Classification (HABSC), which aims to derive the sentiment orientation of OCR by capturing implicit word relations and incorporating domain-specific knowledge. First, we detect the most frequent bigrams and trigrams in the corpus, followed by POS tagging to retain aspect descriptions and opinion words. Then, we employ TF-IDF (term frequency-inverse document frequency) to represent each document, followed by automatic extraction of the optimal number of topics in the given corpus. All adjectives and adverbs are labelled using domain-specific knowledge and pre-existing lexicons. Lastly, we find the sentiment orientation of each review under the assumption that each review is a mixture of weighted and sentiment-labelled attributes. We test the efficiency of our method using datasets from two different domains: hotel reviews from TripAdvisor.com and mobile phone reviews from Amazon.com. Results show that the classification accuracy of HABSC significantly exceeds that of various state-of-the-art methods, including aspect-based sentiment classification and supervised classification using distributed word and paragraph vectors. Our method also requires less computation time than distributed vectorization schemes.
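
The last two steps of the pipeline, TF-IDF document representation and scoring a review as a mixture of weighted, sentiment-labelled terms, can be illustrated with a minimal sketch. This is not the authors' implementation. The toy reviews, the tiny polarity lexicon `LEXICON`, the smoothed IDF variant, and the function names `tfidf`, `review_score` and `orientation` are all assumptions made for illustration. The n-gram detection, POS tagging and topic extraction steps are omitted.

```python
import math
from collections import Counter

# Hypothetical polarity lexicon for illustration only: +1 positive, -1 negative.
LEXICON = {"clean": 1, "friendly": 1, "great": 1, "dirty": -1, "rude": -1, "slow": -1}


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln((1 + N) / (1 + df)) + 1 (smoothed).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weighted


def review_score(term_weights, lexicon=LEXICON):
    """Sum weight * polarity over the terms that appear in the lexicon."""
    return sum(weight * lexicon[term]
               for term, weight in term_weights.items() if term in lexicon)


def orientation(score):
    """Map a signed score to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy, already-tokenized reviews (invented for this example).
reviews = [
    ["room", "clean", "staff", "friendly"],
    ["room", "dirty", "staff", "rude"],
    ["great", "location", "slow", "wifi"],
]
labels = [orientation(review_score(w)) for w in tfidf(reviews)]
print(labels)
```

In this sketch a term that occurs in fewer reviews receives a larger weight, so a rare opinion word influences the review score more than a common one. A fuller version would presumably aggregate scores per extracted attribute or topic rather than per review.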

References

  1. Abburi H, Akkireddy ESA, Gangashetti S, Mamidi R (2016) Multimodal sentiment analysis of telugu songs. In: SAAIP@ IJCAI, pp 48–52

  2. Abdelwahab O, Elmaghraby A (2016) Uofl at semeval-2016 task 4: multi domain word2vec for twitter sentiment classification. In: Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016), pp 164–170

  3. Al-Amin M, Islam MS, Uzzal SD (2017) Sentiment analysis of bengali comments with word2vec and sentiment information of words. In: International conference on electrical, computer and communication engineering (ECCE). IEEE, pp 186–190

  4. Alam MH, Ryu WJ, Lee S (2016) Joint multi-grain topic sentiment: modeling semantic aspects for online reviews. Inform Sci 339:206–223

  5. Appel O, Chiclana F, Carter J, Fujita H (2018) Successes and challenges in developing a hybrid approach to sentiment analysis. Appl Intell 48(5):1176–1188

  6. Bansal B, Srivastava S (2018) Sentiment classification of online consumer reviews using word vector representations. Procedia Computer Science 132:1147–1153

  7. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3(Jan):993–1022

  8. Cerón-Guzmán JA, León-Guzmán E (2016) A sentiment analysis system of spanish tweets and its application in Colombia 2014 presidential election. In: 2016 IEEE international conferences on big data and cloud computing (BDCloud), social computing and networking (socialcom), sustainable computing and communications (sustaincom)(BDCloud-socialcom-sustaincom), pp 250–257

  9. Chen R, Xu W (2017) The determinants of online customer ratings: a combined domain ontology and topic text analytics approach. Electron Commer Res 17(1):31–50

  10. Chen R, Zheng Y, Xu W, Liu M, Wang J (2018) Secondhand seller reputation in online markets: a text analytics framework. Decis Support Syst 108:96–106

  11. Dataset (2016) Amazon mobile review dataset. https://www.kaggle.com/PromptCloudHQ/amazon-reviews-unlocked-mobile-phones/data. Online; Accessed Nov 2017

  12. Giatsoglou M, Vozalis MG, Diamantaras K, Vakali A, Sarigiannidis G, Chatzisavvas KC (2017) Sentiment analysis leveraging emotions and word embeddings. Expert Syst Appl 69:214–224

  13. Hogenboom A, Heerschop B, Frasincar F, Kaymak U, de Jong F (2014) Multi-lingual support for lexicon-based sentiment analysis guided by semantics. Decis Support Syst 62:43–53

  14. Hu M, Liu B (2004) Mining opinion features in customer reviews. In: AAAI, vol 4. pp 755–760

  15. Jiang S, Lewris J, Voltmer M, Wang H (2016) Integrating rich document representations for text classification. In: 2016 IEEE systems and information engineering design symposium (SIEDS). IEEE, pp 303–308

  16. Jiang Y, Song X, Harrison J, Quegan S, Maynard D (2017) Comparing attitudes to climate change in the media using sentiment analysis based on latent dirichlet allocation. In: Proceedings of the 2017 EMNLP workshop natural language processing meets journalism, pp 25–30

  17. Jo Y, Oh AH (2011) Aspect and sentiment unification model for online review analysis. In: Proceedings of the fourth ACM international conference on Web search and data mining. ACM, pp 815–824

  18. Karami A, Gangopadhyay A, Zhou B, Kharrazi H (2017) Fuzzy approach topic discovery in health and medical corpora. Int J Fuzzy Syst 20(4):1334–1345

  19. Karami A, Dahl AA, Turner-McGrievy G, Kharrazi H, Shaw G (2018) Characterizing diabetes, diet, exercise, and obesity comments on twitter. Int J Inf Manage 38(1):1–6

  20. Kim HK, Kim M (2016) Model-induced term-weighting schemes for text classification. Appl Intell 45(1):30–43

  21. Koltcov S, Koltsova O, Nikolenko S (2014) Latent dirichlet allocation: stability and applications to studies of user-generated content. In: Proceedings of the 2014 ACM conference on web science. ACM, pp 161–165

  22. Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: International conference on machine learning, pp 1188–1196

  23. Li G, Liu F (2014) Sentiment analysis based on clustering: a framework in improving accuracy and recognizing neutral opinions. Appl Intell 40(3):441–452

  24. Lin C, He Y (2009) Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 18th ACM conference on Information and knowledge management. ACM, pp 375–384

  25. Liu B (2012) Sentiment analysis and opinion mining. Synthesis Lectures On Human Language Technologies 5(1):1–167

  26. Liu B, Hu M, Cheng J (2005) Opinion observer: analyzing and comparing opinions on the web. In: Proceedings of the 14th international conference on world wide web. ACM, pp 342–351

  27. Liu H (2017) Sentiment analysis of citations using word2vec. arXiv:1704.00177

  28. Liu X, Burns AC, Hou Y (2017) An investigation of brand-related user-generated content on twitter. J Advert 46(2):236–247

  29. Liu Y, Jiang C, Zhao H (2018) Using contextual features and multi-view ensemble learning in product defect identification from online discussion forums. Decis Support Syst 105:1–12

  30. Ma S, Zhang C, He D (2016) Document representation methods for clustering bilingual documents. Proceedings of the Association for Information Science and Technology 53(1):1–10

  31. Mei Q, Ling X, Wondra M, Su H, Zhai C (2007) Topic sentiment mixture: modeling facets and opinions in weblogs. In: Proceedings of the 16th international conference on world wide web. ACM, pp 171–180

  32. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781

  33. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119

  34. Mikolov T, Yih Wt, Zweig G (2013) Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: human language technologies, pp 746–751

  35. Nielsen FÅ (2011) A new anew: evaluation of a word list for sentiment analysis in microblogs. arXiv:1103.2903

  36. Panichella A, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) How to effectively use topic models for software engineering tasks? an approach based on genetic algorithms. In: 2013 35th international conference on software engineering (ICSE). IEEE, pp 522–531

  37. Pham DH, Le AC (2017) Learning multiple layers of knowledge representation for aspect based sentiment analysis. Data Knowl Eng

  38. Qiang J, Li Y, Yuan Y, Liu W (2018) Snapshot ensembles of non-negative matrix factorization for stability of topic modeling. Appl Intell:1–13

  39. Qiao Z, Zhang X, Zhou M, Wang GA, Fan W (2017) A domain oriented lda model for mining product defects from online customer reviews

  40. Rehurek R. Gensim. https://radimrehurek.com/gensim/models/phrases.html. Last accessed Nov 2017

  41. Rehurek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks. Citeseer

  42. Salehan M, Kim DJ (2016) Predicting the performance of online consumer reviews: a sentiment mining approach to big data analytics. Decis Support Syst 81:30–40

  43. Sanguansat P (2016) Paragraph2vec-based sentiment analysis on social media for business in thailand. In: 2016 8th international conference on knowledge and smart technology (KST). IEEE, pp 175–178

  44. Schwenk H (2007) Continuous space language models. Comput Speech Lang 21(3):492–518

  45. spaCy. https://spacy.io. Last accessed Nov 2017

  46. Tripathy A, Agrawal A, Rath SK (2016) Classification of sentiment reviews using n-gram machine learning approach. Expert Syst Appl 57:117–126

  47. Wang T, Cai Y, Leung Hf, Lau RY, Li Q, Min H (2014) Product aspect extraction supervised with online domain knowledge. Knowl-Based Syst 71:86–100

  48. Wang W, Wang H, Song Y (2017) Ranking product aspects through sentiment analysis of online reviews. J Exp Theor Artif Intell 29(2):227–246

  49. Wang Z, Gu S, Xu X (2018) Gslda: lda-based group spamming detection in product reviews. Appl Intell 48(9):3094–3107

  50. Xianghua F, Guo L, Yanyan G, Zhiqiang W (2013) Multi-aspect sentiment analysis for chinese online social reviews based on topic modeling and hownet lexicon. Knowl-Based Syst 37:186–195

  51. Xin Y, Yang J, Xie ZQ, Zhang JP (2015) An overlapping semantic community detection algorithm base on the arts multiple sampling models. Expert Syst Appl 42(7):3420–3432

  52. Yan X, Guo J, Lan Y, Cheng X (2013) A biterm topic model for short texts. In: Proceedings of the 22nd international conference on world wide web. ACM, pp 1445–1456

  53. Yao Y, Li X, Liu X, Liu P, Liang Z, Zhang J, Mai K (2017) Sensing spatial distribution of urban land use by integrating points-of-interest and google word2vec model. Int J Geogr Inf Sci 31(4):825–848

  54. Yu D, Mu Y, Jin Y (2017) Rating prediction using review texts with underlying sentiments. Inf Process Lett 117:10–18

  55. Zainuddin N, Selamat A, Ibrahim R (2017) Hybrid sentiment classification on twitter aspect-based sentiment analysis. Appl Intell 48(5):1–15

  56. Zhang D, Xu H, Su Z, Xu Y (2015) Chinese comments sentiment classification based on word2vec and svmperf. Expert Syst Appl 42(4):1857–1863

  57. Zirn C, Stuckenschmidt H (2014) Multidimensional topic analysis in political texts. Data Knowl Eng 90:38–53

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Corresponding author

Correspondence to Sangeet Srivastava.

Additional information

Availability of data and materials

The datasets analysed during the current study are available in the Zenodo repository: Amazon Mobile Review Dataset [https://doi.org/10.5281/zenodo.1211639] and TripAdvisor Hotel Review Dataset [https://doi.org/10.5281/zenodo.1219899]. The datasets were originally derived from [11] and [4], respectively.

About this article

Cite this article

Bansal, B., Srivastava, S. Hybrid attribute based sentiment classification of online reviews for consumer intelligence. Appl Intell 49, 137–149 (2019). https://doi.org/10.1007/s10489-018-1299-7

  • DOI: https://doi.org/10.1007/s10489-018-1299-7
