Cross-domain comparison of algorithm performance in extracting aspect-based opinions from Chinese online reviews

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Extracting aspects and opinions is the basis of fine-grained sentiment analysis. It is usually carried out with either rule-based or machine learning approaches. However, no conclusion has yet been drawn about how well these approaches apply across multiple domains in Chinese, so their robustness and reliability across domains remain a concern. We compare ten aspect-opinion extraction approaches on Chinese corpora from seven domains. The compared methods are a TF-based model plus POS and opinion mining based on CRFs, SVM, MNB, HMM, RFM, RNN, KNN, CART and LPM. The corpus consists of 3146 Chinese reviews covering digital cameras, cosmetics, books, hotels, movies, cellphones and restaurants. The experiments show that (1) no algorithm dominates in all domains; (2) machine learning algorithms outperform rule-based approaches; (3) text length negatively affects the accuracy of rule-based approaches, while some machine learning methods handle long reviews well; (4) the HMM-, RFM-, RNN-, KNN-, CART- and LPM-based models perform similarly in terms of precision and recall; and (5) overall, the SVM-based approach performs best for opinion mining in almost all domains.
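
The abstract compares models by precision and recall. The following is a minimal sketch of how these two measures could be computed for extracted aspect-opinion pairs. It assumes that a predicted pair counts as correct only when it exactly matches a gold-annotated pair; this matching criterion, the function name `precision_recall`, and the toy data are assumptions made for illustration, not the evaluation protocol of the paper.

```python
from typing import Iterable, Set, Tuple

# An extracted unit is modeled as an (aspect, opinion) pair of strings.
Pair = Tuple[str, str]


def precision_recall(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) under exact pair matching (an assumption)."""
    pred_set: Set[Pair] = set(predicted)
    gold_set: Set[Pair] = set(gold)
    true_positives = len(pred_set & gold_set)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical toy data, glossed in English for readability.
gold_pairs = [("screen", "clear"), ("battery", "durable"), ("price", "expensive")]
predicted_pairs = [("screen", "clear"), ("price", "expensive"), ("appearance", "average")]

p, r = precision_recall(predicted_pairs, gold_pairs)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this toy case two of the three predicted pairs are correct and two of the three gold pairs are recovered, so both measures equal 2/3. A cross-domain comparison would repeat the computation for each domain and each method.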

Acknowledgments

This work was partially supported by NSFC Grants (71371144, 71601082, 71601119), the Shanghai Philosophy and Social Science Planning Project (2013BGL004), Huaqiao University’s High Level Talent Research Start Project Funding (16SKBS102), and the Young Teachers Project of the Education Department of Fujian Province (JA14266).

Author information

Corresponding author

Correspondence to Hongwei Wang.

About this article

Cite this article

Wang, W., Tan, G. & Wang, H. Cross-domain comparison of algorithm performance in extracting aspect-based opinions from Chinese online reviews. Int. J. Mach. Learn. & Cyber. 8, 1053–1070 (2017). https://doi.org/10.1007/s13042-016-0596-x

  • DOI: https://doi.org/10.1007/s13042-016-0596-x
