Abstract
Extracting aspects and opinions is the basis of fine-grained sentiment analysis. It is usually conducted in one of two ways: rule-based approaches or machine learning approaches. However, no conclusion has yet been drawn about their applicability across multiple domains in Chinese, so the robustness and reliability of these algorithms across different fields remain a concern. We compare ten approaches to aspect-opinion extraction on Chinese corpora from seven domains. The compared methods are a TF-based model plus POS, and CRFs-based, SVM-based, MNB-based, HMM-based, RFM-based, RNN-based, KNN-based, CART-based and LPM-based opinion mining. We collect 3146 Chinese reviews as corpora, covering digital cameras, cosmetics, books, hotels, movies, cellphones and restaurants. Experiments reveal the following results: (1) no algorithm dominates across all domains; (2) machine learning algorithms outperform rule-based approaches; (3) text length negatively affects the accuracy of opinion mining for rule-based approaches, while some machine learning methods handle long reviews well; (4) the HMM-, RFM-, RNN-, KNN-, CART- and LPM-based models perform similarly in terms of precision and recall; (5) overall, the SVM-based approach performs best in almost all domains for opinion mining.
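The per-domain comparison in terms of precision and recall can be illustrated with a minimal sketch. It assumes that each method outputs a set of (aspect, opinion) pairs per domain and that a pair counts as correct only on an exact match with a hand-labelled pair; the function names and the toy data below are hypothetical and are not taken from the paper's experiments.

```python
from typing import Dict, Set, Tuple

# An extracted item is modelled as an (aspect, opinion) pair.
# This representation is an assumption made for illustration.
Pair = Tuple[str, str]


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) for exact-match pair extraction."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def per_domain_scores(
    predictions: Dict[str, Set[Pair]],
    gold: Dict[str, Set[Pair]],
) -> Dict[str, Tuple[float, float]]:
    """Score one extraction method separately on every labelled domain."""
    return {
        domain: precision_recall(predictions.get(domain, set()), gold_pairs)
        for domain, gold_pairs in gold.items()
    }


# Toy, made-up data for two of the seven domains (not results from the study).
gold = {
    "hotel": {("room", "clean"), ("service", "slow")},
    "movie": {("plot", "boring")},
}
predictions = {
    "hotel": {("room", "clean"), ("bed", "hard")},
    "movie": {("plot", "boring")},
}

scores = per_domain_scores(predictions, gold)
for domain, (p, r) in scores.items():
    print(f"{domain}: precision={p:.2f} recall={r:.2f}")
```

Running the same scoring over each method and each domain gives a method-by-domain table, which is the kind of comparison that findings (1) and (5) summarise.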
Acknowledgments
This work is partially supported by the NSFC Grant (71371144, 71601082, 71601119), Shanghai philosophy and social science planning projects (2013BGL004), Huaqiao University’s High Level Talent Research Start Project Funding (16SKBS102), and education department of young teachers project in Fujian province (JA14266).
Cite this article
Wang, W., Tan, G. & Wang, H. Cross-domain comparison of algorithm performance in extracting aspect-based opinions from Chinese online reviews. Int. J. Mach. Learn. & Cyber. 8, 1053–1070 (2017). https://doi.org/10.1007/s13042-016-0596-x
DOI: https://doi.org/10.1007/s13042-016-0596-x