Abstract
Mining features and opinion words is essential for fine-grained opinion analysis of customer reviews. It is observed that semantic dependencies naturally exist between features and opinion words, even among features or opinion words themselves. In this article, we employ a corpus statistics association measure to quantify the pairwise word dependencies and propose a generalized association-based unified framework to identify features, including explicit and implicit features, and opinion words from reviews. We first extract explicit features and opinion words via an association-based bootstrapping method (ABOOT). ABOOT starts with a small list of annotated feature seeds and then iteratively recognizes a large number of domain-specific features and opinion words by discovering the corpus statistics association between each pair of words on a given review domain. Two instances of this ABOOT method are evaluated based on two particular association models, likelihood ratio tests (LRTs) and latent semantic analysis (LSA). Next, we introduce a natural extension to identify implicit features by employing the recognized known semantic correlations between features and opinion words. Experimental results illustrate the benefits of the proposed association-based methods for identifying features and opinion words versus benchmark methods.
- Apoorv Agarwal, Fadi Biadsy, and Kathleen R. Mckeown. 2009. Contextual phrase-level polarity analysis using lexical affect scoring and syntactic N-grams. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL’09). 24--32. Google ScholarDigital Library
- David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (March 2003), 993--1022. Google ScholarDigital Library
- Wanxiang Che, Zhenghua Li, and Ting Liu. 2010. LTP: A Chinese language technology platform. In Proceedings of the 23rd International Conference on Computational Linguistics: Demonstrations (COLING’10). 13--16. Google ScholarDigital Library
- Kenneth Ward Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics 16, 1 (1990), 22--29. Google ScholarDigital Library
- Scott Deerwester, Susan T. Dumais, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 6 (1990), 391--407.Google ScholarCross Ref
- Ted Dunning. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19, 1 (March 1993), 61--74. Google ScholarDigital Library
- Gene H. Golub and Charles F. Van Loan. 1996. Matrix Computations. The Johns Hopkins University Press (1996).Google Scholar
- Zhen Hai, Kuiyu Chang, and Jung-jae Kim. 2011. Implicit feature identification via co-occurrence association rule mining. In Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing’11). 393--404. Google ScholarDigital Library
- Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04). 168--177. Google ScholarDigital Library
- Wei Jin and Hung Hay Ho. 2009. A novel lexicalized HMM-based learning framework for web opinion mining. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML’09). 465--472. Google ScholarDigital Library
- Yohan Jo and Alice H. Oh. 2011. Aspect and sentiment unification model for online review analysis. In Proceedings of the 4th ACM International Conference on Web Search and Data Mining (WSDM’11). 815--824. Google ScholarDigital Library
- Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Meeting of the Association for Computational Linguistics (ACL’03). 423--430. Google ScholarDigital Library
- Fangtao Li, Chao Han, Minlie Huang, Xiaoyan Zhu, Ying-Ju Xia, Shu Zhang, and Hao Yu. 2010. Structure-aware review mining and summarization. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING’10). 653--661. Google ScholarDigital Library
- Chenghua Lin and Yulan He. 2009. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM’09). 375--384. Google ScholarDigital Library
- Bing Liu. 2010. Sentiment analysis and subjectivity. Handbook of Natural Language Processing 2 (2010), 627--666.Google Scholar
- Bing Liu. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies 5, 1 (2012), 1--167.Google ScholarCross Ref
- Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL’11). 142--150. Google ScholarDigital Library
- Georgios Paltoglou and Mike Thelwall. 2012. Twitter, MySpace, Digg: Unsupervised sentiment analysis in social media. ACM Transactions on Intelligent Systems and Technology 3, 4 (2012), 66:1--66:19. Google ScholarDigital Library
- Sinno Jialin Pan, Xiaochuan Ni, Jian-Tao Sun, Qiang Yang, and Zheng Chen. 2010. Cross-domain sentiment classification via spectral feature alignment. In Proceedings of the 19th International Conference on World Wide Web (WWW’10). 751--760. Google ScholarDigital Library
- Bo Pang and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics (ACL’04). Article 271. Google ScholarDigital Library
- Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’02). 79--86. Google ScholarDigital Library
- Ana-Maria Popescu and Oren Etzioni. 2005. Extracting product features and opinions from reviews. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT&EMNLP’’05). 339--346. Google ScholarDigital Library
- Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen. 2009. Expanding domain sentiment lexicon through double propagation. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI’09). 1199--1204. Google ScholarDigital Library
- Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen. 2011. Opinion word expansion and target extraction through double propagation. Computational linguistics 37, 1 (March 2011), 9--27. Google ScholarDigital Library
- Qi Su, Xinying Xu, Honglei Guo, Zhili Guo, Xian Wu, Xiaoxun Zhang, Bin Swen, and Zhong Su. 2008. Hidden sentiment association in Chinese web opinion mining. In Proceedings of the 17th International Conference on World Wide Web (WWW’08). 959--968. Google ScholarDigital Library
- Ivan Titov and Ryan McDonald. 2008. Modeling online reviews with multi-grain topic models. In Proceedings of the 17th International Conference on World Wide Web (WWW’08). 111--120. Google ScholarDigital Library
- Peter D. Turney. 2002. Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL’02). 417--424. Google ScholarDigital Library
- Anthony J. Viera and Joanne M. Garrett. 2005. Understanding interobserver agreement: The Kappa statistic. Family Medicine 37, 5 (2005), 360--363.Google Scholar
- Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT&EMNLP’’05). 347--354. Google ScholarDigital Library
- Yuanbin Wu, Qi Zhang, Xuanjing Huang, and Lide Wu. 2009. Phrase dependency parsing for opinion mining. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’09). 1533--1541. Google ScholarDigital Library
- Lei Zhang, Bing Liu, Suk Hwan Lim, and Eamonn O’Brien-Strain. 2010. Extracting and ranking product features in opinion documents. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING’10). 1462--1470. Google ScholarDigital Library
Index Terms
- An Association-Based Unified Framework for Mining Features and Opinion Words
Recommendations
One seed to find them all: mining opinion features via association
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge managementFeature-based opinion analysis has attracted extensive attention recently. Identifying features associated with opinions expressed in reviews is essential for fine-grained opinion mining. One approach is to exploit the dependency relations that occur ...
Hidden sentiment association in chinese web opinion mining
WWW '08: Proceedings of the 17th international conference on World Wide WebThe boom of product review websites, blogs and forums on the web has attracted many research efforts on opinion mining. Recently, there was a growing interest in the finer-grained opinion mining, which detects opinions on different review features as ...
Extracting implicit features in online customer reviews for opinion mining
WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide WebAs the number of customer reviews grows very rapidly, it is essential to summarize useful opinions for buyers, sellers and producers. One key step of opinion mining is feature extraction. Most existing research focus on finding explicit features, only a ...
Comments