ABSTRACT
Opinion mining extracts customers' opinions from reviews and predicts their sentiment orientation. Reviewers usually praise a product in some aspects and criticize it in others. As business globalizes, enterprises need to extract opinions toward different aspects and identify cross-lingual and cross-cultural differences in those opinions. Cross-lingual opinion mining is challenging because large volumes of opinions are written in different languages and are not well structured. Because reviewers often use different words to describe the same aspect, product-feature (PF) categorization is critical in cross-lingual opinion mining. Manual cross-lingual PF categorization is time-consuming and practically infeasible for massive amounts of multilingual data. To find cross-lingual differences in opinions effectively, we present an aspect-oriented opinion mining method based on Cross-lingual Latent Semantic Association (CLaSA). We first construct a CLaSA model to learn the cross-lingual latent semantic association among all PFs from multi-dimensional semantic clues in the review corpus. We then use the CLaSA model to categorize the multilingual PFs into semantic aspects and summarize cross-lingual differences in opinions toward each aspect. Experimental results show that our method outperforms existing approaches. With the CLaSA model, our text mining system OpinionIt can effectively discover cross-lingual differences in opinions.
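The two-stage pipeline described in the abstract (categorize multilingual PFs into aspects, then summarize opinions per aspect and language) can be illustrated with a toy sketch. This is not the CLaSA model: it swaps the latent semantic association for plain cosine similarity over context words and a greedy grouping pass. It also assumes the context words have already been mapped into a shared vocabulary, for example through a bilingual dictionary. All names (`categorize`, `summarize`, the threshold, the sample data) are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def categorize(pf_contexts, threshold=0.5):
    """Greedy grouping: a PF joins the first aspect whose seed vector is
    similar enough; otherwise it starts a new aspect.
    (Stand-in for the model-based categorization; threshold is an assumption.)"""
    aspects = []  # list of (seed_vector, member PF names)
    for pf, context_words in pf_contexts.items():
        vec = Counter(context_words)
        for seed, members in aspects:
            if cosine(seed, vec) >= threshold:
                members.append(pf)
                break
        else:
            aspects.append((vec, [pf]))
    return [members for _, members in aspects]


def summarize(aspects, opinions):
    """Mean polarity per (aspect index, language) pair."""
    pf_to_aspect = {pf: i for i, members in enumerate(aspects) for pf in members}
    scores = defaultdict(list)
    for pf, lang, polarity in opinions:
        scores[(pf_to_aspect[pf], lang)].append(polarity)
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


# Hypothetical data: English and German PFs with context words already
# translated into a shared (English) vocabulary.
pf_contexts = {
    "battery": ["long", "charge", "hours", "last"],
    "Akku": ["charge", "hours", "last", "short"],
    "screen": ["bright", "clear", "large"],
    "Bildschirm": ["clear", "large", "bright", "sharp"],
}
# (PF, language, polarity) triples; +1 is positive, -1 is negative.
opinions = [
    ("battery", "en", 1), ("battery", "en", 1),
    ("Akku", "de", -1), ("Akku", "de", 1),
    ("screen", "en", 1), ("Bildschirm", "de", 1),
]

aspects = categorize(pf_contexts)
summary = summarize(aspects, opinions)
for (aspect_id, lang), score in sorted(summary.items()):
    print(aspects[aspect_id], lang, score)
```

In this toy data, "battery" and "Akku" fall into one aspect and "screen" and "Bildschirm" into another, so the per-language averages can be compared aspect by aspect. A real system would replace the similarity step with the learned cross-lingual association the abstract describes.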
Index Terms
- OpinionIt: a text mining system for cross-lingual opinion analysis