DOI: 10.1145/1871437.1871589
research-article

OpinionIt: a text mining system for cross-lingual opinion analysis

Published: 26 October 2010

ABSTRACT

Opinion mining focuses on extracting customers' opinions from reviews and predicting their sentiment orientation. Reviewers usually praise a product in some aspects and criticize it in others. With business globalization, it is important for enterprises to extract opinions toward different aspects and to identify cross-lingual and cross-cultural differences in those opinions. Cross-lingual opinion mining is challenging because large amounts of opinions are written in different languages and are not well structured. Since people often use different words to describe the same aspect in reviews, product-feature (PF) categorization becomes critical in cross-lingual opinion mining. Manual cross-lingual PF categorization is time-consuming and practically infeasible for the massive amount of data written in different languages. To find cross-lingual differences in opinions effectively, we present an aspect-oriented opinion mining method based on Cross-lingual Latent Semantic Association (CLaSA). We first construct a CLaSA model to learn the cross-lingual latent semantic association among all PFs from multi-dimensional semantic clues in the review corpus. We then employ the CLaSA model to categorize the multilingual PFs into semantic aspects and to summarize cross-lingual differences in opinions toward those aspects. Experimental results show that our method outperforms existing approaches. With the CLaSA model, our text mining system OpinionIt can effectively discover cross-lingual differences in opinions.

Published in

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN: 9781450300995
DOI: 10.1145/1871437

Copyright © 2010 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
