Skip to main content
Log in

Leveraging Large Data with Weak Supervision for Joint Feature and Opinion Word Extraction

  • Regular Paper
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

Product feature and opinion word extraction is very important for fine granular sentiment analysis. In this paper, we leverage large-scale unlabeled data for joint extraction of feature and opinion words under a knowledge poor setting, in which only a few feature-opinion pairs are utilized as weak supervision. Our major contributions are two-fold: first, we propose a data-driven approach to represent product features and opinion words as a list of corpus-level syntactic relations, which captures rich language structures; second, we build a simple yet robust unsupervised model with prior knowledge incorporated to extract new feature and opinion words, which obtains high performance robustly. The extraction process is based upon a bootstrapping framework which, to some extent, reduces error propagation under large data. Experimental results under various settings compared with state-of-the-art baselines demonstrate that our method is effective and promising.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

References

  1. Ante S E. Amazon: Turning consumer opinions into gold. Business Week. http://www.bloomberg.com/bw/magazine/content/0943/b4152047039565.htm, May 2015.

  2. Pang B, Lee L, Vaithyanathan S. Thumbs up?: Sentiment classification using machine learning techniques. In Proc. the ACL-02 Conference on Empirical Methods in Natural Language Processing, Jul. 2002, pp.79-86.

  3. Hu M, Liu B. Mining and summarizing customer reviews. In Proc. the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2004, pp. 168-177.

  4. Liu B, Hu M, Cheng J. Opinion observer: Analyzing and comparing opinions on the web. In Proc. the 14th International Conference on World Wide Web, May 2005, pp.342-351.

  5. Qiu G, Liu B, Bu J, Chen C. Opinion word expansion and target extraction through double propagation. Comput. Linguist., 2011, 37(1): 9-27.

    Article  Google Scholar 

  6. Zhuang L, Jing F, Zhu X Y. Movie review mining and summarization. In Proc. the 15th ACM International Conference on Information and Knowledge Management, Nov. 2006, pp.43-50.

  7. Hai Z, Chang K, Cong G. One seed to find them all: Mining opinion features via association. In Proc. the 21st ACM International Conference on Information and Knowledge Management, Oct. 29 – Nov. 2, 2012, pp.255-264.

  8. Blei D M, Ng A Y, Jordan M I. Latent dirichlet allocation. Journal of Machine Learning Research, 2003, 3: 993-1022.

    MATH  Google Scholar 

  9. Titov I, McDonald R. A joint model of text and aspect ratings for sentiment summarization. In Proc. the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Jun. 2008, pp.308-316.

  10. Zhao W X, Jiang J, Yan H, Li X. Jointly modeling aspects and opinions with a Maxent-LDA hybrid. In Proc. the 2010 Conference on Empirical Methods in Natural Language Processing, Oct. 2010, pp.56-65.

  11. Mukherjee A, Liu B. Aspect extraction through semisupervised modeling. In Proc. the 50th Annual Meeting of the Association for Computational Linguistics, Jul. 2012, pp.339-348.

  12. Newman D, Asuncion A, Smyth P, Welling M. Distributed algorithms for topic models. Journal of Machine Learning Research, 2009, 10: 1801-1828.

    MATH  MathSciNet  Google Scholar 

  13. Lin J, Kolcz A. Large-scale machine learning at Twitter. In Proc. the 2012 ACM SIGMOD International Conference on Management of Data, May 2012, pp.793-804.

  14. Halevy A, Norvig P, Pereira F. The unreasonable effectiveness of data. IEEE Intelligent Systems, 2009, 24(2): 8-12.

    Article  Google Scholar 

  15. Kobayashi N, Inui K, Matsumoto Y. Extracting aspectevaluation and aspect-of relations in opinion mining. In Proc. the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jun. 2007, pp.1065-1074.

  16. Wu Y, Zhang Q, Huang X, Wu L. Phrase dependency parsing for opinion mining. In Proc. the 2009 Conference on Empirical Methods in Natural Language Processing, Aug. 2009, pp.1533-1541.

  17. Li F, Han C, Huang M, Zhu X, Xia Y J, Zhang S, Yu H. Structure-aware review mining and summarization. In Proc. the 23rd International Conference on Computational Linguistics, Aug. 2010, pp.653-661.

  18. Choi Y, Cardie C. Hierarchical sequential learning for extracting opinions and their attributes. In Proc. the ACL 2010 Conference Short Papers, Jul. 2010, pp.269-274.

  19. Popescu A M, Etzioni O. Extracting product features and opinions from reviews. In Proc. the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Oct. 2005, pp.339-346.

  20. Kaji N, Kitsuregawa M. Building lexicon for sentiment analysis from massive collection of HTML documents. In Proc. the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, June 2007, pp.1075-1083.

  21. Guo H, Zhu H, Guo Z, Zhang X, Su Z. Product feature categorization with multilevel latent semantic association. In Proc. the 18th ACM Conference on Information and Knowledge Management, Nov. 2009, pp.1087-1096.

  22. Zhang L, Liu B, Lim S H, O’Brien-Strain E. Extracting and ranking product features in opinion documents. In Proc. the 23rd International Conference on Computational Linguistics, Aug. 2010, pp.1462-1470.

  23. Gindl S, Weichselbraun A, Scharl A. Rule-based opinion target and aspect extraction to acquire affective knowledge. In Proc. the 22nd International Conference on World Wide Web Companion, May 2013, pp.557-564.

  24. Mei Q, Ling X, Wondra M, Su H, Zhai C. Topic sentiment mixture: Modeling facets and opinions in weblogs. In Proc. the 16th International Conference on World Wide Web, May 2007, pp.171-180.

  25. Brody S, Elhadad N. An unsupervised aspect-sentiment model for online reviews. In Proc. Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics, Jun. 2010, pp.804-812.

  26. Jo Y, Oh A H. Aspect and sentiment unification model for online review analysis. In Proc. the 4th ACM International Conference on Web Search and Data Mining, Feb. 2011, pp.815-824.

  27. Lu B, Ott M, Cardie C, Tsou B K. Multi-aspect sentiment analysis with topic models. In Proc. the 11th IEEE International Conference on Data Mining Workshops, Dec. 2011, pp.81-88.

  28. Moghaddam S, Ester M. ILDA: Interdependent LDA model for learning latent aspects and their ratings from online product reviews. In Proc. the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul. 2011, pp.665-674.

  29. Chen Z, Mukherjee A, Liu B, Hsu M, Castellanos M, Ghosh R. Exploiting domain knowledge in aspect extraction. In Proc. the 2013 Conference on Empirical Methods in Natural Language Processing, Oct. 2013, pp.1655-1667.

  30. Wang H, Lu Y, Zhai C. Latent aspect rating analysis on review text data: A rating regression approach. In Proc. the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul. 2010, pp.783-792.

  31. Snyder B, Barzilay R. Multiple aspect ranking using the good grief algorithm. In Proc. Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics, Apr. 2007, pp.300-307.

  32. Yu J, Zha Z J, Wang M, Chua T S. Aspect ranking: Identifying important product aspects from online consumer reviews. In Proc. the 49th Annual Meeting of the Association for Computational Linguistics, Jun. 2011, pp.1496-1505.

  33. Li P, Wang Y, Gao W, Jiang J. Generating aspect-oriented multi-document summarization with event-aspect model. In Proc. the Conference on Empirical Methods in Natural Language Processing, Jul. 2011, pp.1137-1146.

  34. Liu K, Xu L, Zhao J. Opinion target extraction using wordbased translation model. In Proc. the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jul. 2012, pp.1346-1356.

  35. Liu K, Xu L, Zhao J. Syntactic patterns versus word alignment: Extracting opinion targets from online reviews. In Proc. the 51st Annual Meeting of the Association for Computational Linguistics, Aug. 2013, pp.1754-1763.

  36. Xu L, Liu K, Lai S, Chen Y, Zhao J. Mining opinion words and opinion targets in a two-stage framework. In Proc. the 51st Annual Meeting of the Association for Computational Linguistics, Aug. 2013, pp.1764-1773.

  37. Andrzejewski D, Zhu X, Craven M. Incorporating domain knowledge into topic modeling via dirichlet forest priors. In Proc. the 26th Annual International Conference on Machine Learning, Jun. 2009, pp.25-32.

  38. Andrzejewski D, Zhu X, Craven M, Recht B. A framework for incorporating general domain knowledge into latent dirichlet allocation using first-order logic. In Proc. the 22nd International Joint Conference on Artificial Intelligence, Jul. 2011, pp.1171-1177.

  39. Li T, Zhang Y, Sindhwani V. A non-negative matrix trifactorization approach to sentiment classification with lexical prior knowledge. In Proc. the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Aug. 2009, pp.244-252.

  40. Shen C, Li T. A non-negative matrix factorization based approach for active dual supervision from document and word labels. In Proc. the Conference on Empirical Methods in Natural Language Processing, Jul. 2011, pp.949-958.

  41. Fang L, Huang M, Zhu X. Exploring weakly supervised latent sentiment explanations for aspect-level review analysis. In Proc. the 22nd ACM International Conference on Information and Knowledge Management, Oct. 27 – Nov. 1, 2013, pp.1057-1066.

  42. Yu C N J, Joachims T. Learning structural SVMs with latent variables. In Proc. the 26th Annual International Conference on Machine Learning, Jun. 2009, pp.1169-1176.

  43. Druck G, Mann G, McCallum A. Learning from labeled features using generalized expectation criteria. In Proc. the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul. 2008, pp.595-602.

  44. Ganchev K, Gra¸ca J, Gillenwater J, Taskar B. Posterior regularization for structured latent variable models. Journal of Machine Learning Research, 2010, 11: 2001-2049.

  45. Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 2008, 51(1): 107-113.

    Article  Google Scholar 

  46. Klein D, Manning C D. Accurate unlexicalized parsing. In Proc. the 41st Annual Meeting on Association for Computational Linguistics, Jul. 2003, pp.423-430.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Min-Lie Huang.

Additional information

This work is partly supported by the National Basic Research 973 Program of China under Grant Nos. 2012CB316301 and 2013CB329403, the National Natural Science Foundation of China under Grant Nos. 61332007 and 61272227, and the Beijing Higher Education Young Elite Teacher Project.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Fang, L., Liu, B. & Huang, ML. Leveraging Large Data with Weak Supervision for Joint Feature and Opinion Word Extraction. J. Comput. Sci. Technol. 30, 903–916 (2015). https://doi.org/10.1007/s11390-015-1569-3

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-015-1569-3

Keywords