Leveraging Large Data with Weak Supervision for Joint Feature and Opinion Word Extraction

Fang, Lei; Liu, Biao; Huang, Min-Lie

doi:10.1007/s11390-015-1569-3

Regular Paper
Published: 08 July 2015

Volume 30, pages 903–916 (2015)
Journal of Computer Science and Technology

Lei Fang¹,
Biao Liu¹ &
Min-Lie Huang¹

Abstract

Product feature and opinion word extraction is essential for fine-grained sentiment analysis. In this paper, we leverage large-scale unlabeled data to jointly extract feature and opinion words in a knowledge-poor setting, in which only a few feature-opinion pairs serve as weak supervision. Our major contributions are twofold. First, we propose a data-driven approach that represents product features and opinion words as lists of corpus-level syntactic relations, which capture rich language structures. Second, we build a simple yet robust unsupervised model that incorporates prior knowledge to extract new feature and opinion words and achieves consistently high performance. The extraction process is based on a bootstrapping framework which, to some extent, reduces error propagation on large data. Experimental results under various settings, compared with state-of-the-art baselines, demonstrate that our method is effective and promising.

Author information

Authors and Affiliations

State Key Laboratory on Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Lei Fang, Biao Liu & Min-Lie Huang

Corresponding author

Correspondence to Min-Lie Huang.

Additional information

This work is partly supported by the National Basic Research 973 Program of China under Grant Nos. 2012CB316301 and 2013CB329403, the National Natural Science Foundation of China under Grant Nos. 61332007 and 61272227, and the Beijing Higher Education Young Elite Teacher Project.

About this article

Cite this article

Fang, L., Liu, B. & Huang, ML. Leveraging Large Data with Weak Supervision for Joint Feature and Opinion Word Extraction. J. Comput. Sci. Technol. 30, 903–916 (2015). https://doi.org/10.1007/s11390-015-1569-3

Received: 12 September 2014
Revised: 04 May 2015
Published: 08 July 2015
Issue Date: July 2015
DOI: https://doi.org/10.1007/s11390-015-1569-3
