ABSTRACT
Descriptions and reviews of products abound on the web and characterise the corresponding products through their aspects. Extracting these aspects is essential to better understand these descriptions, e.g., for comparing or recommending products. Current pattern-based aspect extraction approaches focus on flat patterns that extract flat sets of adjective-noun pairs. Aspects are also crucial for sentiment classification, in which sentiments are matched with aspect-level expressions. A preliminary step in both aspect extraction and aspect-based sentiment analysis is to detect aspect terms and opinion targets. In this paper, we propose a sequential learning approach to extract aspect terms and opinion targets from opinionated documents. For the first time, we use semi-Markov conditional random fields for this task, and we incorporate word embeddings as features into the learning process. We obtain comparable results on the benchmark datasets for the subtask of aspect term extraction in SemEval-2014 Task 4 and the subtask of opinion target extraction in SemEval-2015 Task 12. Our results show that word embeddings improve the detection accuracy for aspect terms and opinion targets.
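The abstract does not describe the model's internals, so the following is only a minimal sketch, not the authors' implementation. It illustrates the core idea of a semi-Markov CRF: the model labels whole segments rather than single tokens, so a multi-word aspect term such as "battery life" is scored as one unit using segment-level features. Here that feature is the mean word embedding of the span plus a length feature. Only exact Viterbi-style decoding over segmentations is shown; training is omitted. The 2-d embeddings, the hand-set weights, the convention that non-aspect ("O") segments have length 1, and the names `semi_markov_viterbi` and `make_score` are all hypothetical, chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)
NEG_INF = float("-inf")


def semi_markov_viterbi(
    n: int,
    labels: List[str],
    score: Callable[[int, int, str, str], float],
    max_len: int,
) -> List[Segment]:
    """Highest-scoring segmentation of n tokens under a segment-level (semi-Markov) model.

    score(start, end, label, prev_label) scores the segment tokens[start:end];
    prev_label is "START" for the first segment.
    """
    # best[j][y]: best score for segmenting tokens[:j] with the last segment labelled y.
    best: List[Dict[str, float]] = [{y: NEG_INF for y in labels} for _ in range(n + 1)]
    back: List[Dict[str, Optional[Tuple[int, Optional[str]]]]] = [
        {y: None for y in labels} for _ in range(n + 1)
    ]
    for j in range(1, n + 1):
        for d in range(1, min(max_len, j) + 1):  # d = length of the last segment
            i = j - d
            for y in labels:
                if i == 0:
                    options = [(score(0, j, y, "START"), None)]
                else:
                    options = [
                        (best[i][yp] + score(i, j, y, yp), yp)
                        for yp in labels
                        if best[i][yp] > NEG_INF
                    ]
                for cand, yp in options:
                    if cand > best[j][y]:
                        best[j][y] = cand
                        back[j][y] = (i, yp)
    # Follow back-pointers from the best label at the end of the sentence.
    y: Optional[str] = max(labels, key=lambda lab: best[n][lab])
    segments: List[Segment] = []
    j = n
    while j > 0:
        i, prev = back[j][y]
        segments.append((i, j, y))
        j, y = i, prev
    return segments[::-1]


# Hypothetical toy embeddings; a real system would load pre-trained vectors.
EMB = {
    "the": (0.0, 0.1), "battery": (0.9, 0.2), "life": (0.8, 0.3),
    "is": (0.0, 0.0), "great": (0.1, 0.9),
}
W_ASPECT = (2.0, -1.0)  # hand-set for illustration, NOT learned weights
LEN_WEIGHT, ASPECT_BIAS, O_SCORE, ASPECT_ASPECT_PENALTY = 0.8, -1.0, 0.5, 2.0


def make_score(tokens: List[str]) -> Callable[[int, int, str, str], float]:
    def score(i: int, j: int, y: str, prev: str) -> float:
        if y == "O":
            # Assumed convention: tokens outside aspect terms form length-1 segments.
            return O_SCORE if j - i == 1 else NEG_INF
        # Segment-level feature: mean embedding of tokens[i:j].
        vecs = [EMB[t] for t in tokens[i:j]]
        mean = [sum(v[k] for v in vecs) / len(vecs) for k in range(2)]
        s = sum(w * m for w, m in zip(W_ASPECT, mean))
        s += LEN_WEIGHT * (j - i) + ASPECT_BIAS
        if prev == "ASPECT":
            s -= ASPECT_ASPECT_PENALTY  # discourage splitting one aspect into two segments
        return s
    return score


tokens = "the battery life is great".split()
segs = semi_markov_viterbi(len(tokens), ["O", "ASPECT"], make_score(tokens), max_len=3)
print([" ".join(tokens[i:j]) for i, j, y in segs if y == "ASPECT"])  # ['battery life']
```

With these toy weights, the two-token segment "battery life" (score 2.05) beats tagging "battery" alone as an aspect (1.4) followed by "life" as O (0.5). A token-level linear-chain CRF cannot express this kind of span-wide feature directly.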
Aspect Term and Opinion Target Extraction from Web Product Reviews using Semi-Markov Conditional Random Fields with Word Embeddings as Features