DOI: 10.1145/2912845.2936809
research-article

Aspect Term and Opinion Target Extraction from Web Product Reviews using Semi-markov Conditional Random Fields with Word Embeddings as Features

Published: 13 June 2016

ABSTRACT

Descriptions and reviews for products abound on the web and characterise the corresponding products through their aspects. Extracting these aspects is essential to better understand these descriptions, e.g., for comparing or recommending products. Current pattern-based aspect extraction approaches focus on flat patterns that extract flat sets of adjective-noun pairs. Aspects are also crucial for sentiment classification, in which sentiments are matched with aspect-level expressions. A preliminary step in both aspect extraction and aspect-based sentiment analysis is to detect aspect terms and opinion targets. In this paper, we propose a sequential learning approach to extract aspect terms and opinion targets from opinionated documents. For the first time, we use semi-Markov conditional random fields for this task, and we incorporate word embeddings as features into the learning process. We obtain comparable results on the benchmark datasets for the subtask of aspect term extraction in SemEval-2014 Task 4 and the subtask of opinion target extraction in SemEval-2015 Task 12. Our results show that word embeddings improve the detection accuracy for aspect terms and opinion targets.
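
To make the idea of segment-level labelling concrete, the following is a minimal sketch of semi-Markov decoding in which each candidate segment is scored from the mean of its word embeddings. It is not the authors' implementation: the function names, the two-dimensional toy embeddings, the hand-set weights, and the transition penalty are all assumptions chosen for illustration, and a real system would learn the weights from annotated data.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def mean_embedding_score(tokens: Sequence[str], start: int, end: int, label: str,
                         embeddings: Dict[str, List[float]],
                         weights: Dict[str, List[float]]) -> float:
    """Score one segment as w_label . mean(embedding of each token in it)."""
    dim = len(weights[label])
    vectors = [embeddings.get(tok.lower(), [0.0] * dim) for tok in tokens[start:end]]
    mean = [sum(vec[k] for vec in vectors) / len(vectors) for k in range(dim)]
    return sum(w * x for w, x in zip(weights[label], mean))


def semi_markov_decode(n: int, labels: Sequence[str], max_len: Dict[str, int],
                       seg_score: Callable[[int, int, str], float],
                       trans_score: Callable[[str, str], float]) -> List[Segment]:
    """Viterbi search over labelled segments instead of single tokens."""
    neg_inf = float("-inf")
    # best[j][y]: best score of a segmentation of tokens[:j] ending in label y.
    best = [dict.fromkeys(labels, neg_inf) for _ in range(n + 1)]
    back: List[Dict[str, Optional[Tuple[int, Optional[str]]]]] = [
        dict.fromkeys(labels) for _ in range(n + 1)
    ]
    for j in range(1, n + 1):
        for y in labels:
            for d in range(1, min(max_len[y], j) + 1):
                i = j - d  # the candidate segment covers tokens[i:j]
                s = seg_score(i, j, y)
                if i == 0:
                    cands = [(s, None)]
                else:
                    cands = [(best[i][p] + trans_score(p, y) + s, p) for p in labels]
                score, prev = max(cands, key=lambda c: c[0])
                if score > best[j][y]:
                    best[j][y] = score
                    back[j][y] = (i, prev)
    # Follow back-pointers from the best final label to recover the segments.
    segments: List[Segment] = []
    j, y = n, max(labels, key=lambda lab: best[n][lab])
    while j > 0:
        i, prev = back[j][y]
        segments.append((i, j, y))
        j, y = i, prev
    segments.reverse()
    return segments


# Toy data (assumed): dimension 0 is "aspect-like", dimension 1 is "other".
tokens = ["The", "battery", "life", "is", "great"]
embeddings = {"the": [0.0, 1.0], "battery": [1.0, 0.0], "life": [0.9, 0.1],
              "is": [0.0, 1.0], "great": [0.1, 0.9]}
weights = {"ASPECT": [2.0, -2.0], "O": [-2.0, 2.0]}  # hand-set, not learned
max_len = {"ASPECT": 3, "O": 1}  # aspect terms may span up to three tokens


def seg_score(i: int, j: int, y: str) -> float:
    return mean_embedding_score(tokens, i, j, y, embeddings, weights)


def trans_score(prev: str, cur: str) -> float:
    # Penalise two adjacent ASPECT segments so multiword terms stay merged.
    return -5.0 if prev == cur == "ASPECT" else 0.0


segments = semi_markov_decode(len(tokens), ["O", "ASPECT"], max_len,
                              seg_score, trans_score)
aspect_terms = [" ".join(tokens[i:j]) for i, j, y in segments if y == "ASPECT"]
print(aspect_terms)
```

Because the score is assigned to a whole segment, a multiword term such as "battery life" is labelled as one unit rather than reassembled from per-token tags, which is the main difference from a token-level conditional random field.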

    • Published in

      WIMS '16: Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics
      June 2016
      309 pages

      Copyright © 2016 ACM

      © 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      WIMS '16 Paper Acceptance Rate: 36 of 53 submissions, 68%. Overall Acceptance Rate: 140 of 278 submissions, 50%.
