Doc2Vec-based Approach for Extracting Diverse Evaluation Expressions from Online Review Data

Authors: Kosuke Kurihara, Yoshiyuki Shoji, Martin J. Dürst

iiWAS2021: The 23rd International Conference on Information Integration and Web Intelligence

Pages 11 - 18

https://doi.org/10.1145/3487664.3487773

Published: 30 December 2021

Abstract

This paper proposes a method for extracting diverse expressions from online movie review texts for a given keyword query. When people watch a movie that makes them cry, they generally do not write “I cried.” Instead, they use euphemistic language such as “I needed a handkerchief” or “My makeup was running.” To support information retrieval based on audience reactions, such as a search for “movies that make me cry,” a variety of paraphrased expressions must be collected from review texts for arbitrary queries. Our proposed method extracts such expressions from review datasets by applying two extensions to Doc2Vec: 1) it changes the granularity of the training sentences to mitigate a lack of context, and 2) it applies query expansion before the similarity calculation. We conducted a large-scale crowdsourced experiment using 1.29 million actual sentences taken from Yahoo! Movies, Japan. The experimental results showed that changing the training data granularity and adding query expansion are both effective for accurately collecting more diverse expressions whose meaning is similar to the given query.
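
The two extensions named in the abstract can be illustrated with a small sketch. The abstract does not say how the training granularity is changed or how expansion terms are selected, so everything below is an assumption for illustration rather than the authors' implementation: `widen_context` merges each sentence with its neighbors from the same review, and `expand_query` averages the query vector with vectors of related terms before cosine ranking. The 2-d vectors are made-up stand-ins for embeddings that a trained Doc2Vec model would supply, and all function names are hypothetical.

```python
import math

def widen_context(sentences, window=1):
    """Assumed granularity change: join each sentence with its neighbors
    from the same review so each training unit carries more context."""
    units = []
    for i in range(len(sentences)):
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        units.append(" ".join(sentences[lo:hi]))
    return units

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_query(query_vec, related_vecs):
    """Assumed query expansion: centroid of the query and related-term vectors."""
    vecs = [query_vec] + list(related_vecs)
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def rank(query_vec, sentence_vecs):
    """Return sentences sorted by descending cosine similarity to the query."""
    return sorted(sentence_vecs,
                  key=lambda s: cosine(query_vec, sentence_vecs[s]),
                  reverse=True)

# Toy vectors, invented for illustration only (not taken from the paper).
sentence_vecs = {
    "I needed a handkerchief": [0.2, 0.9],
    "The plot was confusing": [0.9, -0.3],
    "I cried": [0.9, 0.4],
}
query = [1.0, 0.0]       # hypothetical vector for the query "cry"
related = [[0.0, 1.0]]   # hypothetical vector for one expansion term

units = widen_context(["The ending came.", "I needed a handkerchief.", "Great film."])
plain = rank(query, sentence_vecs)
expanded = rank(expand_query(query, related), sentence_vecs)
print(plain)
print(expanded)
```

With these toy vectors, the unexpanded query ranks the unrelated sentence first, while the expanded query moves the paraphrase "I needed a handkerchief" above it. In practice the vectors would come from a Doc2Vec model trained on the widened units; that training step is omitted here.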

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals

Index terms have been assigned to the content through auto-classification.

Published In

iiWAS2021: The 23rd International Conference on Information Integration and Web Intelligence

November 2021

658 pages

ISBN: 9781450395564

DOI: 10.1145/3487664

Copyright © 2021 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

iiWAS2021

iiWAS2021: The 23rd International Conference on Information Integration and Web Intelligence

November 29 - December 1, 2021

Linz, Austria
