short-paper

Target-Topic Aware Doc2Vec for Short Sentence Retrieval from User Generated Content

Authors:

Kosuke Kurihara,

Yoshiyuki Shoji,

Sumio Fujita,

Martin J. Dürst

iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services

Pages 463 - 467

https://doi.org/10.1145/3366030.3366126

Published: 22 February 2020

Abstract

This paper proposes a method for supplementing the context of short sentences during the training phase of Doc2Vec. As CGM (Consumer Generated Media) and SNS sites have become widespread, calculating the similarity between a given query and a short sentence has become increasingly important. For example, on a movie review site, a search for the query "sad" should find actual expressions such as "I needed a handkerchief". Doc2Vec is one of the most widely used methods for vectorizing queries and sentences. However, it often exhibits low accuracy when the training data consists of short sentences, because they lack context. We modified Doc2Vec based on the hypothesis that other posts on the same topic (for example, reviews of the same movie on an online movie review site) may share the same background. Our method uses target-topic IDs instead of sentence IDs as the context when training Doc2Vec with the PV-DM model, which estimates the next term from a few previous terms and the context. A model trained with target-topic IDs (item IDs) vectorizes a sentence more accurately than a model trained with sentence IDs. We conducted a large-scale experiment using 1.2 million movie review posts and a crowdsourcing-based evaluation. The experimental results demonstrate that our method achieves higher precision and nDCG than previous Doc2Vec variants and traditional topic modeling methods.
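The tagging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Review` record, the `tag_documents` and `pvdm_examples` helpers, the tag prefixes, and the toy data are all hypothetical. The sketch only shows how the context ID attached to each sentence changes when a target-topic (movie) ID replaces the sentence ID, and what PV-DM style training triples (context, a few previous terms, next term) would then look like.

```python
from collections import namedtuple

# Hypothetical record: one short review sentence attached to a movie.
Review = namedtuple("Review", ["sentence_id", "movie_id", "words"])


def tag_documents(reviews, use_topic_id=True):
    """Pair each sentence's words with the ID used as its PV-DM context."""
    tagged = []
    for r in reviews:
        # Target-topic tagging shares one context per movie;
        # sentence tagging gives every sentence its own context.
        tag = f"movie:{r.movie_id}" if use_topic_id else f"sent:{r.sentence_id}"
        tagged.append((tag, r.words))
    return tagged


def pvdm_examples(tagged, window=2):
    """Yield (context_tag, previous_words, target_word) training triples."""
    for tag, words in tagged:
        for i in range(1, len(words)):
            previous = tuple(words[max(0, i - window):i])
            yield tag, previous, words[i]


# Toy data (assumed, not from the paper).
reviews = [
    Review(0, "m1", ["i", "needed", "a", "handkerchief"]),
    Review(1, "m1", ["so", "sad", "i", "cried"]),
    Review(2, "m2", ["great", "car", "chases"]),
]

by_topic = tag_documents(reviews, use_topic_id=True)
by_sentence = tag_documents(reviews, use_topic_id=False)
topic_tags = {tag for tag, _ in by_topic}
sentence_tags = {tag for tag, _ in by_sentence}
examples = list(pvdm_examples(by_topic, window=2))
```

With topic tagging, the two sentences about movie `m1` share a single context, so "handkerchief" and "sad" would be predicted under the same context vector. In a library such as gensim, which the abstract does not name, each pair would presumably map to a `TaggedDocument(words, [tag])` passed to `Doc2Vec` with `dm=1`.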

Cited By

Fujii M, Kawada Y, Yamamoto T, Yumoto T. (2025). Review Search Interface Based on Search Result Summarization Using Large Language Model. Database Systems for Advanced Applications. DASFAA 2024 International Workshops, 293-301. Online publication date: 23-Jan-2025.
https://doi.org/10.1007/978-981-96-0914-7_23

Kurihara K, Shoji Y, Fujita S, Dürst M. (2021). Learning to Rank-based Approach for Movie Search by Keyword Query and Example Query. The 23rd International Conference on Information Integration and Web Intelligence, 137-145. Online publication date: 29-Nov-2021.
https://dl.acm.org/doi/10.1145/3487664.3487779

Kurihara K, Shoji Y, Fujita S, Dürst M. (2021). Doc2Vec-based Approach for Extracting Diverse Evaluation Expressions from Online Review Data. The 23rd International Conference on Information Integration and Web Intelligence, 11-18. Online publication date: 29-Nov-2021.
https://dl.acm.org/doi/10.1145/3487664.3487773

Index Terms

Target-Topic Aware Doc2Vec for Short Sentence Retrieval from User Generated Content
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Similarity measures

Published In

iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services

December 2019

709 pages

ISBN:9781450371797

DOI:10.1145/3366030

In-Cooperation

JKU: Johannes Kepler Universität Linz
@WAS: International Organization of Information Integration and Web-based Applications and Services

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

iiWAS2019

iiWAS2019: The 21st International Conference on Information Integration and Web-based Applications & Services

December 2 - 4, 2019

Munich, Germany

