DOI: 10.1145/3366030.3366126
short-paper

Target-Topic Aware Doc2Vec for Short Sentence Retrieval from User Generated Content

Published: 22 February 2020

Abstract

This paper proposes a new method of supplementing the context of short sentences in the training phase of Doc2Vec. As CGM (Consumer Generated Media) sites and SNS sites have become widespread, computing the similarity between a given query and a short sentence has become increasingly important. For example, on a movie review site, a search with the query "sad" should find actual expressions such as "I needed a handkerchief." Doc2Vec is one of the most widely used methods for vectorizing queries and sentences. However, Doc2Vec often exhibits low accuracy when the training data consists of short sentences, because such sentences lack context. We modified Doc2Vec based on the hypothesis that other posts on the same topic (i.e., reviews of the same movie on an online movie review site) may share the same background. Our method uses target-topic IDs instead of sentence IDs as the context in the training phase of Doc2Vec with the PV-DM model, which predicts the next term from a few preceding terms and the context. The model trained with target-topic (item) IDs vectorizes a sentence more accurately than a model trained with sentence IDs. We conducted a large-scale experiment using 1.2 million movie review posts and a crowdsourcing-based evaluation. The experimental results demonstrate that our method achieves higher precision and nDCG than previous Doc2Vec variants and traditional topic modeling methods.
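The change to the training data can be illustrated with a minimal Python sketch of how PV-DM training instances are formed. It assumes a hypothetical toy corpus of tokenized reviews, each carrying a review ID and a movie ID, and a hypothetical helper `pvdm_examples`; the window size and the choice to skip positions with fewer than `window` preceding words are illustrative simplifications, not the authors' settings. Each instance pairs a tag (whose paragraph vector PV-DM learns) and the preceding words with the word to predict. Switching the tag from the sentence ID to the target-topic ID means every review of the same movie helps train one shared context vector.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical toy corpus: (review_id, movie_id, tokenized short review).
Review = Tuple[str, str, List[str]]
REVIEWS: List[Review] = [
    ("r1", "movieA", ["i", "needed", "a", "handkerchief"]),
    ("r2", "movieA", ["tears", "at", "the", "end"]),
    ("r3", "movieB", ["laughed", "all", "night"]),
]

# One PV-DM training instance: (tag, preceding words, word to predict).
Example = Tuple[str, Tuple[str, ...], str]


def pvdm_examples(reviews: List[Review], use_topic_id: bool, window: int = 2) -> List[Example]:
    """Build PV-DM instances that predict each word from the tag and the
    `window` words before it.

    use_topic_id=False: the tag is the sentence (review) ID, as in standard Doc2Vec.
    use_topic_id=True:  the tag is the target-topic (movie) ID, so all reviews
                        of one movie share a single context vector.
    Positions with fewer than `window` preceding words are skipped for simplicity.
    """
    examples: List[Example] = []
    for review_id, movie_id, tokens in reviews:
        tag = movie_id if use_topic_id else review_id
        for t in range(window, len(tokens)):
            examples.append((tag, tuple(tokens[t - window:t]), tokens[t]))
    return examples


if __name__ == "__main__":
    # Count how many training instances update each tag's vector.
    per_sentence = Counter(tag for tag, _, _ in pvdm_examples(REVIEWS, use_topic_id=False))
    per_topic = Counter(tag for tag, _, _ in pvdm_examples(REVIEWS, use_topic_id=True))
    print(per_sentence)
    print(per_topic)
```

On this toy corpus, sentence-ID tags `r1`, `r2` and `r3` receive 2, 2 and 1 instances, while the topic-ID tag `movieA` receives 4, pooling the context of both of its reviews. In gensim, the analogous change would presumably be to put the movie ID rather than a per-review ID in the `tags` of each `TaggedDocument` and train with `dm=1`.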


Cited By

  • (2025) Review Search Interface Based on Search Result Summarization Using Large Language Model. In Database Systems for Advanced Applications. DASFAA 2024 International Workshops, 293-301. DOI: 10.1007/978-981-96-0914-7_23. Online publication date: 23-Jan-2025.
  • (2021) Learning to Rank-based Approach for Movie Search by Keyword Query and Example Query. In The 23rd International Conference on Information Integration and Web Intelligence, 137-145. DOI: 10.1145/3487664.3487779. Online publication date: 29-Nov-2021.
  • (2021) Doc2Vec-based Approach for Extracting Diverse Evaluation Expressions from Online Review Data. In The 23rd International Conference on Information Integration and Web Intelligence, 11-18. DOI: 10.1145/3487664.3487773. Online publication date: 29-Nov-2021.


    Information

    Published In

    ACM Other conferences
    iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
    December 2019
    709 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    In-Cooperation

    • JKU: Johannes Kepler Universität Linz
    • @WAS: International Organization of Information Integration and Web-based Applications and Services

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Doc2Vec
    2. Impression Retrieval
    3. Information Retrieval
    4. Online Review Sites

    Qualifiers

    • Short-paper
    • Research
    • Refereed limited

    Conference

    iiWAS2019


    Article Metrics

    • Downloads (last 12 months): 6
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 13 Feb 2025
