DOI: 10.1145/3269206.3269314
short-paper

Impact of Document Representation on Neural Ad hoc Retrieval

Published: 17 October 2018

ABSTRACT

Neural embeddings have been effectively integrated into information retrieval tasks, including ad hoc retrieval. One benefit of neural embeddings is that they allow the similarity between queries and documents to be computed with vector similarity methods. While such methods have been effective for document matching, they have an inherent bias towards documents that are relatively similar in size to the query. The difference between query and document lengths, referred to as the query-document size imbalance problem, therefore becomes an issue when incorporating neural embeddings and their associated similarity models into the ad hoc document retrieval process. In this paper, we propose that document representation methods be used to address the size imbalance problem, and we empirically show their impact on the performance of neural embedding-based ad hoc retrieval. In addition, we explore several types of document representation methods and investigate their impact on the retrieval process. We conduct our experiments on three widely used standard corpora, namely Clueweb09B, Clueweb12B and Robust04, and their associated topics. In summary, we find that document representation methods effectively address the query-document size imbalance problem and significantly improve the performance of neural ad hoc retrieval. We also find that a document representation method based on simple term frequency performs significantly better than more sophisticated representation methods such as neural composition and aspect-based methods.
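The size imbalance described in the abstract can be illustrated with a small sketch. The abstract does not say how the term-frequency representation is built, so everything here is an assumption made for illustration: the toy two-dimensional `EMBEDDINGS` table, the helper names, averaging word vectors to represent a text, and keeping only the `k` most frequent terms of a document. It is not the authors' method, only one plausible reading of the idea.

```python
from collections import Counter
import math

# Toy 2-d word vectors, invented purely for illustration (not trained embeddings).
EMBEDDINGS = {
    "neural": [1.0, 0.0],
    "retrieval": [0.8, 0.6],
    "weather": [0.0, 1.0],
    "rain": [-0.6, 0.8],
}


def top_k_terms(tokens, k):
    """Return the k most frequent in-vocabulary terms of a token list."""
    counts = Counter(t for t in tokens if t in EMBEDDINGS)
    return [term for term, _ in counts.most_common(k)]


def mean_vector(terms):
    """Average the embeddings of the given terms (repeated terms count each time)."""
    vectors = [EMBEDDINGS[t] for t in terms]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(query_tokens, doc_tokens, k=None):
    """Score a document against a query.

    With k=None the whole document is averaged; otherwise only its
    k most frequent terms are used as the document representation.
    """
    query_terms = [t for t in query_tokens if t in EMBEDDINGS]
    if k is None:
        doc_terms = [t for t in doc_tokens if t in EMBEDDINGS]
    else:
        doc_terms = top_k_terms(doc_tokens, k)
    if not query_terms or not doc_terms:
        return 0.0
    return cosine(mean_vector(query_terms), mean_vector(doc_terms))


query = ["neural", "retrieval"]
# A longer document: mostly on topic, with some unrelated tail content.
document = ["neural"] * 3 + ["retrieval"] * 3 + ["weather"] * 2 + ["rain"] * 2

full_score = score(query, document)          # whole document averaged
reduced_score = score(query, document, k=2)  # only the 2 most frequent terms
print(full_score, reduced_score)
```

In this toy setting the off-topic tail pulls the full-document average away from the query, while the reduced representation is closer in size to the query and scores higher. Real corpora and trained embeddings may behave differently.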

Published in

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN: 9781450360142
DOI: 10.1145/3269206

Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

          Publication History

          • Published: 17 October 2018

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • short-paper

Acceptance Rates

CIKM '18 paper acceptance rate: 147 of 826 submissions, 18%
Overall acceptance rate: 1,861 of 8,427 submissions, 22%

          Upcoming Conference

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader