ABSTRACT
Neural embeddings have been effectively integrated into information retrieval tasks, including ad hoc retrieval. One benefit of neural embeddings is that they allow the similarity between queries and documents to be computed with vector similarity measures. While such measures have been effective for document matching, they are inherently biased towards documents that are relatively similar in size. The difference between query and document lengths, referred to as the query-document size imbalance problem, therefore becomes an issue when neural embeddings and their associated similarity models are incorporated into the ad hoc document retrieval process. In this paper, we propose that document representation methods be used to address the size imbalance problem, and we empirically show their impact on the performance of neural embedding-based ad hoc retrieval. In addition, we explore several types of document representation methods and investigate their impact on the retrieval process. We conduct our experiments on three widely used standard corpora, namely Clueweb09B, Clueweb12B and Robust04, and their associated topics. In summary, we find that document representation methods effectively address the query-document size imbalance problem and significantly improve the performance of neural ad hoc retrieval. We also find that a document representation method based on simple term frequency performs significantly better than more sophisticated representation methods such as neural composition and aspect-based methods.
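The size imbalance idea can be illustrated with a minimal sketch. The abstract does not specify how the term-frequency representation is built, so everything below is an assumption made for illustration: the toy two-dimensional embeddings, the helper names (`mean_vector`, `cosine`, `top_k_terms`), the choice of averaging word vectors, and the choice of keeping the k most frequent terms as the reduced document representation. It is not the authors' implementation.

```python
import math
from collections import Counter

# Toy 2-d embeddings, invented for this sketch (not from the paper).
TOY_EMBEDDINGS = {
    "neural": (1.0, 0.1),
    "retrieval": (0.9, 0.2),
    "ranking": (0.8, 0.3),
    "weather": (0.1, 1.0),
    "football": (0.0, 0.9),
    "recipe": (0.2, 0.8),
}


def mean_vector(terms):
    """Average the embeddings of all terms that have a known vector."""
    vectors = [TOY_EMBEDDINGS[t] for t in terms if t in TOY_EMBEDDINGS]
    if not vectors:
        raise ValueError("no term has a known embedding")
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_terms(terms, k):
    """Assumed reduced representation: the k most frequent distinct terms."""
    return [term for term, _ in Counter(terms).most_common(k)]


# A short query and a much longer document that mixes on-topic and off-topic terms.
query = ["neural", "retrieval"]
document = (
    ["retrieval"] * 4
    + ["neural"] * 3
    + ["ranking"]
    + ["weather"] * 2
    + ["football"] * 2
    + ["recipe"] * 2
)

# Similarity against the full document versus against the reduced representation.
full_sim = cosine(mean_vector(query), mean_vector(document))
reduced_sim = cosine(mean_vector(query), mean_vector(top_k_terms(document, 2)))

print(f"full document:       {full_sim:.3f}")
print(f"top-2 frequent terms: {reduced_sim:.3f}")
```

In this toy setting the off-topic terms in the long document pull its averaged vector away from the query, while the reduced representation is closer in size to the query and scores higher. Real systems would use trained embeddings and a tuned value of k; both are left open here.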
Index Terms
- Impact of Document Representation on Neural Ad hoc Retrieval