short-paper

Impact of Document Representation on Neural Ad hoc Retrieval

Authors:
Ebrahim Bagheri, Ryerson University, Toronto, ON, Canada
Faezeh Ensan, Ferdowsi University of Mashhad, Mashhad, Iran
Feras Al-Obeidat, Zayed University, Abu Dhabi, UAE

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 2018, Pages 1635–1638, https://doi.org/10.1145/3269206.3269314

Published: 17 October 2018


ABSTRACT

Neural embeddings have been effectively integrated into information retrieval tasks, including ad hoc retrieval. One of their benefits is that they allow the similarity between queries and documents to be computed with vector similarity measures. While such measures have been effective for document matching, they are inherently biased towards texts of similar length. The difference between query and document lengths, referred to as the query-document size imbalance problem, therefore becomes an issue when neural embeddings and their associated similarity models are incorporated into the ad hoc document retrieval process. In this paper, we propose that document representation methods be used to address the size imbalance problem, and we empirically show their impact on the performance of neural embedding-based ad hoc retrieval. We further explore several types of document representation methods and investigate their impact on the retrieval process. We conduct our experiments on three widely used standard corpora, namely Clueweb09B, Clueweb12B and Robust04, and their associated topics. In summary, we find that document representation methods effectively address the query-document size imbalance problem and significantly improve the performance of neural ad hoc retrieval. We also find that a document representation method based on simple term frequency performs significantly better than more sophisticated representation methods such as neural composition and aspect-based methods.
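The size imbalance argument can be illustrated with a small sketch. Averaging embeddings over a long document dilutes its similarity to a short query, and a compact term-frequency representation can counter this. The sketch rests on several assumptions that do not come from the paper. Texts are embedded as the mean (centroid) of their word vectors and compared with cosine similarity. The three-dimensional vectors are invented toy values. The hypothetical `top_k_tf` function represents a document by its k most frequent distinct terms. The paper's actual representation and similarity functions may differ.

```python
import math
from collections import Counter

# Hypothetical toy word vectors, invented for illustration only; a real
# system would load pre-trained embeddings (e.g. word2vec or GloVe).
EMB = {
    "neural":    [0.90, 0.10, 0.00],
    "retrieval": [0.80, 0.30, 0.10],
    "embedding": [0.85, 0.20, 0.05],
    "weather":   [0.00, 0.10, 0.90],
    "sports":    [0.10, 0.90, 0.20],
    "music":     [0.05, 0.70, 0.60],
}


def centroid(terms):
    """Mean vector of the in-vocabulary terms, or None if none are known."""
    vecs = [EMB[t] for t in terms if t in EMB]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_tf(terms, k):
    """Hypothetical term-frequency representation: the k most frequent
    distinct terms (ties keep first-occurrence order)."""
    return [t for t, _ in Counter(terms).most_common(k)]


def score(query, doc_terms, k=None):
    """Cosine between query and document centroids. With k=None every token
    of the document is averaged; otherwise only its top-k frequent terms are."""
    rep = doc_terms if k is None else top_k_tf(doc_terms, k)
    q, d = centroid(query), centroid(rep)
    return cosine(q, d) if q and d else 0.0


if __name__ == "__main__":
    query = ["neural", "retrieval"]
    # A long document that is mostly on-topic but also mentions other subjects.
    doc = (["neural"] * 3 + ["retrieval"] * 3 + ["embedding"] * 2
           + ["weather", "sports", "music"] * 2)
    print(f"full document: {score(query, doc):.3f}")
    print(f"top-3 TF representation: {score(query, doc, k=3):.3f}")
```

Running it prints 0.871 for the full document and 1.000 for the top-3 representation. The off-topic terms pull the full document's centroid away from the two-term query, and the compact representation removes them. The sketch averages the selected terms without frequency weights. That is only one of several possible design choices.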

Index Terms
Information systems → Information retrieval

Published in
CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 2018, 2362 pages
ISBN: 9781450360142
DOI: 10.1145/3269206
Copyright © 2018 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
ad hoc retrieval
document representation
neural embeddings
search
Acceptance Rates
CIKM '18 paper acceptance rate: 147 of 826 submissions (18%). Overall acceptance rate: 1,861 of 8,427 submissions (22%).