Efficient Top-k Document Retrieval Using a Term-Document Binary Matrix

Fujita, Etsuro; Oyama, Keizo

doi:10.1007/978-3-642-25631-8_27


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7097)

Abstract

Current web search engines perform well for “navigational queries.” However, because they rely on simple conjunctive Boolean filters, such engines perform poorly for “informational queries.” Informational queries would be better handled by a web search engine that uses an information retrieval model together with enhancement techniques such as query expansion and relevance feedback. Building such an engine requires a method for processing the model efficiently. In this paper, we describe a novel extension of an existing top-k query processing technique. We add a simple data structure called a “term-document binary matrix,” which makes top-k query evaluation more efficient even when the queries have been expanded. We evaluated the technique on the TREC GOV2 data set, using expanded versions of the evaluation queries attached to this data set. The results show that the extended technique achieves significant performance gains over existing techniques.
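The abstract does not say exactly how the matrix is used during query processing. The sketch below is an illustration only and is not the authors' algorithm. It assumes one plausible role for the structure. The matrix stores one bit row per term, with bit d set when the term occurs in document d. A max-score style top-k loop can then bound a document's score using only the query terms that the document actually contains. Under this assumption, expanded queries add many terms that are absent from any given document, so this bound can be far tighter than summing every query term's maximum score. The class `TermDocBinaryMatrix`, the function `top_k`, and the toy postings are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical toy index: term -> {doc_id: precomputed term score}
Postings = Dict[str, Dict[int, float]]


class TermDocBinaryMatrix:
    """Hypothetical sketch: one bit row per term; bit d is 1 iff the term occurs in doc d."""

    def __init__(self, postings: Postings) -> None:
        self.rows: Dict[str, int] = {}
        for term, plist in postings.items():
            row = 0
            for doc in plist:
                row |= 1 << doc  # Python ints serve as arbitrary-length bit vectors
            self.rows[term] = row

    def contains(self, term: str, doc: int) -> bool:
        return (self.rows.get(term, 0) >> doc) & 1 == 1


def top_k(query: List[str], postings: Postings,
          matrix: TermDocBinaryMatrix, k: int) -> Tuple[List[Tuple[float, int]], int]:
    """Return the k best (score, doc) pairs and how many docs were scored exactly."""
    if k <= 0:
        return [], 0
    # Per-term maximum score, as in max-score style pruning (an assumption here).
    max_score = {t: max(postings[t].values()) for t in query if postings.get(t)}
    candidates = sorted({d for t in max_score for d in postings[t]})
    heap: List[Tuple[float, int]] = []  # min-heap holding the current top k
    scored = 0
    for doc in candidates:
        present = [t for t in max_score if matrix.contains(t, doc)]
        # The upper bound counts only query terms the matrix says the doc contains.
        bound = sum(max_score[t] for t in present)
        if len(heap) == k and bound <= heap[0][0]:
            continue  # cannot beat the current k-th score, so skip exact scoring
        scored += 1
        score = sum(postings[t][doc] for t in present)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
    return sorted(heap, key=lambda p: (-p[0], p[1])), scored


postings: Postings = {
    "top": {0: 1.0, 1: 3.0, 2: 0.5},
    "k": {1: 2.0, 3: 1.5, 4: 0.1},
    "retrieval": {0: 0.5, 2: 2.5, 3: 0.2},
}
matrix = TermDocBinaryMatrix(postings)
results, scored = top_k(["top", "k", "retrieval"], postings, matrix, k=2)
print(results, scored)  # [(5.0, 1), (3.0, 2)] 4
```

In the toy data, document 4 contains only "k". Its bound is therefore 2.0, and the loop skips it once the second-best score reaches 3.0. A bound that summed the maxima of all three terms (7.5) would not have pruned it.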



Author information

Authors and Affiliations

National Institute of Informatics, Tokyo, Japan
Keizo Oyama
The Graduate University for Advanced Studies (SOKENDAI), Tokyo, Japan
Etsuro Fujita & Keizo Oyama


Editor information

Editors and Affiliations

Faculty of Computer Science and Engineering, University of Wollongong, Dubai Knowledge Village, P.O. Box 20182, Dubai, United Arab Emirates
Mohamed Vall Mohamed Salem
Faculty of Engineering and IT, Dubai International Academic City, Block 11, 1st and 2nd Floor, P.O. Box 345015, Dubai, United Arab Emirates
Khaled Shaalan
Faculty of Computer Science and Engineering, University of Wollongong, Dubai Knowledge Village, P.O. Box 20183, Dubai, United Arab Emirates
Farhad Oroumchian
Department of Electrical and Computer Engineering, University of Tehran, Faculty of Engineering, North Kargar Street, P.O. Box 14395-515, Tehran, Iran
Azadeh Shakery
Faculty of Computer Science and Engineering, University of Wollongong, Dubai Knowledge Village, P.O. Box 20183, Dubai, United Arab Emirates
Halim Khelalfa


About this paper

Cite this paper

Fujita, E., Oyama, K. (2011). Efficient Top-k Document Retrieval Using a Term-Document Binary Matrix. In: Salem, M.V.M., Shaalan, K., Oroumchian, F., Shakery, A., Khelalfa, H. (eds) Information Retrieval Technology. AIRS 2011. Lecture Notes in Computer Science, vol 7097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25631-8_27


DOI: https://doi.org/10.1007/978-3-642-25631-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25630-1
Online ISBN: 978-3-642-25631-8
eBook Packages: Computer Science, Computer Science (R0)
