Years and Authors of Summarized Original Work
2002; Muthukrishnan
2009; Hon, Shah, Vitter
2012; Navarro, Nekrich
2013; Shah, Sheng, Thankachan, Vitter
Problem Definition
Indexing data so that it can be searched efficiently is one of the most fundamental problems in computer science. In databases and information retrieval especially, indexing is at the heart of query processing. One of the most popular indexes, used by all search engines, is the inverted index. However, in many settings, such as bioinformatics, Eastern-language texts, and phrase queries on the Web, one cannot assume word demarcations. In such cases, each document has to be treated as a string of characters, and more sophisticated solutions are required for these string documents.
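The contrast between word-based and string-based search can be made concrete with a minimal sketch. The function names `build_inverted_index` and `list_documents_by_scan` and the sample DNA-like strings are illustrative assumptions, not part of the summarized work, and the linear scan is only a naive baseline rather than any of the indexes from the cited papers.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each whitespace-delimited word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.split():
            index[word].add(doc_id)
    return index


def list_documents_by_scan(docs, pattern):
    """Naive baseline: ids of documents that contain pattern as a substring."""
    return [doc_id for doc_id, text in enumerate(docs) if pattern in text]


# Hypothetical string documents with no word boundaries.
docs = ["ACGTACGT", "TTGACA", "GGTACC"]

index = build_inverted_index(docs)
# Each document is a single "word", so a word lookup for "TAC" finds nothing.
word_hits = sorted(index.get("TAC", set()))
# A substring scan finds the pattern inside documents 0 and 2.
scan_hits = list_documents_by_scan(docs, "TAC")

print(word_hits)
print(scan_hits)
```

The scan touches every character of every document on each query, which is the cost that dedicated string indexes aim to avoid.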
Formally, we are given a collection of \(D\) documents \(\mathcal{D} =\{ d_{1},d_{2},d_{3},\ldots ,d_{D}\}\). Each document \(d_{i}\) is a string drawn from the character set \(\varSigma\) of size \(\sigma\) and the total number of characters...
Recommended Reading
Afshani P (2008) On dominance reporting in 3d. In: ESA, Karlsruhe, pp 41–51
Arge L, Samoladas V, Vitter JS (1999) On two-dimensional indexability and optimal range search indexing. In: PODS, Philadelphia, pp 346–357
Hon WK, Shah R, Vitter JS (2009) Space-efficient framework for top-k string retrieval problems. In: FOCS, Atlanta, pp 713–722
Muthukrishnan S (2002) Efficient algorithms for document retrieval problems. In: SODA, San Francisco, pp 657–666
Navarro G, Nekrich Y (2012) Top-k document retrieval in optimal time and linear space. In: SODA, Kyoto, pp 1066–1077
Shah R, Sheng C, Thankachan SV, Vitter JS (2013) Top-k document retrieval in external memory. In: ESA, Sophia Antipolis, pp 803–814
© 2016 Springer Science+Business Media New York
Cite this entry
Shah, R. (2016). Document Retrieval on String Collections. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_629
DOI: https://doi.org/10.1007/978-1-4939-2864-4_629
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2863-7
Online ISBN: 978-1-4939-2864-4
eBook Packages: Computer Science; Reference Module Computer Science and Engineering