Document Retrieval on String Collections

Years and Authors of Summarized Original Work

  • 2002; Muthukrishnan

  • 2009; Hon, Shah, Vitter

  • 2012; Navarro, Nekrich

  • 2013; Shah, Sheng, Thankachan, Vitter

Problem Definition

Indexing data so that it can be searched efficiently is one of the most fundamental problems in computer science. In databases and information retrieval especially, indexing is at the heart of query processing. One of the most popular indexes, used by all search engines, is the inverted index. However, in many settings, such as bioinformatics, texts in East Asian languages, and phrase queries on the Web, one cannot assume that the text is demarcated into words. In such cases, each document must be treated as a string of characters, and more sophisticated solutions are required for these string documents.
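
To make the word-demarcation problem concrete, the following minimal Python sketch builds a toy inverted index by splitting on whitespace. The documents and the function names (build_inverted_index, word_query, substring_query) are hypothetical and exist only for illustration. A DNA-like document contains no spaces, so its only "word" is the whole string, and a query for a substring such as ACG finds nothing in the index, while a plain character-level scan finds it. That scan, however, costs time proportional to the total text length on every query, which is what motivates indexing the characters themselves.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each whitespace-delimited word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.split():
            index[word].add(doc_id)
    return index


def word_query(index, word):
    # An inverted index can only answer queries for whole indexed words.
    return sorted(index.get(word, set()))


def substring_query(docs, pattern):
    # Treat every document as a plain character string and scan all of it.
    return [doc_id for doc_id, text in enumerate(docs) if pattern in text]


docs = ["ACGTTGCA", "TTGACGT", "GGGCCC"]  # hypothetical DNA-like documents
index = build_inverted_index(docs)
print(word_query(index, "ACG"))      # []  ("ACG" is never a whole token)
print(substring_query(docs, "ACG"))  # [0, 1]
```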

Formally, we are given a collection of \(D\) documents \(\mathcal{D} =\{ d_{1},d_{2},d_{3},\ldots ,d_{D}\}\). Each document \(d_{i}\) is a string drawn from the character set \(\varSigma\) of size \(\sigma\), and the total number of characters...
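
The definition above is cut off before the query is stated. Purely as an assumption for illustration, the sketch below takes the two query types named in the titles of the Recommended Reading: document listing (report each document containing a pattern \(P\) at least once) and top-k retrieval (report the k documents with the most occurrences of \(P\)). Listing uses the chaining technique commonly credited to Muthukrishnan [4]: over a generalized suffix array, a document array records which document each suffix comes from, and a chain array records, for each position j, the previous position holding a suffix of the same document. Inside the suffix-array range [lo, hi) of \(P\), each document has exactly one position whose chain value is below lo, so repeated range-minimum queries report every document once. The class and method names (SparseTableMin, DocumentListingIndex, pattern_range, list_documents, top_k) are hypothetical; suffix sorting is naive, and top_k is a brute-force count, not the structures of [3, 5, 6].

```python
from collections import Counter


class SparseTableMin:
    """Answers argmin queries over a fixed list in O(1) after O(n log n) preprocessing."""

    def __init__(self, values):
        self.values = values
        n = len(values)
        self.table = [list(range(n))]  # level k: argmin of values[i : i + 2**k]
        k = 1
        while (1 << k) <= n:
            prev, half = self.table[-1], 1 << (k - 1)
            self.table.append([self._better(prev[i], prev[i + half])
                               for i in range(n - (1 << k) + 1)])
            k += 1

    def _better(self, a, b):
        return a if self.values[a] <= self.values[b] else b

    def argmin(self, l, r):
        """Index of a minimum of values[l..r] (inclusive bounds, l <= r)."""
        k = (r - l + 1).bit_length() - 1
        return self._better(self.table[k][l], self.table[k][r - (1 << k) + 1])


class DocumentListingIndex:
    """Generalized suffix array + document array + chain array, queried with RMQ."""

    def __init__(self, docs):
        self.docs = docs
        # Naive O(n^2 log n) suffix sorting over (doc_id, offset) pairs; sketch only.
        self.sa = sorted(((d, i) for d, text in enumerate(docs) for i in range(len(text))),
                         key=lambda p: docs[p[0]][p[1]:])
        self.da = [d for d, _ in self.sa]  # document of each suffix, in SA order
        # chain[j] = previous SA position holding a suffix of the same document, or -1.
        last, self.chain = {}, []
        for j, d in enumerate(self.da):
            self.chain.append(last.get(d, -1))
            last[d] = j
        self.rmq = SparseTableMin(self.chain)

    def _prefix(self, j, m):
        d, i = self.sa[j]
        return self.docs[d][i:i + m]

    def pattern_range(self, pattern):
        """Half-open SA range [lo, hi) of the suffixes that start with pattern."""
        m, lo, hi = len(pattern), 0, len(self.sa)
        while lo < hi:  # first suffix whose m-prefix is >= pattern
            mid = (lo + hi) // 2
            if self._prefix(mid, m) < pattern:
                lo = mid + 1
            else:
                hi = mid
        start, hi = lo, len(self.sa)
        while lo < hi:  # first suffix whose m-prefix is > pattern
            mid = (lo + hi) // 2
            if self._prefix(mid, m) <= pattern:
                lo = mid + 1
            else:
                hi = mid
        return start, lo

    def list_documents(self, pattern):
        """Report each document containing pattern once, using O(1 + #docs) RMQs."""
        lo, hi = self.pattern_range(pattern)
        out, stack = [], [(lo, hi - 1)]
        while stack:
            l, r = stack.pop()
            if l > r:
                continue
            j = self.rmq.argmin(l, r)
            if self.chain[j] >= lo:  # every chain value in [l, r] points inside the range
                continue
            out.append(self.da[j])   # j is the first occurrence of da[j] in [lo, hi)
            stack.extend([(l, j - 1), (j + 1, r)])
        return out

    def top_k(self, pattern, k):
        """Brute-force top-k by occurrence count: O(#occurrences), not the cited bounds."""
        lo, hi = self.pattern_range(pattern)
        return Counter(self.da[lo:hi]).most_common(k)


index = DocumentListingIndex(["banana", "bandana", "cabana"])
print(sorted(index.list_documents("ana")))  # [0, 1, 2]
print(index.list_documents("nan"))          # [0]
print(index.top_k("ana", 1))                # [(0, 2)]
```

Sorting suffixes naively keeps the sketch short; the interesting part is the query side, where each reported document costs O(1) range-minimum queries beyond the pattern search, no matter how many times \(P\) occurs in that document.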

Recommended Reading

  1. Afshani P (2008) On dominance reporting in 3D. In: ESA, Karlsruhe, pp 41–51

  2. Arge L, Samoladas V, Vitter JS (1999) On two-dimensional indexability and optimal range search indexing. In: PODS, Philadelphia, pp 346–357

  3. Hon WK, Shah R, Vitter JS (2009) Space-efficient framework for top-k string retrieval problems. In: FOCS, Atlanta, pp 713–722

  4. Muthukrishnan S (2002) Efficient algorithms for document retrieval problems. In: SODA, San Francisco, pp 657–666

  5. Navarro G, Nekrich Y (2012) Top-k document retrieval in optimal time and linear space. In: SODA, Kyoto, pp 1066–1077

  6. Shah R, Sheng C, Thankachan SV, Vitter JS (2013) Top-k document retrieval in external memory. In: ESA, Sophia Antipolis, pp 803–814

Copyright information

© 2016 Springer Science+Business Media New York

Cite this entry

Shah, R. (2016). Document Retrieval on String Collections. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_629
