Compressed Document Retrieval on String Collections

  • Reference work entry

Years and Authors of Summarized Original Work

  • 2009; Hon, Shah, Vitter

  • 2013; Belazzougui, Navarro, Valenzuela

  • 2013; Tsur

  • 2014; Hon, Shah, Thankachan, Vitter

  • 2014; Navarro, Thankachan

Problem Definition

We face the following problem.

Problem 1 (Top-k document retrieval)

Let \(\mathcal{D} =\{ \mathsf{T}_{1},\mathsf{T}_{2},\ldots ,\mathsf{T}_{D}\}\) be a collection of D documents of n characters in total, drawn from an alphabet set Σ = [σ]. The relevance of a document \(\mathsf{T}_{d}\) with respect to a pattern P, denoted by w(P, d), is a function of the set of occurrences of P in \(\mathsf{T}_{d}\). Our task is to index \(\mathcal{D}\) such that, whenever a pattern P[1, p] and a parameter k come as a query, the k documents with the highest w(P, ⋅) values can be reported efficiently.
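To make the query semantics concrete, the following is a minimal brute-force sketch of Problem 1, not one of the compressed indexes summarized in this entry. It scans every document at query time instead of building an index. It assumes term frequency (the number of occurrences of P in a document) as the relevance function w, which is only one possible choice. The names `occurrences`, `term_frequency`, and `top_k_documents` are hypothetical, and ties are broken by smaller document identifier as an arbitrary convention.

```python
from typing import Callable, List, Sequence, Tuple


def occurrences(text: str, pattern: str) -> List[int]:
    """Return the 0-based starting positions of all (possibly overlapping)
    occurrences of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping occurrences are found.
        start = text.find(pattern, start + 1)
    return positions


def term_frequency(occ: Sequence[int]) -> float:
    """Assumed relevance function: w(P, d) = number of occurrences of P in T_d."""
    return float(len(occ))


def top_k_documents(
    docs: Sequence[str],
    pattern: str,
    k: int,
    relevance: Callable[[Sequence[int]], float] = term_frequency,
) -> List[Tuple[int, float]]:
    """Return up to k pairs (d, w(P, d)) with the highest relevance.

    Documents are numbered 1..D as in the problem statement. Every document
    is scanned, so this is a baseline for checking answers, not an index.
    """
    scored = [
        (d, relevance(occurrences(text, pattern)))
        for d, text in enumerate(docs, start=1)
    ]
    # Highest score first; ties broken by smaller document identifier.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    collection = ["abracadabra", "banana", "cabana", "xyz"]
    print(top_k_documents(collection, "ana", 2))
```

Because `relevance` receives the full list of occurrence positions, other functions of the occurrence set can be passed in place of `term_frequency` without changing the rest of the sketch.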

Compressed Document Retrieval on String Collections, Table 1 Indexes of space \(2\vert \mathsf{CSA}\vert + D\log (n/D) + O(D) + o(n)\) bits

Recommended Reading

  1. Belazzougui D, Navarro G, Valenzuela D (2013) Improved compressed indexes for full-text document retrieval. J Discret Algorithms 18:3–13

  2. Gagie T, Kärkkäinen J, Navarro G, Puglisi SJ (2013) Colored range queries and document retrieval. Theor Comput Sci 483:36–50

  3. Hon WK, Shah R, Vitter JS (2009) Space-efficient framework for top-k string retrieval problems. In: FOCS, Atlanta, pp 713–722

  4. Hon WK, Shah R, Thankachan SV, Vitter JS (2014) Space-efficient frameworks for top-k string retrieval. J ACM 61(2):9

  5. Navarro G (2014) Spaces, trees, and colors: the algorithmic landscape of document retrieval on sequences. ACM Comput Surv 46(4):52

  6. Navarro G, Mäkinen V (2007) Compressed full-text indexes. ACM Comput Surv 39(1):2

  7. Navarro G, Nekrich Y (2012) Top-k document retrieval in optimal time and linear space. In: SODA, Kyoto, pp 1066–1077

  8. Navarro G, Thankachan SV (2014) New space/time tradeoffs for top-k document retrieval on sequences. Theor Comput Sci 542:83–97

  9. Raman R, Raman V, Satti SR (2007) Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans Algorithms 3(4):43

  10. Russo L, Navarro G, Oliveira AL (2011) Fully compressed suffix trees. ACM Trans Algorithms 7(4):53

  11. Shah R, Sheng C, Thankachan SV, Vitter JS (2013) Top-k document retrieval in external memory. In: ESA, Sophia Antipolis, pp 803–814

  12. Tsur D (2013) Top-k document retrieval in optimal space. Inf Process Lett 113(12):440–443

© 2016 Springer Science+Business Media New York

Cite this entry

Thankachan, S.V. (2016). Compressed Document Retrieval on String Collections. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_644
