Top-k Term-Proximity in Succinct Space

Munro, J. Ian; Navarro, Gonzalo; Nielsen, Jesper Sindahl; Shah, Rahul; Thankachan, Sharma V.

Artículo

Open/Download

Top-kTerm-Proximity-in-Succinct-Space.pdf (365.5Kb)

Publication date

2017

Metadata

Show full item record

Cómo citar

Top-k Term-Proximity in Succinct SpaceFormato de cita

Copiar

Cerrar

Author

Abstract

LetD={T1,T2,...,TD}be a collection ofDstring doc-uments ofncharacters in total, that are drawn from an alphabet setΣ= [σ]. Thetop-kdocument retrieval problemis to preprocessDintoa data structure that, given a query (P[1..p],k), can return thekdocu-ments ofDmost relevant to patternP. The relevance is captured usinga predefined ranking function, which depends on the set of occurrencesofPinTd. For example, it can be the term frequency (i.e., the num-ber of occurrences ofPinTd), or it can be the term proximity (i.e., thedistance between the closest pair of occurrences ofPinTd), or a pattern-independent importance score ofTdsuch as PageRank. Linear space andoptimal query time solutions already exist for this problem. Compressedand compact space solutions are also known, but only for a few rank-ing functions such as term frequency and importance. However, spaceefficient data structures for term proximity based retrieval have beenevasive. In this paper we present the first sub-linear space data structurefor this relevance function, which uses onlyo(n) bits on top of any com-pressed suffix array ofDand solves queries in timeO((p+k) polylogn).

Indexation

Artículo de publicación SCOPUS

Identifier

URI: https://repositorio.uchile.cl/handle/2250/168876
DOI: 10.1007/s00453-016-0167-2
ISSN: 14320541
01784617

Quote Item

Algorithmica, Volumen 78, Issue 2, 2017, Pages 379-393

Collections

Artículos de revistas

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile