Converting suffix trees into factor/suffix oracles

https://doi.org/10.1016/j.jda.2006.11.005
Under an Elsevier user license
open archive

Abstract

Several methods for compressing suffix trees have been defined, most of them aimed at obtaining compact (that is, space-economical) index structures. Beyond this practical aspect, a compression method can reveal structural properties of the resulting data structure, giving a better understanding of it and a better estimate of its performance.

In this paper, we propose a simple method to compress suffix trees by merging pairs of nodes. This idea has already been used in the literature, in a context different from ours. The originality of our approach is that the nodes we merge are chosen neither with respect to their subtrees (which is difficult to test algorithmically), nor with respect to the words spelled along branches (which usually requires testing several branches before finding the right one), but with respect to their position in the tree (which is easy to compute). Another particularity of our method is that it reads no edge labels: it relies exclusively on the topology of the suffix tree. The compact structure obtained after compression is the factor/suffix oracle introduced by Allauzen, Crochemore and Raffinot, whose accepted language includes the accepted language of the corresponding suffix tree.
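To make the language-inclusion claim concrete, the sketch below builds a factor oracle directly from a word using the standard online construction of Allauzen, Crochemore and Raffinot, as we understand it. It is not the suffix-tree conversion proposed in this paper (which reads no edge labels); it only shows what the target structure looks like. The oracle for a word p has len(p) + 1 states, every factor of p is accepted, but some non-factors may be accepted too: for p = "abbbaab", the word "aba" is accepted although it does not occur in p. The suffix oracle is assumed here to be the same automaton whose terminal states are those on the supply-link path from the last state. All function names (`build_factor_oracle`, `suffix_terminals`, `accepts_factor`, `accepts_suffix`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: trans[i] maps a letter to a target state.
Oracle = List[Dict[str, int]]


def build_factor_oracle(p: str) -> Tuple[Oracle, List[int]]:
    """Online factor-oracle construction (sketch).

    States are 0..len(p); state i is reached by reading p[:i].
    supply[i] is the supply link of state i, with supply[0] == -1.
    """
    m = len(p)
    trans: Oracle = [{} for _ in range(m + 1)]
    supply = [-1] * (m + 1)
    for i in range(1, m + 1):
        c = p[i - 1]
        trans[i - 1][c] = i  # transition along the spine of the oracle
        k = supply[i - 1]
        # Walk the supply path, adding a c-transition to state i from
        # every state that does not have a c-transition yet.
        while k > -1 and c not in trans[k]:
            trans[k][c] = i
            k = supply[k]
        supply[i] = 0 if k == -1 else trans[k][c]
    return trans, supply


def suffix_terminals(supply: List[int]) -> Set[int]:
    """Assumed suffix-oracle terminals: the supply path from the last state."""
    terminals: Set[int] = set()
    k = len(supply) - 1
    while k > -1:
        terminals.add(k)
        k = supply[k]
    return terminals


def run(trans: Oracle, w: str) -> int:
    """State reached by reading w from state 0, or -1 if a transition is missing."""
    q = 0
    for c in w:
        q = trans[q].get(c, -1)
        if q == -1:
            return -1
    return q


def accepts_factor(trans: Oracle, w: str) -> bool:
    # In a factor oracle every state is terminal.
    return run(trans, w) != -1


def accepts_suffix(trans: Oracle, terminals: Set[int], w: str) -> bool:
    # run() returns -1 when blocked, which is never a terminal state.
    return run(trans, w) in terminals


if __name__ == "__main__":
    p = "abbbaab"
    trans, supply = build_factor_oracle(p)
    factors = {p[i:j] for i in range(len(p) + 1) for j in range(i, len(p) + 1)}
    print(all(accepts_factor(trans, f) for f in factors))  # True
    print(accepts_factor(trans, "aba"), "aba" in factors)  # True False
```

The demo prints `True` on the first line (every factor of "abbbaab" is accepted) and `True False` on the second ("aba" is accepted by the oracle but is not a factor), which is the strict inclusion of languages mentioned above.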

The interest of our paper is therefore threefold:

  1. A topology-based compression method is defined for (compact) suffix trees.

  2. A new property of factor/suffix oracles is established: like a DAG, a factor/suffix oracle results from the corresponding suffix tree after a linear number of appropriate node merges; unlike in a DAG, the merged nodes do not necessarily have isomorphic subtrees.

  3. A new algorithm that transforms a suffix tree into a factor/suffix oracle is given; it runs in linear time, improving on the quadratic complexity previously known for the same task.

Keywords

Indexing structure
Factor recognition
Suffix recognition
