A faster implementation of online RLBWT and its application to LZ77 parsing

https://doi.org/10.1016/j.jda.2018.11.002
Under a Creative Commons license
open access

Abstract

The run-length encoded Burrows–Wheeler transform (RLBWT) is a powerful tool for processing highly repetitive strings. We propose a new algorithm for online RLBWT working in run-compressed space, which runs in O(n lg r) time and O(r lg n) bits of space, where n is the length of the input string S received so far and r is the number of runs in the BWT of the reversed S. Our algorithm improves on the state-of-the-art algorithm for online RLBWT in terms of empirical construction time. By adopting a dynamic list for maintaining a total order, we replace rank queries in a dynamic wavelet tree on a run-length compressed string by direct comparisons of labels in the dynamic list. Using the proposed online RLBWT, we can efficiently compute the LZ77 factorization in run-compressed space. The empirical results show the efficiency of both our online RLBWT and our LZ77 parsing, especially for highly repetitive strings.
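To make the quantity r concrete, the following is a minimal sketch that builds the BWT of the reversed input by sorting rotations and then run-length encodes it. It is a naive offline illustration, not the online algorithm proposed in the paper; the function names and the "$" end marker are assumptions chosen for this example.

```python
def bwt(text, sentinel="$"):
    """Naive BWT via sorted rotations (illustration only, not online).

    Assumes `sentinel` does not occur in `text` and sorts before
    every other character.
    """
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s):
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rlbwt_of_reversed(text):
    """RLBWT of the reversed text; len(result) plays the role of r."""
    return run_length_encode(bwt(text[::-1]))


if __name__ == "__main__":
    runs = rlbwt_of_reversed("ab" * 50)
    print(len(runs), runs)
```

On a highly repetitive input such as "ab" repeated 50 times, the number of runs stays small while n grows, which is the regime where space bounded by r rather than n pays off.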

Keywords

Run-length Burrows–Wheeler transformation
LZ77 factorization
Recompression

This work was supported by JST CREST (Grant Number JPMJCR1402) and KAKENHI (Grant Numbers 17H01791, 16K16009 and 17H06954).