Abstract:
A number of methods have been proposed for suffix sorting on internal memory of RAM and external memory of hard disks. The current best results for suffix sorting on inte...Show MoreMetadata
Abstract:
A number of methods have been proposed for suffix sorting on internal memory of RAM and external memory of hard disks. The current best results for suffix sorting on internal or external memory are achieved by several algorithms using the induced sorting (IS) method in various ways. While these algorithms are efficient, the internal ones are much different from those external in terms of the algorithm designs. A scalable IS method that can be applied for suffix sorting on both internal and external memory is highly desired. This article proposes a blockwise IS method to facilitate pipelined access on internal memory and sequential I/Os on external memory. The detailed algorithm of using this method for a 4-stage pipeline with multiple threads is described, where multiple threads are applied to parallelize not only the pipelined stages of consecutive blocks but also the tasks within each stage wherever possible. This algorithm is evaluated by our experiments on a set of realistic and artificial datasets to achieve better overall time and space performance than the existing best results from pSACAK, pDSS and pKS. Beside sorting suffixes on internal memory in linear time, the proposed method can be ported to external memory for sorting massive suffixes in linear I/O complexity.
Published in: IEEE Transactions on Computers ( Volume: 69, Issue: 9, 01 September 2020)