Rank and select operations on a word
Introduction
The word sizes in modern CPUs are 32-bit, 64-bit, and 128-bit. CPU instructions access and process bits in a word in parallel. Much research has focused on efficient bit-wise operations on words and their applications [1], [2], [3], [5], [7], [14], including the position of the least/most significant bit, suffix and prefix sums, and bit-permutations. Knuth [9] gave a comprehensive introduction to bit-wise tricks and techniques.
The rank and select operations are fundamental to succinct data structures. Given a string t of length n over an alphabet Σ and a symbol a in Σ, rank_a(t, i) returns the number of occurrences of a in the prefix of t ending at position i; select_a(t, c) returns the position of the c-th occurrence of a in t. These problems were first studied by Jacobson [8], who presented a data structure that takes bits to support rank queries on binary sequences in constant time. For select queries, Clark and Munro [4] gave a data structure that answers the query in constant time on a RAM with word-size , using bits space. Pagh [10] and Raman et al. [11] presented compressed representations of bit-vectors that support these queries in time.
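As a reference point for the two definitions, the following is a minimal, naive sketch rather than any of the cited data structures. It assumes 1-based positions and an inclusive prefix, which the text above does not fix; the function names and the convention of returning 0 when there are fewer than c occurrences are illustrative choices.

```python
def rank(t: str, a: str, i: int) -> int:
    """Occurrences of symbol a in t[1..i] (1-based, inclusive; assumed convention)."""
    return t[:i].count(a)


def select(t: str, a: str, c: int) -> int:
    """1-based position of the c-th occurrence of a in t, or 0 if there is none."""
    seen = 0
    for pos, ch in enumerate(t, start=1):
        if ch == a:
            seen += 1
            if seen == c:
                return pos
    return 0
```

Under this convention the two operations are inverse in the sense that rank(t, a, select(t, a, c)) equals c whenever the c-th occurrence exists.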
In this paper, we consider rank and select on a bit-vector packed in a word without any pre-processing. These operations are used in the construction of succinct data structures such as their counterparts for long strings. When the input is packed into words, these algorithms will benefit from the new approach in this paper.
Previous results. Counting the number of 1-bits in a word (also called the population count) has been studied extensively. Wegner [12] introduced an efficient -time method that uses x & (x - 1) to set the rightmost 1 of word x to 0, where w is the machine word size in bits. The first time method is attributed to Wheeler et al. [13] by Knuth [9]. The method computes a word y from the input word x, where each field contains the number of 1's in the corresponding field of x. By doubling the length of the fields of y, it achieves time. The HAKMEM memo [1] gave an time method. It is the same as the -time algorithm, except that when the field length l exceeds , it adds all the fields of y by y modulo . More recently, both AMD and Intel introduced an instruction, POPCNT, that counts the number of 1's in a word, but there is no direct hardware support for select.
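The two classical approaches described above can be sketched as follows. This is an illustrative rendering for 64-bit words, not code from the cited works; the word size W, the mask constants, and the function names are assumptions made for the example.

```python
W = 64
MASK = (1 << W) - 1  # Python integers are unbounded, so emulate a 64-bit word


def popcount_clear_lowest(x: int) -> int:
    """Count 1-bits by repeatedly clearing the rightmost 1 with x & (x - 1)."""
    x &= MASK
    count = 0
    while x:
        x &= x - 1  # clears the lowest set bit
        count += 1
    return count


def popcount_doubling(x: int) -> int:
    """Count 1-bits by summing adjacent fields and doubling the field length."""
    x &= MASK
    x = (x & 0x5555555555555555) + ((x >> 1) & 0x5555555555555555)    # 2-bit fields
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)    # 4-bit fields
    x = (x & 0x0F0F0F0F0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F0F0F0F0F)    # 8-bit fields
    x = (x & 0x00FF00FF00FF00FF) + ((x >> 8) & 0x00FF00FF00FF00FF)    # 16-bit fields
    x = (x & 0x0000FFFF0000FFFF) + ((x >> 16) & 0x0000FFFF0000FFFF)   # 32-bit fields
    return (x & 0x00000000FFFFFFFF) + (x >> 32)                       # whole word
```

The first loop runs once per 1-bit, while the second performs one step per doubling of the field length regardless of the input.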
Our results. We consider the Random Access Machine (RAM) with word size w and instructions on words, including arithmetic operations and bit-wise Boolean operations. We present algorithms for rank and select on an arbitrary word that both run in time and use constant space, where is the number of times the logarithm is repeated before the number is less than 1. We also introduce a method to diffuse bits to a word. Based on this method, we present constant-time, constant-space algorithms for rank and select on bit-vectors of length at most .
Section snippets
Notions and basic operations
For a bit-vector of n bits, denote the substring () by . Taking as an integer, the value of t is . For convenience, when the context is clear we use t as the value of t. An l-bit () field of t, say , is denoted by . We will write a bit-vector in big-endian, that is, the higher bits are on the left. For example, a bit-vector t of two bits, where , is written as 01. For an integer , denotes the
Computing a block sum of a word
We use multiplication to count the number of 1's in a word. Assume that the number of 1's in x is not greater than , that is, the number fits into l bits. If w is a multiple of l, then by , we have that the most significant l bits of word y equals the sum of all , that is, the number of 1's in x.
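The multiplication step can be illustrated with a small sketch. It assumes that x already stores one small non-negative value in each l-bit field, that w is a multiple of l, and that the total of all fields fits into l bits; the function name block_sum and the default w = 64 are hypothetical choices for the example.

```python
def block_sum(x: int, l: int, w: int = 64) -> int:
    """Sum of the l-bit fields of the w-bit word x, read from the top l bits.

    Assumes w % l == 0 and that the sum of all fields is below 2**l,
    so no carry crosses a field boundary inside the product.
    """
    assert w % l == 0
    word_mask = (1 << w) - 1
    # Constant with a single 1 in the least significant bit of every field.
    ones = sum(1 << (k * l) for k in range(w // l))
    y = (x * ones) & word_mask  # field j of y holds the sum of fields 0..j of x
    return y >> (w - l)         # the most significant field holds the total
```

For example, with l = 8 and x = 0x0102030405060708 the eight byte values 1 through 8 add up to 36, which is what the top byte of the product contains. The lower fields of y hold prefix sums of the fields of x.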
For smaller l such that , that is, the number of 1's in x cannot fit into l bits, we use a multiplication to work out from , where is a multiple of l and
Rank
Algorithm. Using the idea of the last section, we give the rank algorithm as follows.
Analysis. The procedure computes block sums of x with increasing block sizes, say, . If is greater than , line 6 sets the block size to a multiple of l that is the minimum not less than , and computes the block sum. Each iteration uses a constant number of operations, and the number of iterations of the while loop is . The running time of rank is .
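The paper's rank procedure is not reproduced in this excerpt, so the following is only a sketch of the general shape: block sums with growing block sizes, finished by one multiplication. It swaps in the well-known fixed schedule of 2-, 4- and 8-bit fields for 64-bit words instead of the block sizes analyzed above, and it assumes that positions are counted from the least significant bit. The names popcount64 and rank1 are hypothetical.

```python
M64 = (1 << 64) - 1  # emulate a 64-bit machine word


def popcount64(x: int) -> int:
    """Population count via block sums of size 2, 4, 8, then one multiplication."""
    x &= M64
    x = x - ((x >> 1) & 0x5555555555555555)                          # counts per 2-bit field
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)  # counts per 4-bit field
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F                          # counts per 8-bit field
    # At most 64 ones in total, so the sum fits into one 8-bit field.
    return ((x * 0x0101010101010101) & M64) >> 56


def rank1(x: int, i: int) -> int:
    """Number of 1-bits among the i least significant bits of x (0 <= i <= 64)."""
    prefix = x & ((1 << i) - 1)  # keep only bits 0 .. i-1
    return popcount64(prefix)
```

Masking first and counting afterwards keeps the rank query a thin wrapper around the population count.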
Select
We present an algorithm to compute . The algorithm uses a prefix sum of x of block size g, where . The selection of g and the computation of are presented in Subsection 5.1. We assume that the most significant bits of the input word are 0's.
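A select built on a prefix sum of block counts can be sketched as below. This is not the paper's procedure: it fixes the block size to 8 bits, uses plain loops where a word-parallel implementation would use comparisons on packed fields, counts c from 1, and returns a 0-based bit index from the least significant bit (or -1 if x has fewer than c ones). All names are hypothetical.

```python
M64 = (1 << 64) - 1
ONES = 0x0101010101010101  # a 1 in the least significant bit of every byte


def byte_counts(x: int) -> int:
    """Word whose j-th byte holds the number of 1-bits in the j-th byte of x."""
    x &= M64
    x = x - ((x >> 1) & 0x5555555555555555)
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    return (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F


def select1(x: int, c: int) -> int:
    """0-based index of the c-th (c >= 1) 1-bit of x from the LSB, or -1."""
    x &= M64
    # Byte j of prefix holds the number of 1-bits in bytes 0..j of x.
    prefix = (byte_counts(x) * ONES) & M64
    if c < 1 or c > (prefix >> 56):
        return -1
    # Stage 1: find the first block whose prefix sum reaches c.
    j = 0
    while ((prefix >> (8 * j)) & 0xFF) < c:
        j += 1
    # Stage 2: locate the remaining 1-bit inside that block.
    before = ((prefix >> (8 * (j - 1))) & 0xFF) if j > 0 else 0
    remaining = c - before
    block = (x >> (8 * j)) & 0xFF
    for k in range(8):
        if (block >> k) & 1:
            remaining -= 1
            if remaining == 0:
                return 8 * j + k
    return -1  # not reached when the prefix sums are consistent
```

With 8-bit blocks the prefix sums never exceed 64, so they always fit into a block; smaller block sizes would need the kind of restriction on the input that the text assumes.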
The select procedure operates in two stages. In the first stage, it finds the index of the block where the c-th 1-bit resides, which is the minimum integer j such that . In the second stage, it
-time rank
We give a variation of the rank algorithm that runs in time. We set the initial block size of rank to 2 and the block size in the i-th () iteration to . The maximum number that can be represented by bits is . So bits cannot hold the number of 1's in an -bit block. While this revised rank runs, block sums may overflow into the adjacent blocks on the left. But the revised rank works on correctly. In , the number of 1's in each
Constant-time rank and select for short bit-vectors
For short bit-vectors, the number of 1's can be computed by a constant number of basic operations. Let v be an n-bit bit-vector. Let l be the block size such that . Assume that we can diffuse the bits of v to a word z such that the LSB of each block equals a unique bit of v. If l is large enough, for example , then one block-sum (BS) operation can compute the number of 1's in z.
In this section, we first present a diffuse algorithm. The algorithm of Diffuse was first given by Fredman and
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work is supported by the Commission of Development and Reform of Jilin Province No. 2019C053-10 and the Education Department of Jilin Province No. JJKH20190162KJ.
References (14)
- et al. Towards optimal packed string matching. Theor. Comput. Sci. (2014)
- et al. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci. (1993)
- et al. Exploiting word-level parallelism for fast convolutions and their applications in approximate string matching. Eur. J. Comb. (2013)
- et al.
- et al. Trans-dichotomous algorithms without multiplication - some upper and lower bounds
- Compact PAT Trees (1996)
- Constant time operations for words of length w (1999)