Fast decoding algorithms for variable-lengths codes
Introduction
Data compression is widely applied in many areas of data processing. Various compression algorithms have been developed for text documents, images, video, and other data. The field is of foremost importance and has been well researched, as the excellent surveys [24], [31] show.
Various codes have been applied for data compression [25]. In contrast with fixed-length codes, statistical methods use variable-length codes, with the shorter codes assigned to symbols or groups of symbols that have a higher probability of occurrence. Designers and implementers of variable-length codes face two problems: (1) assigning codes that can be decoded unambiguously and (2) assigning codes with the minimum average size.
In some applications, a prefix code is required to code a set of integers whose size is not known in advance. A prefix code is a variable-length code that satisfies the prefix property: no codeword is a prefix of another codeword. The plain binary representation of integers does not satisfy this condition. In other words, the size n of the set of integers has to be known in advance for the binary representation, since it determines the code size as 1 + ⌊log₂ n⌋. Elias [4], Fibonacci [6], [2], Golomb [8], [30], and Huffman codes [10] are well-known representatives of prefix codes.
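The prefix condition discussed above can be checked on a small example. The following Python sketch is only an illustration, not the implementation used in this paper: it encodes positive integers with the standard Elias-delta construction, decodes a concatenated stream bit by bit (the conventional, slow way), and compares the code with the plain binary representation. The function names and the use of '0'/'1' strings instead of packed bits are assumptions made for readability.

```python
def elias_delta_encode(n: int) -> str:
    """Return the Elias-delta codeword of n >= 1 as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias-delta codes positive integers only")
    nbits = n.bit_length()            # 1 + floor(log2 n)
    lbits = nbits.bit_length() - 1    # floor(log2 nbits)
    # lbits zeros, then nbits in binary, then n without its leading 1 bit
    return "0" * lbits + format(nbits, "b") + format(n, "b")[1:]


def elias_delta_decode(bits: str) -> list[int]:
    """Conventional bit-by-bit decoding of concatenated codewords."""
    out, pos = [], 0
    while pos < len(bits):
        lbits = 0
        while bits[pos] == "0":       # count the leading zeros
            lbits += 1
            pos += 1
        nbits = int(bits[pos:pos + lbits + 1], 2)
        pos += lbits + 1
        out.append(int("1" + bits[pos:pos + nbits - 1], 2))
        pos += nbits - 1
    return out


def is_prefix_free(codewords: list[str]) -> bool:
    """True if no codeword is a prefix of another one."""
    words = sorted(codewords)
    # In sorted order, a word that is a prefix of any other word is also
    # a prefix of its immediate successor, so adjacent pairs suffice.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


values = [1, 2, 10, 17, 1000]
stream = "".join(elias_delta_encode(v) for v in values)
assert elias_delta_decode(stream) == values

print(is_prefix_free([format(v, "b") for v in range(1, 100)]))         # False
print(is_prefix_free([elias_delta_encode(v) for v in range(1, 100)]))  # True
```

Plain binary fails the check because, for example, the codeword of 1 is a prefix of the codeword of 2, whereas the Elias-delta stream can be split into codewords without knowing the largest value in advance.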
Although the running time of compression and decompression algorithms is often not considered, there are cases where it is a critical issue. Furthermore, there are applications where the running time of the decompression algorithm is more important than that of the compression algorithm. Examples include real-time compression of data structures [26], [7], wireless network communication [16], and text decompression [20], [5], [15], [18], [1], [19], [22], [27], [32]. In the case of data structures, pages are decompressed on every read from secondary storage into main memory, or the items of a page are decompressed on every access to the page. In such settings, the efficiency of the decompression algorithm is extremely important.
Data structures (like B-tree [3] or R-tree [9]) often store similar items on one page. When difference coding [24] is applied to the items, it is necessary to compress small values. Variable-length codes are suitable for the compression of these values. Since fast decoding algorithms are not yet known and conventional decoding algorithms require long decoding times, variable-length codes have not been used in the compression of data structures, and, in general, in the data processing area.
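To illustrate why difference coding produces small values, here is a minimal Python sketch, assuming a page holds sorted integer keys. The function names and the sample keys are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def difference_encode(sorted_keys):
    """Replace each key by its distance from the previous key (first key kept as is)."""
    return [k - p for p, k in zip([0] + sorted_keys[:-1], sorted_keys)]


def difference_decode(gaps):
    """Rebuild the keys as running sums of the gaps."""
    return list(accumulate(gaps))


keys = [100000, 100003, 100004, 100010]   # similar items stored on one page
gaps = difference_encode(keys)            # [100000, 3, 1, 6]
assert difference_decode(gaps) == keys
```

After the first key, only small gaps remain, and these are the values a variable-length code represents with short codewords. Codes defined for positive integers would need an extra convention for duplicate keys (a gap of 0), for example coding gap + 1; that choice is an assumption of this sketch.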
The first fast decoding algorithm for Fibonacci codes of order ⩾ 2 was proposed in [11], [12]. Its mapping tables are large and are therefore not practical for large numbers. In contrast, our approach handles numbers of general length. Moreover, we introduce fast algorithms for several codes; therefore, the scalability of the proposed method is much higher.
In Section 2, theoretical issues of variable-length codes such as Elias-delta and Fibonacci codes of orders 2 and 3 are described. Moreover, in this section, we introduce a new code called Elias–Fibonacci. In Section 3, we provide a theoretical background of the fast decoding algorithms. These algorithms are based on a finite automaton. Since the number of automaton states is high, we introduce two types of automaton reduction in Section 4. In Section 5, we compare our work with other works. In Section 6, we describe the fast algorithms for the Elias-delta code, Fibonacci codes of orders 2 and 3, and the Elias–Fibonacci code. In Section 7, experimental results are presented and the proposed algorithms are compared to each other. In the last section, we conclude this paper and outline future work.
Overview of universal codes
In this section, we describe the Elias-delta, Fibonacci family, and new Elias–Fibonacci codes. We present conventional coding/decoding algorithms for each code. Although there are other codes, such as Elias-gamma [4] and Golomb codes [8], [30], the authors of [30], [28] state that the Elias-delta and Fibonacci family codes provide a better compression ratio than other codes. We also studied the works [11], [12], where the Fibonacci code of order 3 is recommended as the most effective code, in this case, for the
General principles of fast algorithms
In this section, we introduce the general principles of the fast algorithms. All fast algorithms are based on a finite automaton with precomputed mapping tables. This automaton is described in Section 3.1. Section 3.2 describes the identification of automaton states by a brute-force algorithm. In Section 3.3, we explain the building of mapping tables in more detail. Since the number of automaton states is rather high, we propose two types of automaton reduction in Section 4.
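The general idea of segment-wise decoding with a precomputed mapping table can be sketched in Python for the Fibonacci code of order 2. This is a simplified illustration under stated assumptions, not the automaton defined in this paper: the state is only "was the previous bit a 1", the segment size SEG, all function names, and the '0'/'1' string representation are hypothetical, and partial values are moved to their final bit position with the standard identity F(i+k) = F(k)·F(i+1) + F(k−1)·F(i) rather than with the paper's shift operation.

```python
from itertools import product

SEG = 4                               # segment size in bits (assumed, kept small)
FIB = [0, 1]                          # FIB[i] = F_i, with F_1 = F_2 = 1
while len(FIB) < 128:                 # limits this sketch to codewords under ~126 bits
    FIB.append(FIB[-1] + FIB[-2])


def fib2_encode(n: int) -> str:
    """Conventional code: Zeckendorf bits, least significant first, plus a final 1."""
    if n < 1:
        raise ValueError("positive integers only")
    i = 2
    while FIB[i + 1] <= n:
        i += 1
    bits = []
    for j in range(i, 1, -1):         # greedy, from the largest Fibonacci number
        if FIB[j] <= n:
            bits.append("1")
            n -= FIB[j]
        else:
            bits.append("0")
    return "".join(reversed(bits)) + "1"


def build_table():
    """Brute force: simulate every (state, segment) pair bit by bit."""
    table = {}
    for state, seg in product((0, 1), product("01", repeat=SEG)):
        prev, a, b, pos, parts = state, 0, 0, 0, []
        for ch in seg:
            bit = int(ch)
            if bit and prev:          # "11" closes the current number
                parts.append((a, b, pos, True))
                prev, a, b, pos = 0, 0, 0, 0
            else:
                if bit:
                    a += FIB[pos + 2]  # value if the part starts at bit 0
                    b += FIB[pos + 3]  # value if the part starts at bit 1
                pos += 1
                prev = bit
        parts.append((a, b, pos, False))   # unfinished tail of the segment
        table[state, "".join(seg)] = (parts, prev)
    return table


def fast_decode(stream: str, table) -> list[int]:
    stream += "0" * (-len(stream) % SEG)   # pad to whole segments
    out, state, acc, offset = [], 0, 0, 0
    for i in range(0, len(stream), SEG):
        parts, state = table[state, stream[i:i + SEG]]   # one lookup per segment
        for a, b, nbits, finished in parts:
            # shift the partial value by `offset` bit positions
            acc += a if offset == 0 else FIB[offset] * b + FIB[offset - 1] * a
            offset += nbits
            if finished:
                out.append(acc)
                acc, offset = 0, 0
    return out


values = [1, 2, 3, 4, 12, 100, 1000, 123456]
stream = "".join(fib2_encode(v) for v in values)
assert fast_decode(stream, build_table()) == values
```

The decoder inspects no individual bits at decode time: each segment costs one table lookup plus a few integer operations per number touched by that segment. With two states and SEG-bit segments the table has 2 · 2^SEG entries, which shows how table size grows with the segment length.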
Reduction of automaton states number
Since the number of automaton states is high, especially for longer segment sizes, we need to reduce the number of states. We apply two principles of automaton reduction. The first principle is the identification of similar states explained in Section 4.1. The second principle includes a shift operation introduced in Section 4.2.
Comparison with other works
A fast algorithm for the Fibonacci code proposed by Shmuel T. Klein in [11], [12], [13] also applies a kind of mapping table for each decoded segment, and it also utilizes automaton reduction by state similarity. The author proposed two approaches for the computation of the Fibonacci shift. In the first approach, the Fibonacci shift is calculated using properties of the Fibonacci code. The author uses floating-point numbers for the computation, which results in a slower computation.
Fast Elias-delta decoding
In this case, the number of automaton states depends on the bit length of coded numbers. Let L(n) denote the bit length of a coded number n and Z(n) the length of L(n) − 1, which means Z(n) = ⌊log₂(BMAX)⌋, where BMAX is the length of the maximum encoded number. If we analyze the Elias-delta code, we identify the following types of automaton states:
- Initial state – it represents the first state of the automaton where we read bits from the beginning of a coded number. This state arises when a coded
Conclusion
In this article, we introduced fast decoding algorithms for Elias-delta, Fibonacci of orders 2 and 3, and Elias–Fibonacci codes and we introduced a theoretical background of these fast algorithms. Whereas Elias-delta and Fibonacci of orders 2 and 3 are well-known codes, the Elias–Fibonacci code has been introduced in this paper. Fast algorithms are based on the finite automaton and precomputed mapping table. Since the number of states is rather high, we introduced two types of automaton
Acknowledgment
This work is supported by Grants of GACR GAP202/10/0573 and GA201/09/0990 and the Ministry of Education MSM 6198910007, Czech Republic.
References (32)
- et al. An on-line variable-length binary encoding of text. Information Sciences (1996)
- et al. An experimental study of a compressed index. Information Sciences (2001)
- et al. Enabling energy-efficient and lossy-aware data compression in wireless sensor networks by multi-objective evolutionary optimization. Information Sciences (2010)
- et al. A data compression scheme for Chinese text files using Huffman coding and a two-level dictionary. Information Sciences (1995)
- et al. Optimal encoding of non-stationary sources. Information Sciences (2001)
- et al. A fast and efficient nearly-optimal adaptive Fano coding scheme. Information Sciences (2006)
- PPM with the extended alphabet. Information Sciences (2006)
- et al. Integrating unsupervised and supervised word segmentation: the role of goodness measures. Information Sciences (2011)
- et al. Robust transmission of unbounded strings using Fibonacci representations. IEEE Transactions on Information Theory (1987)
- et al. Organization and maintenance of large ordered indexes. Acta Informatica (1972)