Elsevier

Information Sciences

Volume 183, Issue 1, 15 January 2012, Pages 66-91
Fast decoding algorithms for variable-lengths codes

https://doi.org/10.1016/j.ins.2011.06.019

Abstract

Data compression has been widely applied in many data processing areas. Compression methods use variable-length codes, with shorter codes assigned to symbols or groups of symbols that appear frequently in the data. Many coding algorithms exist, e.g. Elias-delta codes, Fibonacci codes and other variable-length codes, which are often applied to the encoding of numbers. Although the time consumed by compression and decompression algorithms is often not considered, there are cases where the decompression time is a critical issue. One example is the real-time compression of data structures in the physical implementation of database management systems. In this case, pages of a data structure are decompressed during every read from secondary storage into main memory, or items of a page are decompressed during every access to the page. Clearly, the efficiency of the decompression algorithm is extremely important. Since fast decoding algorithms were not known until recently, variable-length codes have not been used in the data processing area. In this article, we introduce fast decoding algorithms for Elias-delta, Fibonacci of order 2 and Fibonacci of order 3 codes. We provide the theoretical background that makes these fast algorithms possible. Moreover, we introduce a new code, called the Elias–Fibonacci code, with a lower compression ratio than the Fibonacci of order 3 code for lower numbers; however, this new code provides a faster decoding time than the other tested codes. Elias–Fibonacci codes are shorter than the other compared codes for numbers longer than 26 bits. All these algorithms are suitable for data processing tasks with special emphasis on the decompression time.

Introduction

Data compression has been widely applied in many data processing areas. Various compression algorithms have been developed for processing text documents, images, video, etc. Data compression is of foremost importance and has been well researched, as presented in the excellent surveys [24], [31].

Various codes have been applied for data compression [25]. In contrast with fixed-length codes, statistical methods use variable-length codes, with the shorter codes assigned to symbols or groups of symbols that have a higher probability of occurrence. People who design and implement variable-length codes have to deal with these two problems: (1) assigning codes that can be decoded unambiguously and (2) assigning codes with the minimum average size.

In some applications, a prefix code is required to code a set of integers whose size is not known in advance. A prefix code is a variable-length code that satisfies the prefix property: no codeword is a prefix of another codeword. The plain binary representation of integers does not satisfy this condition. In other words, the size n of the set of integers has to be known in advance for the binary representation, since it determines the code size as 1 + ⌊log₂ n⌋. Elias [4], Fibonacci [6], [2], Golomb [8], [30], and Huffman codes [10] are well-known representatives of prefix codes.
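
As an illustration of the prefix property, the following minimal Python sketch implements a conventional (bit-by-bit) Elias-delta encoder and decoder using the standard definition of the code. The function names and the use of "0"/"1" strings instead of packed bits are assumptions made for readability; this is not the fast algorithm proposed in the article.

```python
def elias_delta_encode(n):
    """Return the Elias-delta codeword of n >= 1 as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias-delta codes are defined for n >= 1")
    length = n.bit_length()               # number of bits of n
    zeros = length.bit_length() - 1       # number of leading zeros announcing the length
    # leading zeros, then the length in binary, then n without its leading 1 bit
    return "0" * zeros + format(length, "b") + format(n, "b")[1:]


def elias_delta_decode(bits):
    """Decode a concatenation of Elias-delta codewords (well-formed input assumed)."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":             # count the leading zeros
            zeros += 1
            i += 1
        length = int(bits[i:i + zeros + 1], 2)
        i += zeros + 1
        # restore the implicit leading 1 bit of the number
        out.append(int("1" + bits[i:i + length - 1], 2))
        i += length - 1
    return out


stream = "".join(elias_delta_encode(n) for n in [1, 2, 3, 17, 1000])
print(elias_delta_decode(stream))
```

Because no codeword is a prefix of another, the decoder recovers the numbers from the concatenated stream without separators and without knowing the range of the integers in advance.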

Although the time consumed by compression and decompression algorithms is often not considered, there are cases where these times are a critical issue. Furthermore, there are applications where the time consumption of the decompression algorithm is more important than that of the compression algorithm. Examples include real-time compression of data structures [26], [7], wireless network communication [16], and text decompression [20], [5], [15], [18], [1], [19], [22], [27], [32]. In the case of data structures, pages are decompressed during every read from secondary storage into main memory, or items of a page are decompressed during every access to the page. Clearly, the efficiency of the decompression algorithm is extremely important.

Data structures (like B-tree [3] or R-tree [9]) often store similar items on one page. When difference coding [24] is applied to the items, it is necessary to compress small values. Variable-length codes are suitable for the compression of these values. Since fast decoding algorithms are not yet known and conventional decoding algorithms require long decoding times, variable-length codes have not been used in the compression of data structures, and, in general, in the data processing area.
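
To show why difference coding yields small values, here is a minimal Python sketch. It assumes the items of a page are sorted integer keys and stores each key as its distance from the previous one; the function names and sample keys are hypothetical and only serve the illustration.

```python
def difference_encode(items):
    """Replace each item by its difference from the previous item (first item kept as is)."""
    previous, deltas = 0, []
    for item in items:
        deltas.append(item - previous)
        previous = item
    return deltas


def difference_decode(deltas):
    """Rebuild the original items by accumulating the differences."""
    total, items = 0, []
    for delta in deltas:
        total += delta
        items.append(total)
    return items


page = [100003, 100007, 100008, 100020]   # hypothetical similar keys on one page
deltas = difference_encode(page)
print(deltas)                             # [100003, 4, 1, 12]
print(difference_decode(deltas) == page)  # True
```

Apart from the first item, the differences are small numbers, which is where short variable-length codewords pay off. Codes such as Elias-delta and Fibonacci are defined for positive integers, so pages containing duplicate items would need, for example, each difference shifted by one before coding (an assumption of this sketch, not a detail taken from the article).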

The first fast decoding algorithm for Fibonacci codes of order ⩾2 was proposed in [11], [12]. Its mapping tables are large and are therefore not practical for large numbers. In contrast, our approach handles numbers of general length. Moreover, we introduce fast algorithms for several codes; therefore, the scalability of the proposed method is much higher.

In Section 2, theoretical issues of variable-length codes such as Elias-delta and Fibonacci of orders 2 and 3 are described. Moreover, in this section, we introduce a new code called Elias–Fibonacci. In Section 3, we provide the theoretical background of the fast decoding algorithms. These algorithms are based on a finite automaton. Since the number of automaton states is high, we introduce two types of automaton reduction in Section 4. In Section 5, we compare our work with other works. In Section 6, we describe the fast algorithms for the Elias-delta code, Fibonacci codes of orders 2 and 3, and the Elias–Fibonacci code. In Section 7, experimental results are presented and the proposed algorithms are compared to each other. In the last section, we conclude this paper and outline future work.

Section snippets

Overview of universal codes

In this section, we describe the Elias-delta, Fibonacci family, and new Elias–Fibonacci codes. We present conventional coding/decoding algorithms for each code. Although there are other codes, like Elias-gamma [4] and Golomb codes [8], [30], the authors of [30], [28] suggest that Elias-delta and Fibonacci family codes provide a better compression ratio than other codes. We also studied works [11], [12], where the Fibonacci code of order 3 is recommended as the most effective code, in this case, for the

General principles of fast algorithms

In this section, we introduce the general principles of the fast algorithms. All fast algorithms are based on a finite automaton with precomputed mapping tables. This automaton is described in Section 3.1. Section 3.2 describes the identification of automaton states by a brute-force algorithm. In Section 3.3, we explain the building of the mapping tables in more detail. Since the number of automaton states is rather high, we propose two types of automaton reduction in Section 4.
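
The idea of segment-wise decoding with a precomputed mapping table can be sketched in Python for the Fibonacci code of order 2. This is only an illustration under stated assumptions, not the automaton defined in the article: the state is taken to be the pair (position inside the current codeword, previous bit), segments are 8 bits wide, codewords are at most MAX_LEN bits long, the table is filled by brute-force simulation of the bit-by-bit decoder, and no state reduction is applied. All names (SEG, MAX_LEN, FIB, TABLE, fib_encode, simulate, pack, fast_decode) are hypothetical.

```python
from itertools import product

SEG = 8        # segment size in bits (assumption: one byte per table lookup)
MAX_LEN = 24   # longest codeword supported, in bits (assumption)

FIB = [1, 2]   # Fibonacci numbers 1, 2, 3, 5, ... used as bit weights
while len(FIB) < MAX_LEN + SEG:
    FIB.append(FIB[-1] + FIB[-2])


def fib_encode(n):
    """Fibonacci order-2 codeword of n >= 1 as a list of bits (lowest weight first)."""
    top = max(i for i, f in enumerate(FIB) if f <= n)
    bits = [0] * (top + 1)
    for i in range(top, -1, -1):          # greedy Zeckendorf representation
        if FIB[i] <= n:
            bits[i] = 1
            n -= FIB[i]
    return bits + [1]                     # the extra 1 forms the "11" terminator


def simulate(pos, prev, segment_bits):
    """Conventional bit-by-bit decoding of one segment, starting with partial value 0."""
    done, acc = [], 0
    for bit in segment_bits:
        if bit and prev:                  # "11" closes the current codeword
            done.append(acc)
            pos, prev, acc = 0, 0, 0
        else:
            if bit:
                acc += FIB[pos]
            pos, prev = pos + 1, bit
    return done, acc, (pos, prev)


# Mapping table: (position, previous bit, segment value) -> decoding result.
TABLE = {}
for pos, prev, value in product(range(MAX_LEN), (0, 1), range(1 << SEG)):
    bits = [(value >> (SEG - 1 - i)) & 1 for i in range(SEG)]
    TABLE[pos, prev, value] = simulate(pos, prev, bits)


def pack(bits):
    """Pack bits into bytes, most significant bit first, padding with zeros."""
    bits = bits + [0] * (-len(bits) % SEG)
    return bytes(int("".join(map(str, bits[i:i + SEG])), 2)
                 for i in range(0, len(bits), SEG))


def fast_decode(data):
    """Decode a byte string with one table lookup per segment."""
    out, carry, state = [], 0, (0, 0)
    for byte in data:
        done, acc, state = TABLE[state[0], state[1], byte]
        for value in done:
            out.append(carry + value)     # only the first value can have a carry
            carry = 0
        carry += acc                      # unfinished codeword continues in next segment
    return out


numbers = [1, 2, 3, 4, 100, 70000, 1, 1]
stream = pack(sum((fib_encode(n) for n in numbers), []))
print(fast_decode(stream) == numbers)     # True
```

Even in this small sketch the table has MAX_LEN × 2 × 256 entries, because the position inside the codeword is part of the state. This gives a feel for why the number of states grows with the length of the coded numbers and why a reduction of states is worthwhile.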

Reduction of automaton states number

Since the number of automaton states is high, especially for longer segment sizes, we need to reduce the number of states. We apply two principles of automaton reduction. The first principle is the identification of similar states explained in Section 4.1. The second principle includes a shift operation introduced in Section 4.2.

Comparison with other works

A fast algorithm for the Fibonacci code proposed by Shmuel T. Klein in [11], [12], [13] also applies a kind of mapping table for each decoded segment, and it likewise utilizes automaton reduction by state similarity. The author proposed two approaches for the computation of the Fibonacci shift. In the first approach, the Fibonacci shift is calculated utilizing properties of the Fibonacci code. The author uses floating-point numbers for the computation, which results in a slower computation.

Fast Elias-delta decoding

In this case, the number of automaton states depends on the bit length of coded numbers. Let L(n) denote the bit length of a coded number n and Z(n) the length of L(n) − 1, which means Z(n) = ⌊log₂(BMAX)⌋, where BMAX is the length of the maximum encoded number. If we analyze the Elias-delta code, we identify the following types of automaton states:

  • Initial state – it represents the first state of the automaton where we read bits from the beginning of a coded number. This state arises when a coded

Conclusion

In this article, we introduced fast decoding algorithms for Elias-delta, Fibonacci of orders 2 and 3, and Elias–Fibonacci codes and we introduced a theoretical background of these fast algorithms. Whereas Elias-delta and Fibonacci of orders 2 and 3 are well-known codes, the Elias–Fibonacci code has been introduced in this paper. Fast algorithms are based on the finite automaton and precomputed mapping table. Since the number of states is rather high, we introduced two types of automaton

Acknowledgment

This work is supported by Grants of GACR GAP202/10/0573 and GA201/09/0990 and the Ministry of Education MSM 6198910007, Czech Republic.

References (32)

  • P. Elias, Universal codeword sets and representations of the integers, IEEE Transactions on Information Theory IT-21 (1975)
  • A. Fraenkel, S. Klein, Robust Universal Complete Codes As Alternatives to Huffman Codes, Technical Report CS85-16,...
  • J. Goldstein et al., Compressing relations and indexes
  • S.W. Golomb, Run-length encodings, IEEE Transactions on Information Theory IT-12 (1966)
  • A. Guttman, R-Trees: a dynamic index structure for spatial searching
  • D. Huffman, A method for the construction of minimum redundancy codes, in: Proceedings of the IRE vol. 40, 1952, pp....