Table Compression

Giancarlo, Raffaele; Buchsbaum, Adam L.

doi:10.1007/978-1-4939-2864-4_418

Years and Authors of Summarized Original Work

2003; Buchsbaum, Fowler, Giancarlo

Problem Definition

Table compression was introduced by Buchsbaum et al. [3] as a distinct application of compression, set apart by several characteristics. Tables are collections of fixed-length records and can grow to terabytes in size. They are often generated by information systems and kept in data warehouses to facilitate ongoing operations. These data warehouses typically manage many terabytes of data online, at significant capital and operational cost. In addition, the tables must be transmitted to different parts of an organization, which incurs further cost. Typical examples are tables of transaction activity, such as phone calls and credit card usage, which are stored once but then shipped repeatedly to different parts of an organization for fraud detection, billing, operations support, etc. The goals of table compression are to be fast, online, and effective:...

Recommended Reading

Apostolico A, Cunial F, Kaul V (2008) Table compression by record intersection. In: Proceedings of the IEEE data compression conference (DCC), Snowbird, pp 13–22
Blum A, Li M, Tromp J, Yannakakis M (1994) Linear approximation of shortest superstrings. J ACM 41:630–647
Buchsbaum AL, Caldwell DF, Church KW, Fowler GS, Muthukrishnan S (2000) Engineering the compression of massive tables: an experimental approach. In: Proceedings of the 11th ACM-SIAM symposium on discrete algorithms, San Francisco, pp 175–184
Buchsbaum AL, Fowler GS, Giancarlo R (2003) Improving table compression with combinatorial optimization. J ACM 50:825–851
Burrows M, Wheeler D (1994) A block sorting lossless data compression algorithm. Technical report 124, Digital Equipment Corporation
Cilibrasi R, Vitanyi PMB (2005) Clustering by compression. IEEE Trans Inf Theory 51:1523–1545
Cormack G (1985) Data compression in a data base system. Commun ACM 28:1336–1350
Cover TM, Thomas JA (1990) Elements of information theory. Wiley Interscience, New York
Ferragina P, Giancarlo R, Manzini G, Sciortino M (2005) Boosting textual compression in optimal linear time. J ACM 52:688–713
Ferragina P, Luccio F, Manzini G, Muthukrishnan S (2005) Structuring labeled trees for optimal succinctness, and beyond. In: Proceedings of the 46th annual IEEE symposium on foundations of computer science, Pittsburgh, pp 198–207
Giancarlo R, Sciortino M, Restivo A (2007) From first principles to the Burrows and Wheeler transform and beyond, via combinatorial optimization. Theor Comput Sci 387:236–248
Li M, Chen X, Li X, Ma B, Vitanyi PMB (2004) The similarity metric. IEEE Trans Inf Theory 50:3250–3264
Liefke H, Suciu D (2000) XMILL: an efficient compressor for XML data. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, Dallas. ACM, New York, pp 153–164
Lifshits Y, Mozes S, Weimann O, Ziv-Ukelson M (2009) Speeding up HMM decoding and training by exploiting sequence repetitions. Algorithmica 54:379–399
Manzini G (2001) An analysis of the Burrows-Wheeler transform. J ACM 48:407–430
Vo K-P (2006) Compression as data transformation. In: DCC: data compression conference, Snowbird. IEEE Computer Society TCC, Washington DC, p 403
Vo BD, Vo K-P (2004) Using column dependency to compress tables. In: DCC: data compression conference, Snowbird. IEEE Computer Society TCC, Washington DC, pp 92–101
Vo BD, Vo K-P (2007) Compressing table data with column dependency. Theor Comput Sci 387:273–283
Ziv J, Lempel A (1977) A universal algorithm for sequential data compression. IEEE Trans Inf Theory 23:337–343
Ziv J, Lempel A (1978) Compression of individual sequences via variable length coding. IEEE Trans Inf Theory 24:530–536

Author information

Authors and Affiliations

Department of Mathematics and Applications, University of Palermo, Palermo, Italy
Raffaele Giancarlo
Madison, NJ, USA
Adam L. Buchsbaum

Corresponding author

Correspondence to Raffaele Giancarlo.

Editor information

Editors and Affiliations

Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA
Ming-Yang Kao

About this entry

Cite this entry

Giancarlo, R., Buchsbaum, A.L. (2016). Table Compression. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_418

DOI: https://doi.org/10.1007/978-1-4939-2864-4_418
Published: 22 April 2016
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2863-7
Online ISBN: 978-1-4939-2864-4
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
