A Hilbert Space Compression Architecture for Data Warehouse Environments

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4654)

Abstract

Multi-dimensional data sets are common in areas such as data warehousing and statistical databases, where core tables often grow to enormous sizes. Compression is an attractive way to reduce storage requirements and so permit the retention of even larger data sets. In this paper we discuss an efficient compression framework designed specifically for very large relational database implementations. The primary methods exploit a Hilbert space-filling curve to dramatically reduce the storage footprint of the underlying tables. Tuples are individually compressed into page-sized units so that only the blocks relevant to the user’s multi-dimensional query need be accessed. Compression is available not only for the relational tables themselves but also for the associated R-tree indexes. Experimental results demonstrate compression rates of more than 90% for multi-dimensional data, and up to 98% for the indexes.
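
As an illustration of the idea in the abstract, the following Python sketch maps two-dimensional tuples onto a Hilbert curve, sorts them by curve position, and stores a block as a first ordinal plus successive differences. It is a minimal sketch, not the authors' implementation: the abstract does not specify the encoding, so the delta encoding, the single bit width per block, the restriction to two dimensions, and the names `hilbert_index`, `compress_block`, and `decompress_block` are all assumptions made for illustration.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n x n grid.

    n must be a power of two and 0 <= x, y < n. This is the standard
    iterative coordinate-to-distance conversion for the 2-D curve.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def compress_block(points, n):
    """Assumed encoding: sort a non-empty block by Hilbert ordinal, keep deltas.

    Returns the first ordinal, the list of successive differences, and the
    number of bits needed to store the largest difference.
    """
    ordinals = sorted(hilbert_index(n, x, y) for x, y in points)
    first = ordinals[0]
    deltas = [b - a for a, b in zip(ordinals, ordinals[1:])]
    width = max(deltas, default=0).bit_length()
    return first, deltas, width


def decompress_block(first, deltas):
    """Rebuild the sorted Hilbert ordinals from the first value and the deltas."""
    ordinals = [first]
    for delta in deltas:
        ordinals.append(ordinals[-1] + delta)
    return ordinals


if __name__ == "__main__":
    block = [(0, 2), (1, 1), (0, 0), (1, 0)]
    first, deltas, width = compress_block(block, 4)
    print(first, deltas, width)
    print(decompress_block(first, deltas))
```

Because nearby cells in the grid tend to receive nearby ordinals, the differences between consecutive sorted ordinals are typically small, so each one fits in far fewer bits than a full tuple. Encoding each block independently also means a query only has to decode the blocks whose ordinal range it touches.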


Editor information

Il Yeal Song, Johann Eder, Tho Manh Nguyen

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Eavis, T., Cueva, D. (2007). A Hilbert Space Compression Architecture for Data Warehouse Environments. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_1

  • DOI: https://doi.org/10.1007/978-3-540-74553-2_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74552-5

  • Online ISBN: 978-3-540-74553-2

  • eBook Packages: Computer Science, Computer Science (R0)
