Data Structures: Time, I/Os, Entropy, Joules!

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6347)

Abstract

Data volumes are exploding as organizations and users collect and store increasing amounts of information for their own use and for sharing with others. To cope with these large datasets, software developers typically take advantage of ever faster I/O subsystems and multi-core processors, and/or they exploit virtual memory to make the caching and delivery of the data requested by their algorithms simple and effective whenever the working set is small. Sometimes they gain an additional speed-up by reducing the storage usage of their algorithms, because this affects both the number of machines/disks required for a given computation and the amount of data that is transferred to, and cached in, the faster memory levels closer to the CPUs. However, it is well known in both the algorithm and software engineering communities that the principled exploitation of all these issues, via a proper arrangement of data and a properly structured algorithmic computation, can abundantly surpass the best expected technology advancements and the help coming from (sophisticated) operating systems or heuristics. As a result, data compression and indexing nowadays play a key role in the design of modern algorithms for applications that manage massive datasets.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ferragina, P. (2010). Data Structures: Time, I/Os, Entropy, Joules!. In: de Berg, M., Meyer, U. (eds) Algorithms – ESA 2010. ESA 2010. Lecture Notes in Computer Science, vol 6347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15781-3_1

  • DOI: https://doi.org/10.1007/978-3-642-15781-3_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15780-6

  • Online ISBN: 978-3-642-15781-3

  • eBook Packages: Computer Science, Computer Science (R0)
