The Burrows-Wheeler Transform between Data Compression and Combinatorics on Words

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7921)

Abstract

The Burrows-Wheeler Transform (BWT) is a tool of fundamental importance in Data Compression and has recently found many applications well beyond its original purpose. The main goal of this paper is to highlight the mathematical and combinatorial properties on which the outstanding versatility of the BWT is based, namely its reversibility and the clustering effect on its output. These properties have attracted strong interest in the scientific community, both for their theoretical aspects and for their practical effects. In particular, this paper surveys the theoretical research issues that, taking their cue from Data Compression, have been developed in the context of Combinatorics on Words, and it focuses on those combinatorial results that are useful for exploring the applicative potential of the Burrows-Wheeler Transform.
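
The two properties named in the abstract, reversibility and clustering, can be illustrated with a minimal sketch. It assumes the textbook definition of the BWT via sorted cyclic rotations, where the output is the last column of the sorted rotation matrix together with the row index of the original word. The function names `bwt` and `inverse_bwt` are illustrative and not taken from the paper, and the naive rotation sort is meant for small inputs only, not as an efficient construction.

```python
def bwt(word):
    """Return (last_column, index) from the sorted cyclic rotations of `word`.

    `index` is the row at which `word` itself appears in the sorted list.
    """
    n = len(word)
    rotations = sorted(word[i:] + word[:i] for i in range(n))
    last_column = "".join(rot[-1] for rot in rotations)
    return last_column, rotations.index(word)


def inverse_bwt(last_column, index):
    """Rebuild the original word from the last column and the row index."""
    n = len(last_column)
    # A stable sort of the positions of `last_column` by character maps each
    # row j of the sorted matrix to the row holding that rotation shifted
    # left by one letter.
    order = sorted(range(n), key=lambda i: last_column[i])
    out = []
    row = index
    for _ in range(n):
        row = order[row]
        out.append(last_column[row])
    return "".join(out)


if __name__ == "__main__":
    transformed, idx = bwt("banana")
    print(transformed, idx)                # equal letters end up adjacent
    print(inverse_bwt(transformed, idx))   # recovers the input word
```

On the word `banana`, the output groups the occurrences of each letter into runs, which is the clustering effect, and the pair returned by `bwt` suffices to recover the input, which is the reversibility.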

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rosone, G., Sciortino, M. (2013). The Burrows-Wheeler Transform between Data Compression and Combinatorics on Words. In: Bonizzoni, P., Brattka, V., Löwe, B. (eds) The Nature of Computation. Logic, Algorithms, Applications. CiE 2013. Lecture Notes in Computer Science, vol 7921. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39053-1_42

  • DOI: https://doi.org/10.1007/978-3-642-39053-1_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39052-4

  • Online ISBN: 978-3-642-39053-1

  • eBook Packages: Computer Science (R0)
