Fragmented BWT: An Extended BWT for Full-Text Indexing

  • Conference paper
String Processing and Information Retrieval (SPIRE 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9954)

Abstract

This paper proposes the Fragmented Burrows-Wheeler Transform (FBWT), an extension of the well-known BWT structure for full-text indexing and searching. An FBWT consists of a number of BWT fragments, each covering only a subset of all the suffixes of the original string. Because constructing an FBWT does not require building the BWT of the whole string, it is faster than constructing a BWT. On the other hand, searching with an FBWT can be more costly than searching with a BWT, since every fragment must be searched: the work is \(O(dp + {\textit{occ}}\log ^{1+\epsilon }n)\), as opposed to \(O(p + {\textit{occ}}\log ^{1+\epsilon }n)\) for a regular BWT, where p is the length of the query string, n the length of the original text, occ the number of occurrences of the query string, and d the number of fragments. To compensate for this extra cost, FBWT search can be accelerated with SIMD instructions by searching multiple fragments in parallel. Experiments show that building an FBWT is about twice as fast as building a BWT with a state-of-the-art algorithm (SA-IS), and that FBWT's search performance relative to BWT's depends on the number of occurrences: it ranges from four times slower than BWT (when there are few occurrences) to twice as fast as BWT (when there are many).
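
The search-cost contrast above can be illustrated with a minimal Python sketch. It is not the paper's FBWT construction. As a hypothetical stand-in, it splits the text into at most d blocks, builds an ordinary FM-index (BWT plus suffix array, C table and occurrence counts) per block, and extends each block by max_pattern - 1 characters so that matches crossing a block boundary are still found; this limits query length to max_pattern, whereas the paper's fragments cover subsets of the suffixes of the whole string. A naive suffix sort stands in for SA-IS, and the names build_bwt, FMIndex, build_fragments and fragmented_locate are illustrative. A single index resolves a query in p backward-search steps; the fragmented version performs those p steps on every fragment, i.e. O(dp) work. Its inner loop over fragments has independent iterations, which is the kind of parallelism the abstract attributes to SIMD (this Python code runs them sequentially).

```python
from typing import Dict, List, Tuple


def build_bwt(text: str) -> Tuple[str, List[int]]:
    """BWT and suffix array of text + sentinel via naive suffix sorting (illustration only)."""
    s = text + "\0"  # assumes "\0" does not occur in text; it sorts before every other char
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # s[-1] is the sentinel when i == 0
    return bwt, sa


class FMIndex:
    """Minimal FM-index: C table plus prefix-count (occ) tables over the BWT.
    Keeps the full suffix array for locating (no sampling), for simplicity."""

    def __init__(self, text: str) -> None:
        self.bwt, self.sa = build_bwt(text)
        alphabet = sorted(set(self.bwt))
        self.C: Dict[str, int] = {}  # C[c] = number of characters smaller than c
        total = 0
        for c in alphabet:
            self.C[c] = total
            total += self.bwt.count(c)
        # occ[c][i] = number of occurrences of c in bwt[:i]
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in alphabet}
        for i, ch in enumerate(self.bwt):
            for c in alphabet:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def step(self, lo: int, hi: int, c: str) -> Tuple[int, int]:
        """One backward-search step: the suffix-array range [lo, hi) of suffixes
        starting with X becomes the range of suffixes starting with c + X."""
        if c not in self.C:
            return 0, 0
        return self.C[c] + self.occ[c][lo], self.C[c] + self.occ[c][hi]

    def locate(self, pattern: str) -> List[int]:
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):  # p constant-time steps: O(p) to find the range
            lo, hi = self.step(lo, hi, c)
            if lo >= hi:
                return []
        return sorted(self.sa[k] for k in range(lo, hi))


def build_fragments(text: str, d: int, max_pattern: int) -> List[Tuple[int, int, FMIndex]]:
    """Hypothetical stand-in, not the paper's FBWT: fragment owns start positions
    [start, own_end) and also indexes the next max_pattern - 1 characters."""
    b = max(1, -(-len(text) // d))  # ceil(n / d) owned positions per fragment
    frags = []
    for start in range(0, len(text), b):
        own_end = min(start + b, len(text))
        frags.append((start, own_end, FMIndex(text[start:own_end + max_pattern - 1])))
    return frags


def fragmented_locate(frags: List[Tuple[int, int, FMIndex]], pattern: str) -> List[int]:
    """Search all fragments in lockstep: p steps x d fragments = O(dp) work.
    Iterations of the inner loop are independent of each other."""
    ranges = [(0, len(idx.bwt)) for _, _, idx in frags]
    for c in reversed(pattern):
        for k, (_, _, idx) in enumerate(frags):
            lo, hi = ranges[k]
            if lo < hi:
                ranges[k] = idx.step(lo, hi, c)
    hits = []
    for (start, own_end, idx), (lo, hi) in zip(frags, ranges):
        for k in range(lo, hi):
            pos = start + idx.sa[k]
            if pos < own_end:  # matches starting later are owned by the next fragment
                hits.append(pos)
    return sorted(hits)
```

For example, `fragmented_locate(build_fragments("mississippi", 3, 4), "ssi")` returns `[2, 5]`, the same positions as `FMIndex("mississippi").locate("ssi")`.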

References

  1. Burrows, M., Wheeler, D.: A block-sorting lossless data compression algorithm. Technical Report 124, Digital Systems Research Center (1994)

  2. Claude, F., Navarro, G.: The wavelet matrix. In: SPIRE, pp. 167–179 (2012)

  3. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)

  4. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Algorithms 3(2), 20 (2007)

  5. Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 841–850 (2003)

  6. Hayashi, S., Taura, K.: Parallel and memory-efficient Burrows-Wheeler transform. In: Proceedings - 2013 IEEE International Conference on Big Data, pp. 43–50 (2013)

  7. Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: International Colloquium on Automata, Languages and Programming (ICALP), pp. 943–955 (2003)

  8. Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Parallel external memory suffix sorting. In: CPM 2015, pp. 329–342 (2015)

  9. Langmead, B., Trapnell, C., Pop, M., Salzberg, S.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10(3), 1 (2009)

  10. Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009)

  11. Li, R., Yu, C., Li, Y., Lam, W., Yiu, M., Kristiansen, K., Wang, J.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009)

  12. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. In: Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 319–327 (1990)

  13. Nong, G., Zhang, S., Chan, H.: Linear suffix array construction by almost pure induced-sorting. In: 2009 Data Compression Conference, pp. 193–202 (2009)

  14. Sadakane, K.: New text indexing functionalities of the compressed suffix arrays. J. Algorithms 48(2), 294–313 (2003)

Acknowledgement

This work was in part supported by Grant-in-Aid for Scientific Research (A) 16H01715.

Author information

Correspondence to Masaru Ito.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ito, M., Inoue, H., Taura, K. (2016). Fragmented BWT: An Extended BWT for Full-Text Indexing. In: Inenaga, S., Sadakane, K., Sakai, T. (eds) String Processing and Information Retrieval. SPIRE 2016. Lecture Notes in Computer Science, vol 9954. Springer, Cham. https://doi.org/10.1007/978-3-319-46049-9_10

  • DOI: https://doi.org/10.1007/978-3-319-46049-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46048-2

  • Online ISBN: 978-3-319-46049-9
