A Lightweight Algorithm for Computing BWT from Suffix Array in Disk

  • Conference paper
Parallel Architecture, Algorithm and Programming (PAAP 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 729)

Abstract

The Burrows-Wheeler transform (BWT) and the suffix array (SA) of an input string are important data structures widely used in modern bioinformatics research, for example in full-text search and alignment. In this paper, we present a lightweight external-memory algorithm for computing the BWT from a given suffix array and the input string. The algorithm has linear I/O complexity O(n) and a workspace of at most n/2 integers. An experimental study is conducted to evaluate the time and space performance of the proposed algorithm on a number of realistic datasets. The experimental results are consistent with the theoretical complexities of the algorithm.
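To make the relation between the two structures concrete, the following is a minimal in-memory sketch, not the external-memory algorithm proposed in the paper. It relies only on the standard identity that the i-th BWT character is the character preceding the suffix ranked i-th in the suffix array. The function names `suffix_array` and `bwt_from_sa` are hypothetical, and the input is assumed to end with a unique sentinel `$` that sorts before every other character.

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array: sort suffix start positions lexicographically.

    Illustrative only; suitable for tiny inputs, not for disk-resident data.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text: str, sa: list[int]) -> str:
    """Derive the BWT from the text and its suffix array.

    BWT[i] = text[SA[i] - 1]; when SA[i] == 0 the index -1 wraps around
    to the last character (the assumed '$' sentinel).
    """
    return "".join(text[i - 1] for i in sa)


if __name__ == "__main__":
    t = "banana$"          # assumed: unique '$' terminator
    sa = suffix_array(t)
    print(sa)              # [6, 5, 3, 1, 0, 4, 2]
    print(bwt_from_sa(t, sa))  # annb$aa
```

Each BWT character needs one random access into the text at position SA[i] - 1. That access pattern is cheap in RAM but costly when the text and suffix array reside on disk, which is the setting the paper addresses.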

Acknowledgments

The work of G. Nong was supported by the Guangzhou Science and Technology Program grant 201707010165 and the Project of DEGP grant 2014KTSCX007.

Corresponding author

Correspondence to Ge Nong.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd

About this paper

Cite this paper

Xie, J.Y., Lao, B., Nong, G. (2017). A Lightweight Algorithm for Computing BWT from Suffix Array in Disk. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_25

  • DOI: https://doi.org/10.1007/978-981-10-6442-5_25

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6441-8

  • Online ISBN: 978-981-10-6442-5

  • eBook Packages: Computer Science, Computer Science (R0)
