Divide and Conquer Computation of the Multi-string BWT and LCP Array

  • Conference paper
Sailing Routes in the World of Computation (CiE 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10936)

Abstract

Indexing huge collections of strings, such as those produced by widespread sequencing technologies, relies heavily on multi-string generalizations of the Burrows-Wheeler Transform (BWT) and the Longest Common Prefix (LCP) array, since computing both structures efficiently is an essential ingredient of several algorithms on collections of strings.

In this paper we explore lightweight and parallel computational strategies for building the BWT and LCP array. We design a novel algorithm based on a divide and conquer approach that leads to a simultaneous and parallel computation of multi-string BWT and LCP array.
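
To make the two structures concrete, the following is a minimal Python sketch of a naive construction by explicit suffix sorting. It is not the divide and conquer algorithm of the paper, and it is neither lightweight nor parallel. The function names and the conventions used are assumptions made for illustration: each string is implicitly terminated by its own sentinel, sentinels are smaller than every alphabet symbol and are ordered by string index, all sentinels are printed as `$`, sentinels never match each other in the LCP, and the first LCP entry is set to 0.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def naive_multi_bwt_lcp(strings):
    """Naive multi-string BWT and LCP array (illustrative only).

    Assumed convention: string j ends with an implicit sentinel $_j,
    with $_0 < $_1 < ... < every alphabet symbol.
    """
    # One entry per suffix, including the empty suffix (the sentinel alone).
    # Sorting by (text, j) matches the sentinel order: a proper prefix sorts
    # first, and equal texts are ordered by string index j.
    suffixes = sorted(
        (s[p:], j, p) for j, s in enumerate(strings) for p in range(len(s) + 1)
    )
    # BWT symbol: the character preceding the suffix in its own string,
    # or the sentinel when the suffix is the whole string.
    bwt = "".join(strings[j][p - 1] if p > 0 else "$" for _, j, p in suffixes)
    # LCP entry: common prefix with the previous suffix in sorted order.
    lcp = [0] + [
        common_prefix_len(a[0], b[0]) for a, b in zip(suffixes, suffixes[1:])
    ]
    return bwt, lcp


print(naive_multi_bwt_lcp(["aca", "ca"]))
# prints ('aacc$a$', [0, 0, 0, 1, 1, 0, 2])
```

The sketch materializes every suffix, so its time and memory grow quadratically with the string lengths; it is useful only as a reference against which a lightweight construction can be checked on small inputs.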

Author information

Corresponding author

Correspondence to Marco Previtali.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this paper

Bonizzoni, P., Della Vedova, G., Nicosia, S., Pirola, Y., Previtali, M., Rizzi, R. (2018). Divide and Conquer Computation of the Multi-string BWT and LCP Array. In: Manea, F., Miller, R., Nowotka, D. (eds) Sailing Routes in the World of Computation. CiE 2018. Lecture Notes in Computer Science, vol 10936. Springer, Cham. https://doi.org/10.1007/978-3-319-94418-0_11

  • DOI: https://doi.org/10.1007/978-3-319-94418-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94417-3

  • Online ISBN: 978-3-319-94418-0

  • eBook Packages: Computer Science, Computer Science (R0)
