Skip to main content

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 4192))

Abstract

Suffix arrays are a simple and powerful data structure for text processing that can be used for full text indexes, data compression, and many other applications in particular in bioinformatics. We describe the first implementation and experimental evaluation of a scalable parallel algorithm for suffix array construction. The implementation works on distributed memory computers using MPI, Experiments with up to 128 processors show good constant factors and make it look likely that the algorithm would also scale to considerably larger systems. This makes it possible to build suffix arrays for huge inputs very quickly. Our algorithm is a parallelization of the linear time DC3 algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Manber, U., Myers, G.: Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing 22, 935–948 (1993)

    Article  MATH  MathSciNet  Google Scholar 

  2. Gonnet, G., Baeza-Yates, R., Snider, T.: New indices for text: PAT trees and PAT arrays. In: Frakes, W.B., Baeza-Yates, R. (eds.) Information Retrieval: Data Structures & Algorithms. Prentice-Hall, Englewood Cliffs (1992)

    Google Scholar 

  3. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: The enhanced suffix array and its applications to genome analysis. In: Guigó, R., Gusfield, D. (eds.) WABI 2002. LNCS, vol. 2452, pp. 449–463. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  4. Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Technical Report 124, SRC (digital, Palo Alto) (1994)

    Google Scholar 

  5. Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 943–955. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  6. Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Proc. 14th Annual Symposium on Combinatorial Pattern Matching, pp. 186–199. Springer, Heidelberg (2003) (to appear)

    Chapter  Google Scholar 

  7. Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 200–210. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  8. Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 943–955. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  9. Dementiev, R., Kärkkäinen, J., Mehnert, J., Sanders, P.: Better external memory suffix array construction. In: Workshop on Algorithm Engineering & Experiments, Vancouver, pp. 86–97 (2005)

    Google Scholar 

  10. Iliopoulos, C.S., Rytter, W.: On parallel transformations of suffix arrays into suffix trees. In: 15th Australasian Workshop on Combinatorial Algorithms (AWOCA) (2004)

    Google Scholar 

  11. Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. Journal of the ACM (to appear, 2006)

    Google Scholar 

  12. Futamura, N., Aluru, S., Kurtz, S.: Parallel suffix sorting. In: Proc. 9th International Conference on Advanced Computing and Communications, pp. 76–81. McGraw-Hill, New York (2001)

    Google Scholar 

  13. Manzini, G., Ferragina, P.: Engineering a lightweight suffix array construction algorithm. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 698–710. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  14. Shi, H., Schaeffer, J.: Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing 14, 361–372 (1992)

    Article  MATH  Google Scholar 

  15. Snir, M., Otto, S.W., Huss-Lederman, S., Walker, D.W., Dongarra, J.: MPI – the Complete Reference. MIT Press, Cambridge (1996)

    Google Scholar 

  16. Smyth, B., Turpin, A.: The performance of linear time suffix sorting algorithms. In: IEEE Data Compression Conference (2005)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kulla, F., Sanders, P. (2006). Scalable Parallel Suffix Array Construction. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2006. Lecture Notes in Computer Science, vol 4192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846802_12

Download citation

  • DOI: https://doi.org/10.1007/11846802_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39110-4

  • Online ISBN: 978-3-540-39112-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics