SACABench: Benchmarking Suffix Array Construction

  • Conference paper
String Processing and Information Retrieval (SPIRE 2019)

Abstract

We present a practical comparison of suffix array construction algorithms on modern hardware. The comparison is conducted with our new benchmark framework SACABench, which allows for easy deployment of publicly available implementations, simple plotting of the results, and straightforward support for including new construction algorithms. We use the framework to develop a construction algorithm running on the GPU that is competitive with the fastest parallel algorithm in our test environment.

Acknowledgment

We would like to thank the anonymous reviewer who pointed us to additional parallel suffix array construction algorithms that we had not previously included in the framework.

Author information

Corresponding author

Correspondence to Florian Kurpicz.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Bahne, J. et al. (2019). SACABench: Benchmarking Suffix Array Construction. In: Brisaboa, N., Puglisi, S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_29

  • DOI: https://doi.org/10.1007/978-3-030-32686-9_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32685-2

  • Online ISBN: 978-3-030-32686-9

  • eBook Packages: Computer Science, Computer Science (R0)
