Skip to main content

Parallel Suffix Sorting for Large String Analytics

  • Conference paper
  • First Online:
Parallel Processing and Applied Mathematics (PPAM 2022)

Abstract

The suffix array is a fundamental data structure to support string analysis efficiently. It took about 26 years for the sequential suffix array construction algorithm to achieve \(\mathcal {O}(n)\) time complexity and in-place sorting. In this paper, we develop the D-Limited Parallel Induce (DLPI) algorithm, the first \(\mathcal {O}(\frac{n}{p})\) time parallel suffix array construction algorithm. The basic idea of DLPI includes two aspects: dividing the \(\mathcal {O}(n)\) size problem into p reduced sub-problems with size \(\mathcal {O}(\frac{n}{p})\) so we can handle them on p processors in parallel; developing an efficient parallel induce sorting method to achieve correct order for all the reduced sub-problems. The complete algorithm description is given to show the implementation method of the proposed idea. The time and space complexity analysis and proof are also given to show the correctness and efficiency of the proposed algorithm. The proposed DLPI algorithm can handle large strings with scalable performance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 59.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 79.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Burkhardt, S., Kärkkäinen, J.: Fast lightweight suffix array construction and checking. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 55–69. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44888-8_5

    Chapter  Google Scholar 

  2. Deo, M., Keely, S.: Parallel suffix array and least common prefix for the GPU. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 197–206 (2013). https://doi.org/10.1145/2442516.2442536

  3. Flick, P., Aluru, S.: Parallel distributed memory construction of suffix and longest common prefix arrays. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–10 (2015)

    Google Scholar 

  4. Futamura, N., Aluru, S., Kurtz, S.: Parallel suffix sorting. Electrical Engineering and Computer Science - All Scholarship 64 (2001). https://surface.syr.edu/eecs/64

  5. Helman, D.R., JáJá, J., Bader, D.A.: A new deterministic parallel sorting algorithm with an experimental evaluation. J. Exp. Algorithmics (JEA) 3, 4-es (1998). https://doi.org/10.1145/297096.297128

  6. Homann, R., Fleer, D., Giegerich, R., Rehmsmeier, M.: mkESA: enhanced suffix array construction tool. Bioinformatics 25(8), 1084–1085 (2009). https://doi.org/10.1093/bioinformatics/btp112

    Article  Google Scholar 

  7. JáJá, J.: An Introduction to Parallel Algorithms, vol. 10, p. 133889. Addison-Wesley, Reading (1992)

    Google Scholar 

  8. Kärkkäinen, J.: Fast BWT in small space by blockwise suffix sorting. Theoret. Comput. Sci. 387(3), 249–257 (2007). https://doi.org/10.1016/j.tcs.2007.07.018

    Article  MathSciNet  MATH  Google Scholar 

  9. Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 943–955. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45061-0_73

    Chapter  Google Scholar 

  10. Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM (JACM) 53(6), 918–936 (2006). https://doi.org/10.1145/1217856.1217858

    Article  MathSciNet  MATH  Google Scholar 

  11. Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 186–199. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44888-8_14

    Chapter  Google Scholar 

  12. Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. J. Discret. Algorithms 3(2–4), 143–156 (2005). https://doi.org/10.1016/j.jda.2004.08.002

    Article  MathSciNet  MATH  Google Scholar 

  13. Kulla, F., Sanders, P.: Scalable parallel suffix array construction. Parallel Comput. 33(9), 605–612 (2007)

    Article  MATH  Google Scholar 

  14. Lao, B., Nong, G., Chan, W.H., Pan, Y.: Fast induced sorting suffixes on a multicore machine. J. Supercomput. 74(7), 3468–3485 (2018). https://doi.org/10.1007/s11227-018-2395-5

    Article  Google Scholar 

  15. Lao, B., Nong, G., Chan, W.H., Xie, J.Y.: Fast in-place suffix sorting on a multicore computer. IEEE Trans. Comput. 67(12), 1737–1749 (2018). https://doi.org/10.1109/TC.2018.2842050

    Article  MathSciNet  MATH  Google Scholar 

  16. Larsson, N.J., Sadakane, K.: Faster suffix sorting. Theoret. Comput. Sci. 387(3), 258–272 (2007). https://doi.org/10.1016/j.tcs.2007.07.017

    Article  MathSciNet  MATH  Google Scholar 

  17. Li, Z., Li, J., Huo, H.: Optimal in-place suffix sorting. Inf. Comput. 285, 104818 (2022)

    Article  MathSciNet  MATH  Google Scholar 

  18. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993). https://doi.org/10.1137/0222058

    Article  MathSciNet  MATH  Google Scholar 

  19. Maniscalco, M.A., Puglisi, S.J.: Faster lightweight suffix array construction. In: Proceedings of International Workshop on Combinatorial Algorithms (IWOCA), pp. 16–29 (2006)

    Google Scholar 

  20. Manzini, G., Ferragina, P.: Engineering a lightweight suffix array construction algorithm. Algorithmica 40(1), 33–50 (2004). https://doi.org/10.1007/s00453-004-1094-1

    Article  MathSciNet  MATH  Google Scholar 

  21. McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM (JACM) 23(2), 262–272 (1976). https://doi.org/10.1145/321941.321946

    Article  MathSciNet  MATH  Google Scholar 

  22. Nong, G.: Practical linear-time O(1)-workspace suffix sorting for constant alphabets. ACM Trans. Inf. Syst. (TOIS) 31(3), 1–15 (2013). https://doi.org/10.1145/2493175.2493180

    Article  Google Scholar 

  23. Osipov, V.: Parallel suffix array construction for shared memory architectures. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 379–384. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34109-0_40

    Chapter  Google Scholar 

  24. Puglisi, S.J., Smyth, W.F., Turpin, A.H.: A taxonomy of suffix array construction algorithms. ACM Comput. Surv. (CSUR) 39(2), 4-es (2007). https://doi.org/10.1145/1242471.1242472

  25. Shun, J.: Fast parallel computation of longest common prefixes. In: SC 2014: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 387–398. IEEE (2014). https://doi.org/10.1109/SC.2014.37

  26. Shun, J., et al.: Brief announcement: the problem based benchmark suite. In: Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 68–70 (2012). https://doi.org/10.1145/2312005.2312018

  27. Wang, L., Baxter, S., Owens, J.D.: Fast parallel skew and prefix-doubling suffix array construction on the GPU. Concurr. Comput. Pract. Exp. 28(12), 3466–3484 (2016). https://doi.org/10.1002/cpe.3867

    Article  Google Scholar 

Download references

Acknowledgement

This research was funded in part by NSF grant number CCF-2109988.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zhihui Du .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Du, Z., Zhang, S., Bader, D.A. (2023). Parallel Suffix Sorting for Large String Analytics. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2022. Lecture Notes in Computer Science, vol 13826. Springer, Cham. https://doi.org/10.1007/978-3-031-30442-2_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-30442-2_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30441-5

  • Online ISBN: 978-3-031-30442-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics