Induced Sorting Suffixes in External Memory with Better Design and Less Space

  • Conference paper
String Processing and Information Retrieval (SPIRE 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9309)

Included in the following conference series:

  • International Symposium on String Processing and Information Retrieval

Abstract

Recently, several attempts have been made to extend the internal memory suffix array (SA) construction algorithm SA-IS to the external memory model, e.g., eSAIS, EM-SA-DS and DSA-IS. While the programs developed for these algorithms achieve remarkable performance in terms of I/O complexity and speed, their designs are quite complex and their disk requirements remain heavy. Currently, the core algorithmic part of each of these programs consists of thousands of lines of C++, and the average peak disk requirement is over 20n bytes for an input string of size \(n<2^{40}\). We re-investigate the problem of induced sorting suffixes in external memory and propose a new algorithm, SAIS-PQ (SAIS with Priority Queue), and its enhanced alternative, SAIS-PQ+. Using the STXXL library, the core algorithmic parts of SAIS-PQ and SAIS-PQ+ are coded in around 800 and 1600 lines of C++, respectively. The time and space performance of these two programs is evaluated in comparison with eSAIS, which is also implemented using STXXL. In our experiments, eSAIS runs the fastest for input strings not larger than 16 GiB, but it is slower than SAIS-PQ+ for the only two larger input strings, of 32 and 48.44 GiB. The average peak disk requirements of eSAIS and SAIS-PQ+ are around 23n and 15n bytes, respectively.
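To make the reported space figures concrete, the following minimal sketch converts the approximate peak disk requirements from the abstract (around 23n bytes for eSAIS and 15n bytes for SAIS-PQ+) into GiB for the two largest inputs. It assumes one byte per input character, so an input of n characters occupies n bytes. The function name `peak_disk_gib` is hypothetical and is not part of any of the programs discussed.

```python
GIB = 2 ** 30  # bytes per GiB


def peak_disk_gib(input_gib: float, bytes_per_char: float) -> float:
    """Estimate peak disk use in GiB for an input of `input_gib` GiB.

    Assumption: one byte per input character, so n equals the input size
    in bytes, and the peak disk use is `bytes_per_char * n` bytes.
    """
    n = input_gib * GIB
    return bytes_per_char * n / GIB


if __name__ == "__main__":
    # Approximate averages quoted in the abstract: 23n (eSAIS), 15n (SAIS-PQ+).
    for size in (32, 48.44):
        esais = peak_disk_gib(size, 23)
        pq_plus = peak_disk_gib(size, 15)
        print(f"{size} GiB input: eSAIS ~{esais:.1f} GiB, SAIS-PQ+ ~{pq_plus:.1f} GiB")
```

Under these assumptions, a 32 GiB input needs roughly 736 GiB of peak disk space at 23n bytes and roughly 480 GiB at 15n bytes. These are estimates derived from the quoted averages, not measured values from the paper.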

Corresponding authors: Ge Nong (issng@mail.sysu.edu.cn) and Wai Hong Chan (waihchan@ied.edu.hk). Nong is supported by the Project of DEGP (2012KJCX0001). Chan is partially supported by the General Research Fund (810012), The Research Grant Council, Hong Kong SAR.



Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, W.J., Nong, G., Chan, W.H., Wu, Y. (2015). Induced Sorting Suffixes in External Memory with Better Design and Less Space. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds) String Processing and Information Retrieval. SPIRE 2015. Lecture Notes in Computer Science, vol 9309. Springer, Cham. https://doi.org/10.1007/978-3-319-23826-5_9

  • DOI: https://doi.org/10.1007/978-3-319-23826-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23825-8

  • Online ISBN: 978-3-319-23826-5

  • eBook Packages: Computer Science, Computer Science (R0)
