Abstract
Several recent attempts have been made to extend the internal-memory suffix array (SA) construction algorithm SA-IS to the external memory model, e.g., eSAIS, EM-SA-DS and DSA-IS. While the programs developed for these algorithms achieve remarkable performance in terms of I/O complexity and speed, their designs are quite complex and their disk requirements remain rather heavy. Currently, the core algorithmic part of each of these programs consists of thousands of lines of C++, and the average peak disk requirement is over 20n bytes for an input string of size \(n<2^{40}\). We re-investigate the problem of induced sorting suffixes in external memory and propose a new algorithm, SAIS-PQ (SAIS with Priority Queue), and its enhanced alternative, SAIS-PQ+. Using the STXXL library, the core algorithmic parts of SAIS-PQ and SAIS-PQ+ are coded in around 800 and 1600 lines of C++, respectively. The time and space performance of these two programs is evaluated in comparison with eSAIS, which is also implemented using STXXL. In our experiment, eSAIS runs the fastest on input strings no larger than 16 GiB, but it is slower than SAIS-PQ+ on the only two larger inputs, of 32 and 48.44 GiB. The average peak disk requirements of eSAIS and SAIS-PQ+ are around 23n and 15n bytes, respectively.
Corresponding authors: Ge Nong (issng@mail.sysu.edu.cn) and Wai Hong Chan (waihchan@ied.edu.hk). Nong is supported by the Project of DEGP (2012KJCX0001). Chan is partially supported by the General Research Fund (810012), the Research Grant Council, Hong Kong SAR.
References
Bingmann, T., Fischer, J., Osipov, V.: Inducing suffix and LCP arrays in external memory. In: Proceedings of ALENEX, pp. 88–102 (2013)
Dementiev, R., Kärkkäinen, J., Mehnert, J., Sanders, P.: Better external memory suffix array construction. ACM Journal of Experimental Algorithmics 12, 3.4:1–3.4:24 (2008). http://dx.doi.org/10.1145/1227161.1402296
Dementiev, R., Kettner, L., Sanders, P.: Stxxl: standard template library for XXL data sets. Software: Practice and Experience 38(6), 589–637 (2008). http://dx.doi.org/10.1002/spe.844
Dementiev, R., Kettner, L., Sanders, P.: Stxxl: standard template library for XXL data sets. In: Brodal, G.S., Leonardi, S. (eds.) ESA 2005. LNCS, vol. 3669, pp. 640–651. Springer, Heidelberg (2005)
Ferragina, P., Gagie, T., Manzini, G.: Lightweight data indexing and compression in external memory. Algorithmica 63, 707–730 (2012). http://dx.doi.org/10.1007/s00453-011-9535-0
Kärkkäinen, J., Kempa, D.: Engineering a lightweight external memory suffix array construction algorithm. In: Proceedings of the 2nd International Conference on Algorithms for Big Data, pp. 53–60 (2014)
Kärkkäinen, J., Rantala, T.: Engineering radix sort for strings. In: Amir, A., Turpin, A., Moffat, A. (eds.) SPIRE 2008. LNCS, vol. 5280, pp. 3–14. Springer, Heidelberg (2008). http://dx.doi.org/10.1007/978-3-540-89097-3_3
Nong, G., Chan, W.H., Hu, S.Q., Wu, Y.: Induced sorting suffixes in external memory. ACM Transactions on Information Systems 33(3), 12:1–12:15 (2015). http://dx.doi.org/10.1145/2699665
Nong, G., Chan, W.H., Zhang, S., Guan, X.F.: Suffix array construction in external memory using d-critical substrings. ACM Transactions on Information Systems 32(1), 1:1–1:15 (2014). http://doi.acm.org/10.1145/2518175
Nong, G., Zhang, S., Chan, W.H.: Two efficient algorithms for linear time suffix array construction. IEEE Transactions on Computers 60(10), 1471–1484 (2011). http://dx.doi.org/10.1109/TC.2010.188
Puglisi, S.J., Smyth, W.F., Turpin, A.H.: A taxonomy of suffix array construction algorithms. ACM Computing Surveys 39(2), 1–31 (2007). http://doi.acm.org/10.1145/1242471.1242472
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, W.J., Nong, G., Chan, W.H., Wu, Y. (2015). Induced Sorting Suffixes in External Memory with Better Design and Less Space. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds) String Processing and Information Retrieval. SPIRE 2015. Lecture Notes in Computer Science, vol 9309. Springer, Cham. https://doi.org/10.1007/978-3-319-23826-5_9
DOI: https://doi.org/10.1007/978-3-319-23826-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23825-8
Online ISBN: 978-3-319-23826-5
eBook Packages: Computer Science, Computer Science (R0)