Abstract
Given a size-n input text T and its suffix array, we propose a new method for computing the K-order longest common prefix (LCP) array of T, in which the LCP of any two suffixes is truncated to at most K. The method employs a fingerprint function to reduce the comparison of two variable-length strings to a comparison of their fingerprints, which are encoded as fixed-size integers. It takes \( O(n \log K) \) time and \( O(n) \) space in both the internal and external memory models. It also scales to a typical distributed model consisting of \( d \) computing nodes, where the time and space costs are divided evenly among the nodes as \( O((n \log K)/d) \) and \( O(n/d) \) per node, respectively. For performance evaluation, we conducted an experimental study on both the external memory and distributed models. Clusters of computers connected by a local area network are commonly available in practice, yet scalable LCP-array construction algorithms for such a distributed model are currently lacking. Our method provides a candidate solution to meet this demand.
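To make the fingerprint idea concrete, the following is a minimal in-memory sketch, not the authors' implementation. It assumes Karp-Rabin style polynomial prefix fingerprints and a binary search over candidate lengths in [0, K], which is one plausible way to reach the stated \( O(n \log K) \) bound; the abstract does not spell out the actual procedure. The names `MOD`, `BASE`, `prefix_fingerprints`, `substring_fp`, `k_order_lcp`, and `naive_suffix_array` are hypothetical, as is the convention that entry 0 of the LCP array is 0 and entry r compares the suffixes at ranks r-1 and r.

```python
# Hypothetical sketch: K-order LCP array via prefix fingerprints and
# binary search. Assumptions (not taken from the paper): the modulus,
# the base, and the LCP-array indexing convention.
MOD = (1 << 61) - 1  # assumed large prime modulus
BASE = 257           # assumed base; a real implementation would pick it at random


def prefix_fingerprints(text):
    """Return (fp, pw): fp[i] is the fingerprint of text[:i], pw[i] = BASE**i mod MOD."""
    n = len(text)
    fp = [0] * (n + 1)
    pw = [1] * (n + 1)
    for i, ch in enumerate(text):
        fp[i + 1] = (fp[i] * BASE + ord(ch) + 1) % MOD
        pw[i + 1] = (pw[i] * BASE) % MOD
    return fp, pw


def substring_fp(fp, pw, start, length):
    """Fingerprint of text[start:start + length] as a fixed-size integer."""
    return (fp[start + length] - fp[start] * pw[length]) % MOD


def k_order_lcp(text, sa, k):
    """LCP of each pair of rank-adjacent suffixes, truncated to at most k.

    Equal fingerprints are treated as equal substrings, so a collision
    could, with small probability, overestimate an entry.
    """
    n = len(text)
    fp, pw = prefix_fingerprints(text)
    lcp = [0] * n
    for r in range(1, n):
        a, b = sa[r - 1], sa[r]
        lo, hi = 0, min(k, n - a, n - b)
        # Binary search for the largest length whose fingerprints match.
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if substring_fp(fp, pw, a, mid) == substring_fp(fp, pw, b, mid):
                lo = mid
            else:
                hi = mid - 1
        lcp[r] = lo
    return lcp


def naive_suffix_array(text):
    """Quadratic-space helper for small demos only."""
    return sorted(range(len(text)), key=lambda i: text[i:])
```

For example, with `sa = naive_suffix_array("banana")`, calling `k_order_lcp("banana", sa, 2)` caps every entry at 2, whereas a larger K recovers the ordinary LCP values. Each suffix pair costs O(log K) fingerprint comparisons, and the pairs are independent of one another, which is what makes it natural to split the work across nodes.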
Acknowledgement
The work of G. Nong was supported by the Guangzhou Science and Technology Program grant 201707010165 and the Project of DEGP grant 2014KTSCX007. The work of W.H. Chan was supported by GRF (18300215), Research Grant Council, Hong Kong SAR.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd
About this paper
Cite this paper
Wu, Y., Han, L.B., Chan, W.H., Nong, G. (2017). Scalable K-Order LCP Array Construction for Massive Data. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_55
DOI: https://doi.org/10.1007/978-981-10-6442-5_55
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6441-8
Online ISBN: 978-981-10-6442-5
eBook Packages: Computer Science, Computer Science (R0)