Scalable K-Order LCP Array Construction for Massive Data

  • Conference paper
Parallel Architecture, Algorithm and Programming (PAAP 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 729)

Abstract

Given a size-n input text T and its suffix array, we propose a new method for computing the K-order longest common prefix (LCP) array of T, in which the LCP of any two suffixes is truncated to at most K. The method employs a fingerprint function to convert a comparison of two variable-length strings into a comparison of their fingerprints, which are encoded as fixed-size integers. It takes \( O(n \log K) \) time and \( O(n) \) space on both the internal and external memory models. It also scales to a typical distributed model consisting of \( d \) computing nodes, where the time and space complexities are divided evenly among the nodes as \( O((n \log K)/d) \) and \( O(n/d) \), respectively. For performance evaluation, an experimental study was conducted on both the external memory and distributed models. A cluster of computers in a local area network is commonly available in practice, but scalable LCP-array construction algorithms for such a distributed model are currently lacking. Our method provides a candidate solution to meet this demand.
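
The fingerprint idea in the abstract can be illustrated with a small in-memory sketch. It is not the authors' algorithm, whose details are not given on this page. It assumes a Karp-Rabin style polynomial fingerprint over prefix hashes and a binary search on the match length in the range 0 to K, which is one plausible way to reach the \( O(n \log K) \) bound. The function name `k_order_lcp` and the constants `MOD` and `BASE` are hypothetical choices made for illustration. Because equal fingerprints do not strictly guarantee equal strings, the result is correct only with high probability.

```python
from typing import List

MOD = (1 << 61) - 1   # assumed modulus (a Mersenne prime), illustrative choice
BASE = 257            # assumed base, illustrative choice


def k_order_lcp(text: str, sa: List[int], k: int) -> List[int]:
    """Return lcp[r] = min(K, LCP of suffixes sa[r-1] and sa[r]); lcp[0] = 0."""
    n = len(text)
    # pref[i] is the fingerprint of text[:i]; pw[i] is BASE**i mod MOD.
    pref = [0] * (n + 1)
    pw = [1] * (n + 1)
    for i, ch in enumerate(text):
        pref[i + 1] = (pref[i] * BASE + ord(ch) + 1) % MOD
        pw[i + 1] = (pw[i] * BASE) % MOD

    def fp(start: int, length: int) -> int:
        # Fixed-size integer fingerprint of text[start:start + length].
        return (pref[start + length] - pref[start] * pw[length]) % MOD

    lcp = [0] * n
    for r in range(1, n):
        a, b = sa[r - 1], sa[r]
        lo, hi = 0, min(k, n - a, n - b)
        # Prefix equality is monotone in the length, so binary search applies.
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if fp(a, mid) == fp(b, mid):
                lo = mid        # prefixes of length mid (probably) match
            else:
                hi = mid - 1    # a mismatch occurs within the first mid characters
        lcp[r] = lo
    return lcp
```

For the text "banana" with suffix array [5, 3, 1, 0, 4, 2], the call `k_order_lcp("banana", [5, 3, 1, 0, 4, 2], 2)` returns [0, 1, 2, 0, 0, 2]: the true LCP of "ana" and "anana" is 3, and it is truncated to K = 2. Each adjacent pair is processed independently of the others, which is why the ranks could in principle be split into d contiguous ranges for the distributed setting, provided each node can obtain the fingerprints it needs.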

Acknowledgement

The work of G. Nong was supported by the Guangzhou Science and Technology Program grant 201707010165 and the Project of DEGP grant 2014KTSCX007. The work of W.H. Chan was supported by GRF (18300215), Research Grant Council, Hong Kong SAR.

Author information

Corresponding authors

Correspondence to Wai Hong Chan or Ge Nong.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd

About this paper

Cite this paper

Wu, Y., Han, L.B., Chan, W.H., Nong, G. (2017). Scalable K-Order LCP Array Construction for Massive Data. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_55

  • DOI: https://doi.org/10.1007/978-981-10-6442-5_55

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6441-8

  • Online ISBN: 978-981-10-6442-5

  • eBook Packages: Computer Science, Computer Science (R0)
