DOI: 10.1145/1739041.1739075
research-article

Suffix tree construction algorithms on modern hardware

Published: 22 March 2010

ABSTRACT

Suffix trees are indexing structures that enhance the performance of numerous string processing algorithms. In this paper, we propose cache-conscious suffix tree construction algorithms that are tailored to chip multiprocessor (CMP) architectures. The proposed algorithms utilize a novel sample-based cache partitioning algorithm to improve cache performance and exploit on-chip parallelism on CMPs. Furthermore, several compression techniques are applied to effectively trade space for cache performance.

Through an extensive experimental evaluation using real text data from different domains, we demonstrate that the algorithms proposed herein exhibit better cache performance than their cache-unaware counterparts and effectively utilize all processing elements, achieving satisfactory speedup.
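
The abstract does not spell out how the sample-based partitioning works, so the following is only a rough sketch of one way such a scheme could look, not the paper's algorithm. It assumes that suffixes are grouped by their first `k` characters, that group sizes are estimated from a random sample of positions, and that prefixes are packed into partitions with a first-fit decreasing heuristic under a size budget standing in for cache capacity. The function names and the parameters `k`, `sample_size`, and `capacity` are all hypothetical.

```python
import random
from collections import Counter


def estimate_prefix_counts(text, k, sample_size, seed=0):
    """Estimate, from a random sample of positions, how many suffixes
    start with each length-k prefix (assumption: k-prefix grouping)."""
    n = len(text) - k + 1  # suffixes long enough to have a full k-prefix
    if n <= 0:
        return {}
    rng = random.Random(seed)
    positions = rng.sample(range(n), min(sample_size, n))
    sampled = Counter(text[i:i + k] for i in positions)
    scale = n / len(positions)  # extrapolate sample counts to the full text
    return {prefix: count * scale for prefix, count in sampled.items()}


def pack_prefixes(estimates, capacity):
    """Group prefixes with first-fit decreasing so that each group's
    estimated size stays within capacity (a stand-in for cache size).
    A single prefix larger than capacity ends up alone in its group."""
    groups = []  # each entry is [estimated_load, [prefixes]]
    ordered = sorted(estimates.items(), key=lambda kv: (-kv[1], kv[0]))
    for prefix, size in ordered:
        for group in groups:
            if group[0] + size <= capacity:
                group[0] += size
                group[1].append(prefix)
                break
        else:
            groups.append([size, [prefix]])
    return [prefixes for _, prefixes in groups]


def partition_suffixes(text, k, groups):
    """Assign every suffix start position to the group owning its k-prefix.
    Prefixes never seen in the sample, and suffixes shorter than k,
    are collected in an overflow list."""
    owner = {p: gi for gi, prefixes in enumerate(groups) for p in prefixes}
    parts = [[] for _ in groups]
    overflow = []
    for i in range(len(text)):
        gi = owner.get(text[i:i + k])
        if gi is None:
            overflow.append(i)
        else:
            parts[gi].append(i)
    return parts, overflow


text = "abracadabra" * 20
estimates = estimate_prefix_counts(text, k=2, sample_size=1000)
groups = pack_prefixes(estimates, capacity=60)
parts, overflow = partition_suffixes(text, 2, groups)
```

Under these assumptions, each partition holds suffixes whose sub-trees are disjoint, so the partitions could in principle be handed to separate cores and built independently, with the budget chosen so that one partition's working set fits in cache.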

Published in

EDBT '10: Proceedings of the 13th International Conference on Extending Database Technology
March 2010
741 pages
ISBN: 9781605589459
DOI: 10.1145/1739041

Copyright © 2010 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 7 of 10 submissions (70%)
