ABSTRACT
Suffix trees are indexing structures that enhance the performance of numerous string processing algorithms. In this paper, we propose cache-conscious suffix tree construction algorithms that are tailored to chip multiprocessor (CMP) architectures. The proposed algorithms utilize a novel sample-based cache partitioning algorithm to improve cache performance and exploit on-chip parallelism on CMPs. Furthermore, several compression techniques are applied to effectively trade space for cache performance.
Through an extensive experimental evaluation using real text data from different domains, we demonstrate that the algorithms proposed herein exhibit better cache performance than their cache-unaware counterparts and effectively utilize all processing elements, achieving satisfactory speedup.