research-article

Suffix tree construction algorithms on modern hardware

Authors:
Dimitris Tsirogiannis, University of Toronto, Toronto, Canada
Nick Koudas, University of Toronto, Toronto, Canada

EDBT '10: Proceedings of the 13th International Conference on Extending Database Technology, March 2010, Pages 263–274. https://doi.org/10.1145/1739041.1739075

Published: 22 March 2010

ABSTRACT

Suffix trees are indexing structures that enhance the performance of numerous string processing algorithms. In this paper, we propose cache-conscious suffix tree construction algorithms that are tailored to chip multiprocessor (CMP) architectures. The proposed algorithms use a novel sample-based cache partitioning algorithm to improve cache performance and exploit the on-chip parallelism of CMPs. Furthermore, several compression techniques are applied to effectively trade space for cache performance.

Through an extensive experimental evaluation using real text data from different domains, we demonstrate that the algorithms proposed herein exhibit better cache performance than their cache-unaware counterparts and effectively utilize all processing elements, achieving satisfactory speedup.
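
As a rough illustration of why partitioning helps, the sketch below splits the suffixes of a text into independent groups and indexes each group separately. It is not the sample-based partitioning algorithm of the paper, whose details the abstract does not give. It assumes the simplest possible scheme (grouping suffixes by their first character) and builds an uncompressed trie rather than a true suffix tree. All function names are hypothetical.

```python
from collections import defaultdict


def partition_by_first_char(text):
    """Group suffix start positions by the suffix's first character.

    Assumption: a one-character prefix is used as the partition key;
    this only illustrates the idea of independent partitions.
    """
    groups = defaultdict(list)
    for i, ch in enumerate(text):
        groups[ch].append(i)
    return dict(groups)


def build_subtrie(text, starts):
    """Build an uncompressed trie (nested dicts) over the given suffixes."""
    root = {}
    for start in starts:
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def build_index(text):
    """Build one sub-trie per partition.

    The partitions share no trie nodes, so each sub-trie could be built
    by a different core, with a working set limited to that partition.
    """
    parts = partition_by_first_char(text)
    return {ch: build_subtrie(text, starts) for ch, starts in parts.items()}


def contains(index, pattern):
    """Return True if pattern occurs as a substring of the indexed text."""
    if not pattern:
        return True
    node = index.get(pattern[0])  # select the partition by first character
    if node is None:
        return False
    for ch in pattern:  # the sub-trie stores whole suffixes, first char included
        node = node.get(ch)
        if node is None:
            return False
    return True
```

For example, `contains(build_index("banana"), "ana")` walks only the sub-trie for suffixes starting with `a`. The uncompressed trie takes quadratic space in the worst case, so this is only suitable for short inputs; a real suffix tree compresses unary paths into edges labeled by substrings.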

Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation
  2. Information storage systems

Published in
EDBT '10: Proceedings of the 13th International Conference on Extending Database Technology
March 2010
741 pages
ISBN: 9781605589459
DOI: 10.1145/1739041
Editors:
Ioana Manolescu, INRIA, France
Stefano Spaccapietra, EPFL, Switzerland
Jens Teubner, ETH Zurich, Switzerland
Masaru Kitsuregawa, Tokyo University, Japan
Alain Leger, Orange - France Telecom R&D, France
Felix Naumann, Hasso Plattner Institute, Germany
Anastasia Ailamaki, EPFL, Switzerland
Fatma Ozcan, IBM Almaden Research Center
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
multi-core
suffix tree
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 7 of 10 submissions, 70%