Abstract
The suffix tree is a key data structure for biological sequence analysis. Even though efficient algorithms for suffix tree construction exist, their run-time is still very high for long DNA sequences such as whole human chromosomes. In this paper we introduce a new parallel algorithm for suffix tree construction. This algorithm uses a new data structure called the common prefix suffix tree (CPST). Our parallel implementation on a PC cluster leads to significant run-time savings.
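The abstract does not describe how the CPST divides the work, so the following is only a generic illustration of one common way to parallelize suffix tree construction. It is not the authors' algorithm. Suffixes are grouped by their first k characters, and a separate subtree is built for each group. Because the groups share no suffixes, each one could be assigned to a different cluster node. All names here are hypothetical, including `build_partitioned`, `count_occurrences` and the prefix length `k`. Each subtree is built by naive quadratic insertion, not by a linear-time algorithm such as Ukkonen's, and the loop over groups runs sequentially in this sketch.

```python
from collections import defaultdict


class Node:
    """Compressed suffix tree node (hypothetical sketch)."""
    def __init__(self):
        # first character of edge label -> [start, end, child]; label is text[start:end]
        self.edges = {}
        # set on leaves only: starting position of the suffix this leaf represents
        self.suffix_start = None


def insert_suffix(root, text, i):
    """Naively insert suffix text[i:] (text must end with a unique '$')."""
    node, pos = root, i
    while True:
        c = text[pos]
        if c not in node.edges:                  # no edge starts with c: hang a new leaf
            leaf = Node()
            leaf.suffix_start = i
            node.edges[c] = [pos, len(text), leaf]
            return
        start, end, child = node.edges[c]
        k = 0                                    # walk along the edge label while chars match
        while start + k < end and text[start + k] == text[pos + k]:
            k += 1
        if start + k == end:                     # whole edge matched: descend
            node, pos = child, pos + k
            continue
        mid = Node()                             # mismatch inside the edge: split it
        node.edges[c] = [start, start + k, mid]
        mid.edges[text[start + k]] = [start + k, end, child]
        leaf = Node()
        leaf.suffix_start = i
        mid.edges[text[pos + k]] = [pos + k, len(text), leaf]
        return


def build_partitioned(text, k):
    """Group suffixes by their first k characters; build one subtree per group.
    Each group is an independent task (here run sequentially)."""
    buckets = defaultdict(list)
    for i in range(len(text)):
        buckets[text[i:i + k]].append(i)
    forest = {}
    for prefix, starts in buckets.items():
        root = Node()
        for i in starts:
            insert_suffix(root, text, i)
        forest[prefix] = root
    return forest


def count_leaves(node):
    """Number of suffixes stored below node."""
    if node.suffix_start is not None:
        return 1
    return sum(count_leaves(child) for _, _, child in node.edges.values())


def count_occurrences(forest, text, pattern, k):
    """Count occurrences of pattern, consulting only the subtree for its k-prefix."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least k characters long")
    root = forest.get(pattern[:k])
    if root is None:
        return 0
    node, j = root, 0
    while j < len(pattern):
        edge = node.edges.get(pattern[j])
        if edge is None:
            return 0
        start, end, child = edge
        for t in range(start, end):
            if j == len(pattern):                # pattern ends in the middle of this edge
                break
            if text[t] != pattern[j]:
                return 0
            j += 1
        node = child
    return count_leaves(node)


if __name__ == "__main__":
    text = "ACGTACGTTACG$"
    forest = build_partitioned(text, k=2)
    print(count_occurrences(forest, text, "ACG", k=2))  # 3
```

A query only needs the subtree that matches its first k characters. This is why, in a scheme like this, the groups can be built and stored on different nodes without merging them into a single tree.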
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, C., Schmidt, B. (2005). Parallel Construction of Large Suffix Trees on a PC Cluster. In: Cunha, J.C., Medeiros, P.D. (eds) Euro-Par 2005 Parallel Processing. Euro-Par 2005. Lecture Notes in Computer Science, vol 3648. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11549468_134
DOI: https://doi.org/10.1007/11549468_134
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28700-1
Online ISBN: 978-3-540-31925-2
eBook Packages: Computer Science, Computer Science (R0)