Trie-based data structures for sequence assembly

  • Conference paper
Combinatorial Pattern Matching (CPM 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1264)

Abstract

We investigate the application of two trie-based data structures, suffix trees and suffix arrays, to the problem of overlap detection in fragment assembly. Both data structures are analyzed theoretically and experimentally for speed and space. By using heuristics, we greatly reduce the number of calls to the time-consuming dynamic programming step, improving the speed of overlap detection by up to 1,000 times while retaining high accuracy in our collaborative DNA sequencing work with Brookhaven National Laboratory. We also study the problem of approximating the maximum space savings in trie structures for unification factoring in logic programming, a problem that is proved to be hard.
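As a rough illustration of how a suffix array can filter fragment pairs before dynamic programming is invoked, the sketch below reports only those pairs of fragments that share an exact substring of length `k`. It is a minimal sketch under stated assumptions, not the method of the paper: the shared-seed criterion, the function name `candidate_pairs`, the parameter `k`, the `$` separator, and the naive suffix sorting are all chosen here for illustration.

```python
from itertools import combinations, groupby


def candidate_pairs(fragments, k):
    """Return index pairs of fragments that share an exact substring of length k.

    Illustrative filter only: the pairs returned would be passed on to a
    dynamic programming alignment; all other pairs would be skipped.
    Assumes the fragments do not contain the separator character '$'.
    """
    sep = "$"
    # Concatenate all fragments, each followed by a separator.
    text = sep.join(fragments) + sep
    # owner[p] is the index of the fragment that text position p belongs to.
    owner = [i for i, frag in enumerate(fragments) for _ in range(len(frag) + 1)]
    # Naive suffix array: sort suffix start positions lexicographically.
    # (Fine for a sketch; practical construction algorithms are much faster.)
    suffix_array = sorted(range(len(text)), key=lambda p: text[p:])

    pairs = set()
    # Suffixes with the same first k characters are contiguous in sorted order.
    for seed, block in groupby(suffix_array, key=lambda p: text[p:p + k]):
        if len(seed) < k or sep in seed:
            continue  # seed is too short or crosses a fragment boundary
        owners = sorted({owner[p] for p in block})
        pairs.update(combinations(owners, 2))
    return pairs


if __name__ == "__main__":
    reads = ["ACGTACGGA", "CGGATTTC", "GGGGGGGG"]
    print(candidate_pairs(reads, 4))
```

In this toy input only the first two reads share a length-4 substring (`CGGA`), so only that pair would be handed to the alignment step. A larger `k` gives a stricter filter, at the risk of missing true overlaps that contain sequencing errors.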

Supported by ONR award 400x116yip01 and NSF Grant CCR-9625669.

Editor information

Alberto Apostolico, Jotun Hein

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, T., Skiena, S.S. (1997). Trie-based data structures for sequence assembly. In: Apostolico, A., Hein, J. (eds) Combinatorial Pattern Matching. CPM 1997. Lecture Notes in Computer Science, vol 1264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63220-4_61

  • DOI: https://doi.org/10.1007/3-540-63220-4_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63220-7

  • Online ISBN: 978-3-540-69214-0

  • eBook Packages: Springer Book Archive
