Parallel Algorithms for the Analysis of Biological Sequences

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2127)

Abstract

In the last few years, molecular biology has produced a large amount of data, mainly in the form of sequences, that is, strings over an alphabet of four symbols (DNA/RNA) or twenty symbols (proteins). The main challenge for computational biologists is now to provide efficient tools for the analysis and comparison of these sequences. In this paper, we introduce and briefly discuss some open problems, and present a parallel algorithm that finds repeated substrings in a DNA sequence or common substrings in a set of sequences. The occurrences of a substring can be approximate, that is, they can differ by up to a maximum number of mismatches that depends on the length of the substring itself. The output of the algorithm is sorted according to different statistical measures of significance. The algorithm has been successfully implemented on a cluster of workstations.
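To make the notion of approximate occurrences concrete, here is a minimal sequential sketch in Python. It is not the parallel algorithm of the paper: it is a brute-force illustration that takes each substring of a fixed length as a candidate pattern and collects the positions whose Hamming distance from it stays within a length-dependent bound. The function names, the `error_rate` parameter, and the rule `floor(error_rate * length)` for the mismatch bound are assumptions made for this example only.

```python
from math import floor


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_repeats(seq, length, error_rate, min_occurrences=2):
    """Brute-force search for approximately repeated substrings.

    Assumption (illustrative only): the allowed number of mismatches
    grows with the substring length as floor(error_rate * length).
    Only substrings that actually occur in `seq` are tried as patterns.
    """
    max_mismatches = floor(error_rate * length)
    # All windows of the requested length, indexed by start position.
    windows = [seq[i:i + length] for i in range(len(seq) - length + 1)]
    repeats = {}
    for pattern in set(windows):
        positions = [
            i for i, window in enumerate(windows)
            if hamming(pattern, window) <= max_mismatches
        ]
        if len(positions) >= min_occurrences:
            repeats[pattern] = positions
    return repeats


result = approximate_repeats("ACGTACGAACGT", length=4, error_rate=0.25)
print(result["ACGT"])
```

With one mismatch allowed, the pattern `ACGT` is reported at its two exact occurrences and at the window `ACGA`, which differs from it in a single position. This sketch takes time quadratic in the sequence length for each pattern length, which is the kind of cost that motivates more efficient, index-based and parallel approaches.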

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mauri, G., Pavesi, G. (2001). Parallel Algorithms for the Analysis of Biological Sequences. In: Malyshkin, V. (ed.) Parallel Computing Technologies. PaCT 2001. Lecture Notes in Computer Science, vol 2127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44743-1_48

  • DOI: https://doi.org/10.1007/3-540-44743-1_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42522-9

  • Online ISBN: 978-3-540-44743-6

  • eBook Packages: Springer Book Archive
