Parallel Algorithms for the Analysis of Biological Sequences

Mauri, Giancarlo; Pavesi, Giulio

doi:10.1007/3-540-44743-1_48

Conference paper
First Online: 01 January 2001

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2127)

Abstract

In the last few years, molecular biology has produced a large amount of data, mainly in the form of sequences, that is, strings over an alphabet of four symbols (DNA/RNA) or twenty symbols (proteins). The main challenge for computational biologists is now to provide efficient tools for analyzing and comparing these sequences. In this paper, we introduce and briefly discuss some open problems, and present a parallel algorithm that finds repeated substrings in a DNA sequence or common substrings in a set of sequences. Occurrences of a substring may be approximate: they can differ from it by up to a maximum number of mismatches that depends on the length of the substring itself. The output of the algorithm is sorted according to different statistical measures of significance. The algorithm has been successfully implemented on a cluster of workstations.
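To make the problem statement concrete, the following is a minimal brute-force sketch in Python, not the algorithm presented in the paper. It makes several assumptions that the abstract does not state: mismatches are counted as Hamming distance, the allowed number grows linearly with the substring length (the hypothetical `rate` parameter), candidate substrings are taken only from the sequence itself, and results are ranked by raw occurrence count as a stand-in for the statistical measures of significance. All function names are illustrative. Candidates are scored independently, which is what makes the task easy to partition; a thread pool is used here only to show that partitioning, and real speedup would need separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def max_mismatches(length: int, rate: float) -> int:
    # Assumption: the mismatch budget is a fixed fraction of the length.
    return int(rate * length)


def approx_occurrences(seq: str, pattern: str, e: int) -> list[int]:
    """Start positions where `pattern` matches `seq` with at most e mismatches."""
    m = len(pattern)
    return [i for i in range(len(seq) - m + 1)
            if hamming(seq[i:i + m], pattern) <= e]


def repeated_substrings(seq, length, rate=0.25, min_occ=2, workers=4):
    """Return (pattern, positions) pairs, most frequent pattern first."""
    e = max_mismatches(length, rate)
    # Assumption: only substrings that literally occur in seq are candidates.
    candidates = sorted({seq[i:i + length]
                         for i in range(len(seq) - length + 1)})
    # Each candidate is scored independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = list(pool.map(lambda p: approx_occurrences(seq, p, e),
                             candidates))
    found = [(p, occ) for p, occ in zip(candidates, hits)
             if len(occ) >= min_occ]
    # Stand-in ranking: occurrence count, then alphabetical order.
    return sorted(found, key=lambda item: (-len(item[1]), item[0]))
```

For example, `repeated_substrings("ACGTACGAACGT", 4)` allows one mismatch for length-4 substrings, so `ACGT` is reported at positions 0, 4 and 8, where position 4 is the approximate occurrence `ACGA`.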

Author information

Authors and Affiliations

Dept. of Computer Science, Systems and Communication, University of Milan-Bicocca, Milan, Italy
Giancarlo Mauri & Giulio Pavesi

Editor information

Editors and Affiliations

Supercomputer Software Department, ICM and MG SB RAS, pr. Lavrentiev 6, 630090, Novosibirsk, Russia
Victor Malyshkin

Cite this paper

Mauri, G., Pavesi, G. (2001). Parallel Algorithms for the Analysis of Biological Sequences. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2001. Lecture Notes in Computer Science, vol 2127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44743-1_48

DOI: https://doi.org/10.1007/3-540-44743-1_48
Published: 24 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42522-9
Online ISBN: 978-3-540-44743-6
eBook Packages: Springer Book Archive