
An Approximate de Bruijn Graph Approach to Multiple Local Alignment and Motif Discovery in Protein Sequences

  • Conference paper
Data Mining and Bioinformatics (VDMB 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4316)


Abstract

Motif discovery is an important problem in protein sequence analysis. Computationally, it can be viewed as an application of the more general multiple local alignment problem, which becomes expensive in computing time when many sequences are aligned. We introduce a new algorithm for multiple local alignment of protein sequences, based on the de Bruijn graph approach first proposed by Zhang and Waterman for aligning DNA sequences. We generalize their approach to protein sequences by building an approximate de Bruijn graph that allows similar but not identical amino acids to be glued together. We implemented this algorithm and tested it on motif discovery in 100 sets of protein sequences. The results show that our method achieves results comparable to those of other popular motif discovery programs while offering advantages in speed.
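The gluing idea can be illustrated with a small Python sketch. It is not the authors' implementation. As a stand-in for "similar but not identical" amino acids, it assumes a hypothetical reduced alphabet (the `GROUPS` table) in which residues from the same group receive the same label. As a result, k-mers that differ only by within-group substitutions collapse onto the same graph edge, and edges shared by many input sequences mark candidate conserved regions. The names `GROUPS`, `reduce_seq`, `approx_de_bruijn`, and `conserved_edges` are illustrative only.

```python
from collections import defaultdict

# Hypothetical reduced alphabet (an illustrative assumption, not the paper's
# similarity criterion): residues in the same group are treated as "similar".
GROUPS = ["AGST", "C", "DENQ", "FWY", "HKR", "ILMV", "P"]
REDUCE = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def reduce_seq(seq: str) -> str:
    """Map each residue to its group label; unknown residues map to 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def approx_de_bruijn(seqs, k):
    """Build a de Bruijn graph whose nodes are reduced (k-1)-mers.

    Each k-mer contributes one edge from its reduced prefix to its reduced
    suffix, so k-mers differing only by within-group substitutions are glued
    onto the same edge. Returns edge -> list of (sequence index, start).
    """
    edges = defaultdict(list)
    for i, seq in enumerate(seqs):
        reduced = reduce_seq(seq)
        for p in range(len(reduced) - k + 1):
            kmer = reduced[p:p + k]
            edges[(kmer[:-1], kmer[1:])].append((i, p))
    return edges


def conserved_edges(edges, min_seqs):
    """Keep edges traversed by at least min_seqs distinct sequences."""
    return {edge: occ for edge, occ in edges.items()
            if len({i for i, _ in occ}) >= min_seqs}


# Usage: three toy sequences sharing a motif up to similar substitutions.
seqs = ["MKILVDE", "AKLIVNE", "GRVLIDQ"]
graph = approx_de_bruijn(seqs, k=4)
conserved = conserved_edges(graph, min_seqs=3)
for edge, occ in sorted(conserved.items()):
    print(edge, occ)
```

Under this grouping, K/R, I/L/V, D/N and E/Q share labels, so all three sequences traverse the same three consecutive edges starting at position 1. This corresponds to the windows KILVDE, KLIVNE and RVLIDQ. A fuller method would presumably judge similarity with a substitution matrix such as PAM [6] or BLOSUM [7] and then extract aligned segments from heavily used paths. Those steps are omitted here.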


References

  1. Bailey, T.L., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28–36. AAAI Press, Menlo Park (1994)

  2. Lawrence, C., Altschul, S., Boguski, M., Liu, J., Neuwald, A., Wootton, J.: Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science 262, 208–214 (1993)

  3. Henikoff, S., Henikoff, J.G., Alford, W.J., Pietrokovski, S.: Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 163, GC17–GC26 (1995)

  4. Zhang, Y., Waterman, M.S.: An Eulerian path approach to local multiple alignment for DNA sequences. PNAS 102, 1285–1290 (2005)

  5. Zhang, Y., Waterman, M.S.: An Eulerian path approach to global multiple alignment for DNA sequences. Journal of Computational Biology 10, 803–819 (2003)

  6. Dayhoff, M., Schwartz, R., Orcutt, B.: A model of evolutionary change in proteins. In: Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, vol. 5(3), pp. 345–352 (1978)

  7. Henikoff, S., Henikoff, J.: Amino Acid Substitution Matrices from Protein Blocks. PNAS 89, 10915–10919 (1992)

  8. Falquet, L., Pagni, M., Bucher, P., Hulo, N., Sigrist, C., Hofmann, K., Bairoch, A.: The PROSITE database, its status in 2002. Nucleic Acids Res. 30, 235–238 (2002)

  9. Jonassen, I.: Efficient discovery of conserved patterns using a pattern graph. CABIOS 13, 509–522 (1997)

  10. van Lint, J., Wilson, R.: A Course in Combinatorics, 2nd edn. Cambridge University Press, Cambridge (2001)

  11. Myers, E.W., Miller, W.: Optimal alignments in linear space. CABIOS 4, 11–17 (1988)

  12. Smith, T., Waterman, M.: Identification of common molecular subsequences. Journal of Molecular Biology 147, 195–197 (1981)

  13. Hart, R., Royyuru, A., Stolovitzky, G., Califano, A.: Systematic and fully automated identification of protein sequence patterns. Journal of Computational Biology 7(3-4), 585–600 (2000)



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Patwardhan, R., Tang, H., Kim, S., Dalkilic, M. (2006). An Approximate de Bruijn Graph Approach to Multiple Local Alignment and Motif Discovery in Protein Sequences. In: Dalkilic, M.M., Kim, S., Yang, J. (eds) Data Mining and Bioinformatics. VDMB 2006. Lecture Notes in Computer Science, vol 4316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11960669_14


  • DOI: https://doi.org/10.1007/11960669_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68970-6

  • Online ISBN: 978-3-540-68971-3

  • eBook Packages: Computer Science, Computer Science (R0)
