Fast Search Algorithms for Position Specific Scoring Matrices

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4414)

Abstract

We develop fast search algorithms for finding good instances of patterns given as position specific scoring matrices, and report empirical results on their performance on DNA sequences. The algorithms generalize the Aho–Corasick, filtration, and superalphabet techniques of string matching to scoring matrix search. Compared to the naive search, our algorithms can be faster by a factor proportional to the length of the pattern. In our experimental comparison, the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm. The Aho–Corasick technique is the fastest for short patterns and high significance thresholds. For longer patterns the filtration method is better, while the superalphabet technique is best for very long patterns and low significance levels. We also observed that the actual speed of all these algorithms is very sensitive to implementation details.
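For orientation, the two baselines named in the abstract (the naive search and lookahead scoring) can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function names, the list-of-dicts matrix layout, and the toy matrix are assumptions made here, and the text is assumed to contain only symbols that appear in the matrix.

```python
from typing import Dict, List, Tuple

# Assumed layout: one dict per pattern position, mapping symbol -> score.
Matrix = List[Dict[str, float]]


def naive_search(matrix: Matrix, text: str, threshold: float) -> List[Tuple[int, float]]:
    """Score every window of length m in full; O(m * n) time."""
    m = len(matrix)
    hits = []
    for i in range(len(text) - m + 1):
        score = sum(matrix[j][text[i + j]] for j in range(m))
        if score >= threshold:
            hits.append((i, score))
    return hits


def lookahead_search(matrix: Matrix, text: str, threshold: float) -> List[Tuple[int, float]]:
    """Same result as naive_search, but abandon a window as soon as the
    best achievable remaining score can no longer reach the threshold."""
    m = len(matrix)
    # max_rest[j] = maximum score obtainable from positions j..m-1.
    max_rest = [0.0] * (m + 1)
    for j in range(m - 1, -1, -1):
        max_rest[j] = max_rest[j + 1] + max(matrix[j].values())
    hits = []
    for i in range(len(text) - m + 1):
        score = 0.0
        for j in range(m):
            score += matrix[j][text[i + j]]
            if score + max_rest[j + 1] < threshold:
                break  # this window cannot reach the threshold
        else:
            # Loop finished without pruning, so score >= threshold.
            hits.append((i, score))
    return hits


# Toy 3-column matrix over the DNA alphabet (hypothetical values).
toy_matrix: Matrix = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": 0},
]
toy_text = "TTACGACTA"

naive_hits = naive_search(toy_matrix, toy_text, threshold=4)
lookahead_hits = lookahead_search(toy_matrix, toy_text, threshold=4)
```

On this toy input both functions report the windows starting at positions 2 (`ACG`, score 6) and 5 (`ACT`, score 4). The lookahead variant returns the same hits but stops scoring most other windows after the first column, which is where its advantage over the naive scan comes from at high thresholds.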

Supported by the Academy of Finland under grant 211496 (From Data to Knowledge) and by EU project Regulatory Genomics.

Editor information

Sepp Hochreiter, Roland Wagner

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Pizzi, C., Rastas, P., Ukkonen, E. (2007). Fast Search Algorithms for Position Specific Scoring Matrices. In: Hochreiter, S., Wagner, R. (eds) Bioinformatics Research and Development. BIRD 2007. Lecture Notes in Computer Science, vol 4414. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71233-6_19

  • DOI: https://doi.org/10.1007/978-3-540-71233-6_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71232-9

  • Online ISBN: 978-3-540-71233-6

  • eBook Packages: Computer Science, Computer Science (R0)
