Fast Search Algorithms for Position Specific Scoring Matrices

Pizzi, Cinzia; Rastas, Pasi; Ukkonen, Esko

doi:10.1007/978-3-540-71233-6_19

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4414)

Abstract

We develop fast search algorithms for finding good instances of patterns given as position specific scoring matrices, that is, sequence segments whose score reaches a given threshold, and we report empirical results on their performance on DNA sequences. The algorithms generalize the Aho–Corasick, filtration, and superalphabet techniques of string matching to scoring matrix search. Compared with naive search, our algorithms can be faster by a factor proportional to the length of the pattern. In our experimental comparison, the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm. The Aho–Corasick technique is the fastest for short patterns and high significance thresholds of the search. For longer patterns the filtration method is better, while the superalphabet technique is best for very long patterns and low significance levels. We also observed that the actual speed of all these algorithms is very sensitive to implementation details.
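A minimal Python sketch of the two baselines and of one of the techniques named above, written under our own assumptions rather than taken from the authors' code: a PSSM is modelled as a list of m columns, each mapping one of the bases A, C, G, T to an integer score, and a hit is any window whose summed score is at least a threshold k. `naive_search` scores every window in O(nm) time; `lookahead_search` stops scoring a window as soon as even the best possible completion cannot reach k; `superalphabet_search` is our reading of the superalphabet idea, tabulating for each block of q columns the score of every q-gram so that a window costs about m/q lookups instead of m additions. All function names, the parameter q, and the example matrix are hypothetical, and the filtration and Aho–Corasick variants are not shown.

```python
from itertools import product

ALPHABET = "ACGT"

# Assumption: a PSSM is a list of m columns, each a dict mapping a base in
# ALPHABET to an integer score. Sequences contain only uppercase A, C, G, T.


def naive_search(M, s, k):
    """Return every start i whose window s[i:i+m] scores at least k. O(n*m)."""
    m = len(M)
    return [
        i
        for i in range(len(s) - m + 1)
        if sum(col[s[i + j]] for j, col in enumerate(M)) >= k
    ]


def lookahead_search(M, s, k):
    """Naive scan that stops scoring a window as soon as the partial score plus
    the best achievable score of the remaining columns falls below k."""
    m = len(M)
    # best_suffix[j] = maximum possible score of columns j..m-1
    best_suffix = [0] * (m + 1)
    for j in range(m - 1, -1, -1):
        best_suffix[j] = best_suffix[j + 1] + max(M[j].values())
    hits = []
    for i in range(len(s) - m + 1):
        score = 0
        for j in range(m):
            score += M[j][s[i + j]]
            if score + best_suffix[j + 1] < k:
                break  # k is unreachable for this window
        else:
            hits.append(i)  # never pruned, so the full score is >= k
    return hits


def superalphabet_tables(M, q):
    """Split M into blocks of q consecutive columns (the last may be shorter)
    and tabulate the block score of every possible q-gram."""
    tables = []
    for start in range(0, len(M), q):
        block = M[start:start + q]
        table = {
            "".join(gram): sum(col[c] for col, c in zip(block, gram))
            for gram in product(ALPHABET, repeat=len(block))
        }
        tables.append((start, len(block), table))
    return tables


def superalphabet_search(M, s, k, q=2):
    """Score each window with one table lookup per block instead of one
    addition per column."""
    m = len(M)
    tables = superalphabet_tables(M, q)
    return [
        i
        for i in range(len(s) - m + 1)
        if sum(t[s[i + start:i + start + w]] for start, w, t in tables) >= k
    ]


if __name__ == "__main__":
    # Hypothetical 3-column matrix that rewards the word "ACG".
    M = [
        {"A": 2, "C": -1, "G": -1, "T": -1},
        {"A": -1, "C": 2, "G": -1, "T": -1},
        {"A": -1, "C": -1, "G": 2, "T": -1},
    ]
    s = "ACGTACGAACG"
    print(naive_search(M, s, 6))          # [0, 4, 8]
    print(lookahead_search(M, s, 6))      # [0, 4, 8]
    print(superalphabet_search(M, s, 6))  # [0, 4, 8]
```

All three functions return the same positions; they differ only in how much work each window costs. In Python, the string slicing hides any gain from the superalphabet tables; a practical implementation would encode q-grams as integers and update them incrementally, so the measured speed of such code depends heavily on these details.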

Supported by the Academy of Finland under grant 211496 (From Data to Knowledge) and by the EU project Regulatory Genomics.

Author information

Authors and Affiliations

Department of Computer Science and Helsinki Institute for Information Technology HIIT, P.O. Box 68, FIN-00014 University of Helsinki, Finland
Cinzia Pizzi, Pasi Rastas & Esko Ukkonen

Editor information

Sepp Hochreiter, Roland Wagner

About this paper

Cite this paper

Pizzi, C., Rastas, P., Ukkonen, E. (2007). Fast Search Algorithms for Position Specific Scoring Matrices. In: Hochreiter, S., Wagner, R. (eds) Bioinformatics Research and Development. BIRD 2007. Lecture Notes in Computer Science, vol 4414. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71233-6_19

DOI: https://doi.org/10.1007/978-3-540-71233-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71232-9
Online ISBN: 978-3-540-71233-6
eBook Packages: Computer Science, Computer Science (R0)