Expectation of Strings with Mismatches under Markov Chain Distribution

Pizzi, Cinzia; Bianco, Mauro

doi:10.1007/978-3-642-03784-9_22

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5721)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

We study a problem related to the extraction of over-represented words from a given source text x of length n. The words are allowed to occur with k mismatches, and x is produced by a source over an alphabet Σ according to a Markov chain of order p. We propose an online algorithm to compute the expected number of occurrences of a word y of length m in O(mk|Σ|^{p+1}) time. We also propose an offline algorithm to compute the probability of any word that occurs in the text in O(k|Σ|^2) time after O(nk|Σ|^{p+1}) pre-processing. This algorithm allows us to compute the expectation for all the words in a text of length n in O(kn^2|Σ|^2 + nk|Σ|^{p+1}) time, rather than the O(n^3|Σ|^{p+1}) obtainable with other methods. Although this study was motivated by the motif discovery problem in bioinformatics, the results apply to any other domain involving combinatorics on words.
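
To make the quantity in the abstract concrete, the following is a minimal sketch of a straightforward dynamic program. It is not the authors' algorithm, which is not shown on this page. It assumes that "k mismatches" means Hamming distance at most k, and that the supplied initial distribution over length-p contexts is stationary, so that every window of the text has the same distribution and linearity of expectation applies. The function names and the `init`/`trans` dictionary layout are hypothetical, chosen only for illustration.

```python
def prob_within_k_mismatches(y, k, alphabet, init, trans, p):
    """Probability that a window of len(y) symbols emitted by an order-p
    Markov chain lies within Hamming distance k of y.

    init  -- dict: length-p string -> probability of it being the first p symbols
             (assumed here to be the stationary distribution)
    trans -- dict: (context, symbol) -> P(symbol | previous p symbols)
    """
    m = len(y)
    assert m >= p >= 1
    # state[(context, j)] = probability of a prefix that ends in `context`
    # and has accumulated exactly j mismatches against y so far.
    state = {}
    for ctx, pr in init.items():
        j = sum(1 for a, b in zip(ctx, y[:p]) if a != b)
        if j <= k:
            state[(ctx, j)] = state.get((ctx, j), 0.0) + pr
    for i in range(p, m):
        nxt = {}
        for (ctx, j), pr in state.items():
            for s in alphabet:
                j2 = j + (s != y[i])      # one more mismatch if s differs
                if j2 > k:
                    continue              # too many mismatches: drop this path
                key = (ctx[1:] + s, j2)   # slide the length-p context window
                nxt[key] = nxt.get(key, 0.0) + pr * trans[(ctx, s)]
        state = nxt
    return sum(state.values())


def expected_occurrences(y, k, n, alphabet, init, trans, p):
    """Expected count of windows of a length-n text within k mismatches of y,
    by linearity of expectation over the n - len(y) + 1 window positions."""
    windows = max(n - len(y) + 1, 0)
    return windows * prob_within_k_mismatches(y, k, alphabet, init, trans, p)


# Toy example: a uniform source over {A, C} written as an order-1 chain.
alphabet = "AC"
init = {"A": 0.5, "C": 0.5}
trans = {(c, s): 0.5 for c in alphabet for s in alphabet}
print(prob_within_k_mismatches("AAA", 1, alphabet, init, trans, 1))   # 0.5
print(expected_occurrences("AAA", 1, 10, alphabet, init, trans, 1))   # 4.0
```

In the toy example, 4 of the 8 strings of length 3 over {A, C} are within one mismatch of AAA, which gives the probability 0.5 and an expectation of 8 × 0.5 = 4 over the 8 windows of a length-10 text. The sketch keeps at most (k+1)|Σ|^p states per position and tries |Σ| symbols from each, so its cost is of the same order as the O(mk|Σ|^{p+1}) bound quoted above, though the paper's construction may differ.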

Author information

Authors and Affiliations

Dipartimento di Ingegneria dell'Informazione, Università di Padova, Italy
Cinzia Pizzi
Department of Computer Science, Texas A&M University, USA
Mauro Bianco

Editor information

Editors and Affiliations

Swedish Institute of Computer Science, Kista, Sweden
Jussi Karlgren
Department of Computer Science and Engineering, Helsinki University of Technology, P.O. Box 5400, 02015 HUT, Espoo, Finland
Jorma Tarhio
Department of Computer Sciences, University of Tampere, Tampere, Finland
Heikki Hyyrö

Cite this paper

Pizzi, C., Bianco, M. (2009). Expectation of Strings with Mismatches under Markov Chain Distribution. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03784-9_22

DOI: https://doi.org/10.1007/978-3-642-03784-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03783-2
Online ISBN: 978-3-642-03784-9
eBook Packages: Computer Science, Computer Science (R0)