Abstract.
Patterns consisting of strings with a bounded number of mismatches are central to coding theory and find multiple applications in text processing and computational biology. In this latter field, the presence of over-represented patterns of this kind has been linked, for instance, to modeling regulatory regions in biosequences. The study and computation of expected number of occurrences and related scores for these patterns is made difficult by the sheer explosion of the roster of candidates that need to be evaluated. In recent work, properties of pattern saturation and score monotonicity have proved capable to mitigate this problem. In such a context, expectation and score monotonicity has been established within the i.i.d. model for all cases of interest except that of a fixed word length with a varying number of mismatches. The present paper completes this investigation by showing that the expected number of occurrences in a textstring for such a word is bi-tonic, that is, behaves as a unimodal function of the number of errors. This extends to this case the time and space savings brought about by discovery algorithms based on pattern saturation.
Similar content being viewed by others
Author information
Authors and Affiliations
Corresponding author
Additional information
Work Supported in part by the Italian Ministry of University and Research under the Bi-National Project FIRB RBIN04BYZ7, and by the Research Program of Georgia Tech. An extended abstract related to this work was presented at the Dagstuhl Seminar Dagstuhl on “Combinatorial and Algorithmic Foundations of Pattern and Association Discovery”, May 14-19, 2006 [3].
Rights and permissions
About this article
Cite this article
Apostolico, A., Pizzi, C. Scoring Unusual Words with Varying Mismatch Errors. Math.comput.sci. 1, 639–653 (2008). https://doi.org/10.1007/s11786-007-0032-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11786-007-0032-4