Scoring Unusual Words with Varying Mismatch Errors

Apostolico, Alberto; Pizzi, Cinzia

doi:10.1007/s11786-007-0032-4

Published: 22 May 2008

Mathematics in Computer Science, Volume 1, pages 639–653 (2008)

Alberto Apostolico^1,2 &
Cinzia Pizzi^1,3

Abstract.

Patterns consisting of strings with a bounded number of mismatches are central to coding theory and find multiple applications in text processing and computational biology. In the latter field, the presence of over-represented patterns of this kind has been linked, for instance, to the modeling of regulatory regions in biosequences. The study and computation of the expected number of occurrences and related scores for these patterns are made difficult by the sheer explosion in the number of candidates that need to be evaluated. In recent work, properties of pattern saturation and score monotonicity have proved capable of mitigating this problem. In that context, monotonicity of expectations and scores has been established within the i.i.d. model for all cases of interest except that of a fixed word length with a varying number of mismatches. The present paper completes this investigation by showing that, for such a word, the expected number of occurrences in a text string is bi-tonic, that is, it behaves as a unimodal function of the number of errors. This extends to this case the time and space savings brought about by discovery algorithms based on pattern saturation.
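
The quantity at the heart of the abstract can be made concrete with a small computation. The following is a minimal sketch under our own simplifying assumptions, which may not match the paper's exact definitions: the text of length n is generated i.i.d. with symbol probabilities p(a), and an occurrence of a word w of length m "with k errors" is a length-m window at Hamming distance exactly k from w. Position i of a window then mismatches independently with probability 1 - p(w_i), so the mismatch count follows a Poisson-binomial distribution, and by linearity of expectation the expected count is (n - m + 1) times its probability mass at k. The function names mismatch_pmf, expected_occurrences and is_unimodal are hypothetical, introduced only for illustration.

```python
from typing import Dict, List


def mismatch_pmf(word: str, probs: Dict[str, float]) -> List[float]:
    """Return pmf[k] = P(an i.i.d. window of length len(word) differs from
    `word` in exactly k positions), for k = 0..len(word).

    Assumption: symbols are i.i.d. with probabilities `probs`, so position i
    mismatches independently with probability 1 - probs[word[i]]. The mismatch
    count is Poisson-binomial; its pmf is built one position at a time (O(m^2)).
    """
    pmf = [1.0]  # empty prefix: zero mismatches with probability 1
    for ch in word:
        p_match = probs[ch]
        p_miss = 1.0 - p_match
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * p_match      # this position agrees with the word
            nxt[k + 1] += mass * p_miss   # this position is a mismatch
        pmf = nxt
    return pmf


def expected_occurrences(word: str, probs: Dict[str, float], n: int) -> List[float]:
    """Expected number of windows of a length-n i.i.d. text at Hamming distance
    exactly k from `word`, for k = 0..len(word). Overlaps between windows do
    not affect the expectation, by linearity."""
    windows = max(n - len(word) + 1, 0)
    return [windows * mass for mass in mismatch_pmf(word, probs)]


def is_unimodal(values: List[float], tol: float = 1e-12) -> bool:
    """True if `values` weakly rises and then weakly falls (up to `tol`)."""
    i = 0
    while i + 1 < len(values) and values[i + 1] >= values[i] - tol:
        i += 1
    while i + 1 < len(values) and values[i + 1] <= values[i] + tol:
        i += 1
    return i == len(values) - 1


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    exp_counts = expected_occurrences("ACGTACGT", uniform, n=10_000)
    for k, e in enumerate(exp_counts):
        print(f"k={k}: {e:.4f}")
    print("unimodal:", is_unimodal(exp_counts))
```

In this simplified setting the unimodal shape is to be expected, since a sum of independent Bernoulli variables has a log-concave, hence unimodal, distribution; counting windows with at most k errors would instead give a nondecreasing function of k, which is why the sketch uses exactly k. This is offered only as intuition for the abstract's claim, not as the paper's proof.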

Author information

Authors and Affiliations

Dipartimento di Ingegneria dell’Informazione, Università di Padova, via Gradenigo 6/A, 35131, Padova, Italy
Alberto Apostolico & Cinzia Pizzi
College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA, 30318, USA
Alberto Apostolico
Projet Helix, INRIA Rhône-Alpes and Laboratoire de Biométrie et Biologie Evolutive (UMR 5558), CNRS, Univ. Lyon 1, Lyon, France
Cinzia Pizzi

Corresponding author

Correspondence to Alberto Apostolico.

Additional information

Work supported in part by the Italian Ministry of University and Research under the Bi-National Project FIRB RBIN04BYZ7, and by the Research Program of Georgia Tech. An extended abstract related to this work was presented at the Dagstuhl Seminar on “Combinatorial and Algorithmic Foundations of Pattern and Association Discovery”, May 14–19, 2006 [3].

About this article

Cite this article

Apostolico, A., Pizzi, C. Scoring Unusual Words with Varying Mismatch Errors. Math.comput.sci. 1, 639–653 (2008). https://doi.org/10.1007/s11786-007-0032-4

Received: 30 September 2007
Accepted: 25 October 2007
Published: 22 May 2008
Issue Date: June 2008
DOI: https://doi.org/10.1007/s11786-007-0032-4
