Approximate regular expression pattern matching with concave gap penalties

Algorithmica

Abstract

Given a sequence A of length M and a regular expression R of length P, an approximate regular expression pattern-matching algorithm computes the score of the optimal alignment between A and one of the sequences B exactly matched by R. An alignment between sequences A = a_1 a_2 ... a_M and B = b_1 b_2 ... b_N is a list of ordered pairs, 〈(i_1, j_1), (i_2, j_2), ..., (i_t, j_t)〉, such that i_k < i_{k+1} and j_k < j_{k+1}. In this case the alignment aligns symbols a_{i_k} and b_{j_k}, and leaves blocks of unaligned symbols, or gaps, between them. A scoring scheme S associates a cost with each aligned symbol pair and each gap. The alignment's score is the sum of the associated costs, and an optimal alignment is one of minimal score. There are a variety of schemes for scoring alignments. In a concave gap penalty scoring scheme S = {δ, w}, a function δ(a, b) gives the score of each aligned pair of symbols a and b, and a concave function w(k) gives the score of a gap of length k. A function w is concave if and only if, for all k > 1, w(k + 1) − w(k) ≤ w(k) − w(k − 1). In this paper we present an O(MP(log M + log^2 P)) algorithm for approximate regular expression matching for an arbitrary δ and any concave w.
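The definitions in the abstract can be made concrete with a small sketch. It only scores one fixed alignment between two given sequences and checks the concavity condition on a finite range; it is not the paper's O(MP(log M + log^2 P)) algorithm and involves no regular expression. The function names, the example δ and w, and the choice to charge leading and trailing unaligned blocks as gaps are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence, Tuple


def is_concave(w: Callable[[int], float], max_k: int) -> bool:
    """Check w(k+1) - w(k) <= w(k) - w(k-1) for every k with 1 < k < max_k."""
    return all(w(k + 1) - w(k) <= w(k) - w(k - 1) for k in range(2, max_k))


def gap_cost(length: int, w: Callable[[int], float]) -> float:
    """Cost of one gap; an empty block is not a gap and costs nothing."""
    return w(length) if length > 0 else 0.0


def alignment_score(
    a: Sequence[str],
    b: Sequence[str],
    pairs: Sequence[Tuple[int, int]],
    delta: Callable[[str, str], float],
    w: Callable[[int], float],
) -> float:
    """Sum delta over aligned pairs and w over each maximal unaligned block.

    Positions in `pairs` are 1-based, as in the abstract. Assumption: blocks
    before the first pair and after the last pair also count as gaps.
    """
    score = 0.0
    prev_i = prev_j = 0  # position 0 means "before the first symbol"
    for i, j in pairs:
        if not (prev_i < i <= len(a) and prev_j < j <= len(b)):
            raise ValueError("pairs must be strictly increasing and in range")
        score += gap_cost(i - prev_i - 1, w) + gap_cost(j - prev_j - 1, w)
        score += delta(a[i - 1], b[j - 1])
        prev_i, prev_j = i, j
    # Trailing unaligned symbols in either sequence.
    score += gap_cost(len(a) - prev_i, w) + gap_cost(len(b) - prev_j, w)
    return score


# Hypothetical scoring scheme S = {delta, w}, chosen only for this example.
def example_delta(x: str, y: str) -> float:
    return 0.0 if x == y else 1.0


def example_w(k: int) -> float:
    return 1.0 + math.log(k)  # logarithmic gap cost, concave in k
```

For instance, aligning "ACGT" with "AGT" through the pairs (1, 1), (3, 2), (4, 3) leaves a single gap of length 1 in the first sequence, so the score under this example scheme is `example_w(1)`. A quadratic gap cost such as k² fails `is_concave`, since its successive differences grow rather than shrink.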

Additional information

Communicated by C. K. Wong.

This work was supported in part by the National Institutes of Health under Grant RO1 LM04960.

About this article

Cite this article

Knight, J.R., Myers, E.W. Approximate regular expression pattern matching with concave gap penalties. Algorithmica 14, 85–121 (1995). https://doi.org/10.1007/BF01300375

  • DOI: https://doi.org/10.1007/BF01300375
