Abstract
Given a sequence A of length M and a regular expression R of length P, an approximate regular expression pattern-matching algorithm computes the score of the optimal alignment between A and one of the sequences B exactly matched by R. An alignment between sequences A = a_1 a_2 ... a_M and B = b_1 b_2 ... b_N is a list of ordered pairs, 〈(i_1, j_1), (i_2, j_2), ..., (i_t, j_t)〉, such that i_k < i_{k+1} and j_k < j_{k+1}. In this case the alignment aligns symbols a_{i_k} and b_{j_k}, and leaves blocks of unaligned symbols, or gaps, between them. A scoring scheme S associates costs for each aligned symbol pair and each gap. The alignment's score is the sum of the associated costs, and an optimal alignment is one of minimal score. There are a variety of schemes for scoring alignments. In a concave gap penalty scoring scheme S = {δ, w}, a function δ(a, b) gives the score of each aligned pair of symbols a and b, and a concave function w(k) gives the score of a gap of length k. A function w is concave if and only if it has the property that, for all k > 1, w(k + 1) − w(k) ≤ w(k) − w(k − 1). In this paper we present an O(MP(log M + log^2 P)) algorithm for approximate regular expression matching for an arbitrary δ and any concave w.
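The definitions above can be made concrete with a minimal Python sketch. It is not the paper's algorithm: it only checks the concavity condition on a finite range and scores one given alignment under a scheme S = {δ, w}. The names `is_concave`, `alignment_score`, `unit_delta`, and `log_gap` are hypothetical, as are the example δ and w. The sketch also assumes that a run of unaligned symbols in A and a run in B between two consecutive aligned pairs are charged as separate gaps, and that leading and trailing unaligned runs are charged as gaps as well; the abstract does not spell out either convention.

```python
import math
from typing import Callable, Sequence, Tuple


def is_concave(w: Callable[[int], float], max_k: int) -> bool:
    """Check w(k+1) - w(k) <= w(k) - w(k-1) for every k with 1 < k < max_k.

    A small tolerance absorbs floating-point rounding.
    """
    return all(
        w(k + 1) - w(k) <= w(k) - w(k - 1) + 1e-12
        for k in range(2, max_k)
    )


def alignment_score(
    a: str,
    b: str,
    pairs: Sequence[Tuple[int, int]],
    delta: Callable[[str, str], float],
    w: Callable[[int], float],
) -> float:
    """Score an alignment given as 1-based pairs (i_k, j_k).

    Assumption: each maximal run of unaligned symbols in A or in B
    (including leading and trailing runs) is one gap costing w(length).
    """
    score = 0.0
    prev_i, prev_j = 0, 0
    for i, j in pairs:
        # Enforce i_k < i_{k+1} and j_k < j_{k+1}, and stay within bounds.
        if not (prev_i < i <= len(a) and prev_j < j <= len(b)):
            raise ValueError("pairs must be strictly increasing and in range")
        gap_a, gap_b = i - prev_i - 1, j - prev_j - 1
        if gap_a > 0:
            score += w(gap_a)
        if gap_b > 0:
            score += w(gap_b)
        score += delta(a[i - 1], b[j - 1])
        prev_i, prev_j = i, j
    # Trailing unaligned symbols after the last aligned pair.
    if len(a) - prev_i > 0:
        score += w(len(a) - prev_i)
    if len(b) - prev_j > 0:
        score += w(len(b) - prev_j)
    return score


def unit_delta(x: str, y: str) -> float:
    """Hypothetical pair score: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def log_gap(k: int) -> float:
    """Hypothetical concave gap penalty: w(k) = 2 + ln(k)."""
    return 2.0 + math.log(k)
```

For instance, aligning A = "ACGT" with B = "AGT" through the pairs (1, 1), (3, 2), (4, 3) leaves the single symbol C of A unaligned, so under `unit_delta` and `log_gap` the score is w(1) = 2. A quadratic penalty such as w(k) = k^2 fails the check, because its successive differences grow rather than shrink.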
Additional information
Communicated by C. K. Wong.
This work was supported in part by the National Institutes of Health under Grant RO1 LM04960.
Cite this article
Knight, J.R., Myers, E.W. Approximate regular expression pattern matching with concave gap penalties. Algorithmica 14, 85–121 (1995). https://doi.org/10.1007/BF01300375
DOI: https://doi.org/10.1007/BF01300375