Abstract
In this paper we present an efficient subquadratic-time algorithm for matching strings and limited expressions in large texts. Limited expressions are a subset of regular expressions that appear often in practice. The generalization from simple strings to limited expressions has a negligible effect on the speed of our algorithm, yet allows much more flexibility. Our algorithm is similar in spirit to that of Masek and Paterson [MP], but it is much faster in practice. Our experiments show a factor of four to five speedup over the algorithms of Sellers [Se] and Ukkonen [Uk1], independent of the sizes of the input strings. Experiments also reveal our algorithm to be faster, in most cases, than a recent improvement by Chang and Lampe [CL2], especially for small alphabet sizes, for which it is two to three times faster.
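For orientation, the quadratic-time baseline that the abstract compares against can be sketched as a column-by-column dynamic program in the style usually attributed to Sellers [Se]. This is not the subquadratic algorithm of the paper, only a minimal illustration of the problem it speeds up. The function name `approx_search`, the unit edit costs, and the 1-based end positions are assumptions made for this sketch.

```python
def approx_search(pattern: str, text: str, k: int) -> list[int]:
    """Return 1-based end positions in `text` where `pattern` matches
    some substring with at most `k` edits (insert, delete, substitute).

    Classical O(len(pattern) * len(text)) dynamic program; unit costs assumed.
    """
    m = len(pattern)
    # col[i] = least edit distance between pattern[:i] and any substring
    # of the text ending at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # entry for row i-1 in the previous column
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # entry for row i in the previous column
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits
```

Calling `approx_search("abc", "xxabcxx", 1)` reports every text position at which an occurrence of `abc` with at most one error ends. Each text character costs work proportional to the pattern length, which is the quadratic behaviour a subquadratic method aims to beat.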
References
K. Abrahamson, Generalized string matching, SIAM J. Comput., 16 (1987), 1039–1051.
A. V. Aho and M. J. Corasick, Efficient string matching: an aid to bibliographic search, Comm. ACM, 18 (1975), 333–340.
V. L. Arlazarov, E. A. Dinic, M. A. Kronrod, and I. A. Faradzev, On economic construction of the transitive closure of a directed graph, Dokl. Akad. Nauk SSSR, 194 (1970), 487–488 (in Russian). English translation in Soviet Math. Dokl., 11 (1975), 1209–1210.
R. A. Baeza-Yates and G. H. Gonnet, A new approach to text searching, Comm. ACM, 35 (1992), 74–82.
R. S. Boyer and J. S. Moore, A fast string searching algorithm, Comm. ACM, 20 (1977), 762–772.
W. I. Chang and E. L. Lawler, Approximate string matching in sublinear expected time, Proc. 31st Symp. on Foundations of Computer Science, 1990, pp. 116–124.
W. I. Chang and J. Lampe, Theoretical and empirical comparisons of approximate string matching algorithms, Proc. 3rd Symp. on Combinatorial Pattern Matching, Tucson, AZ, April 1992, pp. 172–181.
B. Commentz-Walter, A string matching algorithm fast on the average, Proc. 6th Internat. Colloq. on Automata, Languages, and Programming, 1979, pp. 118–132.
M. Fischer and M. Paterson, String matching and other products, Proc. 7th SIAM-AMS Symp. on Complexity of Computation, 1974, pp. 113–125.
Z. Galil and K. Park, An improved algorithm for approximate string matching, SIAM J. Comput., 19 (1990), 989–999.
D. E. Knuth, J. H. Morris, and V. R. Pratt, Fast pattern matching in strings, SIAM J. Comput., 6 (1977), 323–350.
G. M. Landau and U. Vishkin, Fast string matching with k differences, J. Comput. System Sci., 37 (1988), 63–78.
W. J. Masek and M. S. Paterson, A faster algorithm for computing string edit distances, J. Comput. System Sci., 20 (1980), 18–31.
E. W. Myers, Incremental Alignment Algorithms and their Applications, Technical Report 86-22, Department of Computer Science, University of Arizona, 1986.
E. W. Myers, A Sublinear Algorithm for Approximate Keywords Searching, Technical Report TR-90-25, Department of Computer Science, University of Arizona, 1990. Also in Algorithmica, 12 (1994), 345–374.
E. W. Myers, A four-Russians algorithm for regular expression pattern matching, J. Assoc. Comput. Mach., 39 (1992), 430–448.
R. Pinter, Efficient string matching with don't-care patterns, in Combinatorial Algorithms on Words (A. Apostolico and Z. Galil, eds.), NATO ASI Series, Vol. F12, Springer-Verlag, New York, 1985, pp. 11–29.
P. H. Sellers, The theory and computations of evolutionary distances: pattern recognition, J. Algorithms, 1 (1980), 359–373.
J. Tarhio and E. Ukkonen, Approximate Boyer-Moore string matching, SIAM J. Comput., 22(2) (1993), 243–260.
E. Ukkonen, Finding approximate patterns in strings, J. Algorithms, 6 (1985), 132–137.
E. Ukkonen, Approximate string-matching with q-grams and maximal matches, Theoret. Comput. Sci., 92 (1992), 191–211.
E. Ukkonen and D. Wood, Approximate string matching with suffix automata, Algorithmica, 10 (1993), 353–364.
R. A. Wagner and M. J. Fischer, The string to string correction problem, J. Assoc. Comput. Mach., 21 (1974), 168–173.
S. Wu and U. Manber, Agrep - a fast approximate pattern-matching tool, Proc. Usenix Winter 1992 Technical Conference, San Francisco, January 1992, pp. 153–162.
S. Wu and U. Manber, Fast text searching allowing errors, Comm. ACM, 35 (1992), 83–91.
S. Wu, U. Manber, and E. W. Myers, A sub-quadratic algorithm for approximate regular expression matching, J. Algorithms, to appear.
Additional information
Communicated by C. K. Wong.
The research of U. Manber was supported in part by a Presidential Young Investigator Award DCR-8451397, with matching funds from AT&T, and by NSF Grant CCR-9001619. G. Myers' research was supported in part by NIH Grant LM04960, NSF Grant CCR-9001619, and the Aspen Center for Physics.
About this article
Cite this article
Wu, S., Manber, U. & Myers, G. A subquadratic algorithm for approximate limited expression matching. Algorithmica 15, 50–67 (1996). https://doi.org/10.1007/BF01942606
DOI: https://doi.org/10.1007/BF01942606