Abstract
Given two words, a text T of length n and an episode P of length m, the episode matching problem is to find all minimal-length substrings of T that contain P as a subsequence. The corresponding optimization problem is to find the smallest number w such that T has a subword of length w that contains P as a subsequence.
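To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is only a reference for the definition, not one of the algorithms from the paper. It assumes that "minimal length" means the globally shortest windows, and the function name `shortest_windows_naive` is hypothetical.

```python
def shortest_windows_naive(text, episode):
    """Return (w, starts): the smallest window length w and every start index i
    such that text[i:i + w] contains episode as a subsequence.
    Returns (None, []) if no window exists. Assumes a non-empty episode."""
    best = None
    starts = []
    for i in range(len(text)):
        # A shortest window must begin with the first episode symbol.
        if text[i] != episode[0]:
            continue
        k = 0  # number of episode symbols matched so far
        for j in range(i, len(text)):
            if text[j] == episode[k]:
                k += 1
                if k == len(episode):
                    # Greedy matching yields the shortest window starting at i.
                    length = j - i + 1
                    if best is None or length < best:
                        best, starts = length, [i]
                    elif length == best:
                        starts.append(i)
                    break
    return best, starts


print(shortest_windows_naive("abxcabc", "abc"))
```

For the text "abxcabc" and the episode "abc", the window starting at index 0 has length 4 (it must skip the "x"), while the window starting at index 4 has length 3, so w = 3.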
In this paper, we introduce several efficient off-line and on-line algorithms for the entire problem, where by on-line algorithms we mean algorithms that scan the text symbols from left to right, reading each symbol only once. We present two alphabet-independent algorithms that work in time O(nm). The off-line algorithm operates in O(1) additional space, while the on-line algorithm pays for its on-line property with O(m) additional space. Two other on-line algorithms have subquadratic time complexity. One of them works in time O(nm/log m) with O(m) additional space. The other gives a time/space trade-off, i.e., it works in time O(n + s + nm log log s / log(s/m)) when additional space is limited to O(s). Finally, we present two approximation algorithms for the optimization problem. The off-line algorithm is alphabet independent, has superlinear time complexity O(n/ε + n log log(n/m)), and uses only constant space. The on-line algorithm works in time O(n/ε + n) and uses space O(m). Both approximation algorithms achieve a 1+ε approximation ratio, for any ε > 0.
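The O(nm) time and O(m) space bounds quoted for the on-line setting can be illustrated with a standard dynamic-programming sketch in Python. It is not necessarily the algorithm from the paper. It reads each text symbol once, uses only equality comparisons (so it is alphabet independent), and keeps one array of m start positions. The function name `online_episode_match` is hypothetical, and the list of reported windows counts as output rather than working space.

```python
def online_episode_match(text_stream, episode):
    """Scan text_stream once, left to right. Return (w, windows), where w is the
    smallest window length and windows lists (start, end) index pairs, inclusive,
    of all windows of length w. Assumes a non-empty episode."""
    m = len(episode)
    # start[k]: latest start s such that episode[:k + 1] is a subsequence of the
    # text from position s up to the current position; -1 if there is none.
    start = [-1] * m
    best = None
    windows = []
    for j, c in enumerate(text_stream):
        # Descending k, so start[k - 1] still describes the text before position j.
        for k in range(m - 1, 0, -1):
            if c == episode[k]:
                start[k] = start[k - 1]
        if c == episode[0]:
            start[0] = j
        # A shortest window must end with the last episode symbol.
        if c == episode[-1] and start[m - 1] >= 0:
            length = j - start[m - 1] + 1
            if best is None or length < best:
                best, windows = length, [(start[m - 1], j)]
            elif length == best:
                windows.append((start[m - 1], j))
    return best, windows


print(online_episode_match(iter("abxcabc"), "abc"))
```

Each text symbol triggers at most m comparisons, which gives the O(nm) time bound, and the only state carried between symbols is the array `start` of length m.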
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Das, G., Fleischer, R., Gasieniec, L., Gunopulos, D., Kärkkäinen, J. (1997). Episode matching. In: Apostolico, A., Hein, J. (eds) Combinatorial Pattern Matching. CPM 1997. Lecture Notes in Computer Science, vol 1264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63220-4_46
DOI: https://doi.org/10.1007/3-540-63220-4_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63220-7
Online ISBN: 978-3-540-69214-0
eBook Packages: Springer Book Archive