Abstract
Two filtration schemes are presented for approximate string matching with k differences. In our approach, q-samples (non-overlapping q-grams) are drawn from the text, and a text area is checked with dynamic programming if a short sequence of consecutive q-samples contains enough exact or slightly distorted q-grams of the pattern in the right order. The filtration schemes are applied to searching both in the text itself and in a q-gram index of the text. The results of preliminary experiments support the applicability of the new methods.
The work was supported by the Academy of Finland.
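The filtration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it counts only exact q-gram hits, ignores the "right order" condition and the slightly distorted q-grams, and makes no claim to be a lossless filter. The function names and the parameters h (sampling step), window (number of consecutive q-samples inspected), and threshold (hits required) are hypothetical, as is the choice to pad each candidate area by the pattern length.

```python
from typing import List, Tuple


def q_samples(text: str, q: int, h: int) -> List[Tuple[int, str]]:
    """Return (position, q-gram) pairs taken every h characters.

    Requiring h >= q keeps the samples non-overlapping.
    """
    if h < q:
        raise ValueError("sampling step h must be at least q")
    return [(i, text[i:i + q]) for i in range(0, len(text) - q + 1, h)]


def candidate_areas(text: str, pattern: str, q: int, h: int,
                    window: int, threshold: int) -> List[Tuple[int, int]]:
    """Flag areas where >= threshold of `window` consecutive q-samples
    occur exactly somewhere in the pattern (order is NOT checked here)."""
    pattern_grams = {pattern[i:i + q] for i in range(len(pattern) - q + 1)}
    samples = q_samples(text, q, h)
    hits = [1 if gram in pattern_grams else 0 for _, gram in samples]
    m = len(pattern)
    areas = []
    for start in range(0, max(1, len(samples) - window + 1)):
        block = samples[start:start + window]
        if sum(hits[start:start + window]) >= threshold:
            # Assumption: pad by the pattern length on both sides.
            lo = max(0, block[0][0] - m)
            hi = min(len(text), block[-1][0] + q + m)
            areas.append((lo, hi))
    return areas


def k_differences_ends(text: str, pattern: str, k: int) -> List[int]:
    """Dynamic programming check: end positions (exclusive) in text where
    the pattern occurs with at most k differences (edit operations)."""
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text, start=1):
        cur = [0] * (m + 1)  # row 0 stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost, prev[i] + 1, cur[i - 1] + 1)
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


if __name__ == "__main__":
    text = "zzzzabcdXfghzzzz"
    pattern = "abcdefgh"
    # Filter first, then verify only the flagged areas.
    for lo, hi in candidate_areas(text, pattern, q=2, h=2, window=3, threshold=2):
        ends = [lo + e for e in k_differences_ends(text[lo:hi], pattern, k=1)]
        print((lo, hi), ends)
```

The point of the two-stage design is that the expensive dynamic programming runs only on the flagged areas. How the sampling step and threshold must be chosen so that no occurrence with k differences is missed is the subject of the paper itself and is not reproduced here.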
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sutinen, E., Tarhio, J. (1996). Filtration with q-samples in approximate string matching. In: Hirschberg, D., Myers, G. (eds) Combinatorial Pattern Matching. CPM 1996. Lecture Notes in Computer Science, vol 1075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61258-0_4
DOI: https://doi.org/10.1007/3-540-61258-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61258-2
Online ISBN: 978-3-540-68390-2
eBook Packages: Springer Book Archive