ABSTRACT
Weighted sequences are widely used in areas including information retrieval, bioinformatics, and music analysis. In this paper we show how to apply the Boyer-Moore algorithm [2] to weighted sequences. Because our algorithm is based on the Boyer-Moore algorithm, it works well when the alphabet is large, as is the case in information retrieval. We also show how to handle small alphabets by considering two or more characters at once.
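As a rough illustration of the idea summarized in the abstract, the following sketch applies a Horspool-style simplification of the Boyer-Moore bad-character shift to a weighted text. It is not the paper's algorithm. It assumes that each text position is a mapping from characters to probabilities, that the pattern is an ordinary string, and that an occurrence is a window whose product of probabilities is at least a threshold in (0, 1]. The names `build_shift` and `weighted_horspool` are hypothetical.

```python
def build_shift(pattern):
    """Horspool shift table: distance from the last occurrence of each
    character in pattern[:-1] to the final pattern position."""
    m = len(pattern)
    shift = {}
    for j, c in enumerate(pattern[:-1]):
        shift[c] = m - 1 - j
    return shift


def weighted_horspool(text, pattern, threshold):
    """Return all start positions i where the product of
    text[i + j][pattern[j]] over j is at least `threshold`.

    Assumed model (not taken from the paper): `text` is a list of dicts
    mapping characters to probabilities, and 0 < threshold <= 1.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shift = build_shift(pattern)
    matches = []
    i = 0
    while i + m <= n:
        # Verify the window right to left, stopping as soon as the
        # running product falls below the threshold.
        p = 1.0
        j = m - 1
        while j >= 0:
            p *= text[i + j].get(pattern[j], 0.0)
            if p < threshold:
                break
            j -= 1
        if j < 0:
            matches.append(i)
        # Every factor of a qualifying product is itself >= threshold,
        # so only characters that heavy at the window's last position
        # can take part in a later occurrence. Take the smallest of
        # their shifts; if there is none, skip the whole window.
        last = text[i + m - 1]
        i += min(
            (shift.get(c, m) for c, w in last.items() if w >= threshold),
            default=m,
        )
    return matches


text = [{"a": 1.0}, {"b": 0.5, "c": 0.5}, {"a": 1.0}, {"b": 1.0}]
print(weighted_horspool(text, "ab", 0.5))
print(weighted_horspool(text, "ab", 0.6))
```

With a large alphabet, few characters at a given position reach the threshold and most of them do not occur in the pattern, so the shift is usually the full pattern length. With a small alphabet the minimum shift tends to be short, which is the situation the abstract addresses by grouping two or more characters into one symbol.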
- R. A. Baeza-Yates and G. H. Gonnet. A new approach to text searching. CACM, 35(10):74--82, 1992.
- R. Boyer and S. Moore. A fast string search algorithm. CACM, 20(10):762--772, 1977.
- R. Clifford and B. Sach. Pattern matching in pseudo real-time. J. Discrete Algorithms, 9(1):67--81, 2011.
- V. Freschi and A. Bogliolo. Using sequence compression to speedup probabilistic profile matching. Bioinformatics, 21(10):2225--2229, 2005.
- K. Fredriksson. Faster string matching with Super-alphabets. In Proc. of SPIRE 2002, pages 207--214, 2002.
- C. Pizzi, P. Rastas, and E. Ukkonen. Fast search algorithms for position specific scoring matrices. In Proc. of BIRD 2007, pages 239--250, 2007.
- S. Rajasekaran, X. Jin, and J. L. Spouge. The efficient computation of position-specific match scores with the Fast Fourier Transform. J. Computational Biology, 9(1):23--33, 2002.
- D. M. Sunday. A very fast substring search algorithm. CACM, 33(8):132--142, 1990.
Index Terms
- A simple pattern matching algorithm for weighted sequences