The wide window string matching algorithm

https://doi.org/10.1016/j.tcs.2004.12.002Get rights and content
Under an Elsevier user license
open archive

Abstract

Generally, current string matching algorithms make use of a window whose size is equal to pattern length. In this paper, we present a novel string matching algorithm named WW (for Wide Window) algorithm, which divides the text into n/m overlapping windows of size 2m-1. In the windows, the algorithm attempts m possible occurrence positions in parallel. It firstly searches pattern suffixes from middle to right with a forward suffix automaton, shifts the window directly when it fails, otherwise, scans the corresponding prefixes backward with a reverse prefix automaton. Theoretical analysis shows that WW has optimal time complexity of O(n) in the worst, O(n/m) best and O(n(logσm)/m) for average case. Experimental comparison of WW with existing algorithms validates our theoretical claims for searching long patterns. It further reveals that WW is also efficient for searching short patterns.

Keywords

Suffix automaton
Reverse prefix automaton
Bit parallelism
Wide window algorithm
String matching

Cited by (0)