Dictionary Matching with a Bounded Gap in Pattern or in Text

Algorithmica

Abstract

A gap is a sequence of don’t care characters. In this paper, we study two variants of the dictionary matching problem, where gaps may be present in the patterns or in the text. The first variant, called dictionary matching with one gap, considers indexing a collection \({\mathcal D}\) of d one-gap-patterns, where the ith pattern is of the form \(P_i[\alpha _i,\beta _i]Q_i\), where \(P_i\) and \(Q_i\) are strings drawn from an alphabet \(\varSigma \) and \([\alpha _i, \beta _i]\) denotes the lower and upper bounds on the gap length. The goal is to allow a user to efficiently identify all substrings of a query text T that match any one-gap-pattern in the collection. We present a linear space solution for answering the above dictionary matching query in time \(O(|T| \gamma \log \lambda \log d+\mathsf {occ})\), where \(\gamma \) denotes the number of distinct gap lengths, \(\lambda \) denotes the number of distinct lower and upper bounds of gap lengths, and \(\mathsf {occ}\) is the output size. The query time can be improved to \(O(|T|\gamma +\mathsf {occ})\) using \(O(d^{1+\epsilon })\) extra space, where \(\epsilon >0\) is an arbitrarily small constant. Additionally, we show a succinct-space index offering a space–time tradeoff. In the special case where the parameters \(\alpha _i\) and \(\beta _i\) are the same for all patterns, our results improve upon the work by Amir et al. (Proceedings of annual symposium on combinatorial pattern matching (CPM), 2014, Theor Comput Sci 589:34–46, 2015). The second variant, called dictionary matching with one missing substring, is a new problem in which a gap of bounded length may be present in the text substring when it is being matched. We show that this problem can be solved by using a similar framework.
Furthermore, by applying a centroid path decomposition on the failure tree, we obtain a space–time tradeoff result, which is suitable when the dictionary contains only short patterns, or when index space is a critical concern.
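To make the first problem statement concrete, the following is a minimal brute-force sketch of dictionary matching with one gap. It is not the index described in the abstract and makes no attempt to achieve the stated bounds; it only spells out what counts as a match. The names `OneGapPattern` and `naive_one_gap_matches`, and the choice to report half-open `(pattern index, start, end)` triples, are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation of a one-gap-pattern P[alpha, beta]Q.
OneGapPattern = Tuple[str, int, int, str]  # (P, alpha, beta, Q)


def naive_one_gap_matches(
    text: str, dictionary: List[OneGapPattern]
) -> List[Tuple[int, int, int]]:
    """Report (pattern index, start, end) for every substring text[start:end]
    of the form P + gap + Q, where the gap is any run of characters whose
    length lies in [alpha, beta]. Brute force, for reference only."""
    matches = []
    n = len(text)
    for idx, (p, alpha, beta, q) in enumerate(dictionary):
        for start in range(n - len(p) + 1):
            if not text.startswith(p, start):
                continue
            q_from = start + len(p)  # first position after P
            for gap in range(alpha, beta + 1):
                q_start = q_from + gap
                if q_start + len(q) > n:
                    break  # longer gaps cannot fit in the text either
                if text.startswith(q, q_start):
                    matches.append((idx, start, q_start + len(q)))
    return matches


if __name__ == "__main__":
    print(naive_one_gap_matches("abxxcd", [("ab", 1, 3, "cd")]))
```

A single occurrence of \(P_i\) can pair with several occurrences of \(Q_i\) at different admissible gap lengths, so one text position may contribute more than one match to the output.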

Notes

  1. This is done only to make certain properties easier to guarantee.

  2. For a rectangle with left-bottom-lower corner point \((x_\ell , y_\ell , z_\ell )\) and right-top-upper corner point \((x_r, y_r, z_r)\), we may associate it with a 6-dimensional point \((x_\ell , y_\ell , z_\ell , x_r, y_r, z_r)\). Then, this rectangle is stabbed by a point \((x, y, z)\) if and only if its associated 6-dimensional point lies in the range \([-\infty , x] \times [-\infty , y] \times [-\infty , z] \times [x, \infty ] \times [y, \infty ] \times [z, \infty ]\).
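The reduction in note 2 can be checked with a small sketch: a 3-dimensional box is stabbed by a query point exactly when the box's associated 6-dimensional point falls in a six-sided range determined by that query point. The function names `to_6d_point`, `stabbed_directly`, and `stabbed_via_6d_range` are hypothetical, and the code only tests the equivalence pointwise; it is not a range-reporting data structure.

```python
from math import inf
from typing import Tuple

Point3 = Tuple[float, float, float]


def to_6d_point(lo: Point3, hi: Point3) -> Tuple[float, ...]:
    """Associate the box [lo, hi] with the point (x_l, y_l, z_l, x_r, y_r, z_r)."""
    return (*lo, *hi)


def stabbed_directly(lo: Point3, hi: Point3, q: Point3) -> bool:
    """Direct definition: q lies inside the box on every axis."""
    return all(l <= c <= h for l, c, h in zip(lo, q, hi))


def stabbed_via_6d_range(lo: Point3, hi: Point3, q: Point3) -> bool:
    """Same test phrased as a 6-dimensional orthogonal range query:
    lower corners must be <= q and upper corners must be >= q."""
    x, y, z = q
    ranges = [(-inf, x), (-inf, y), (-inf, z), (x, inf), (y, inf), (z, inf)]
    return all(a <= c <= b for c, (a, b) in zip(to_6d_point(lo, hi), ranges))


if __name__ == "__main__":
    box = ((0, 0, 0), (2, 3, 4))
    for q in [(1, 1, 1), (2, 3, 4), (3, 1, 1), (1, -1, 1)]:
        print(q, stabbed_directly(*box, q), stabbed_via_6d_range(*box, q))
```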

References

  1. Afshani, P., Arge, L., Larsen, K.G.: Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In: Proceedings of ACM Symposium on Computational Geometry (SoCG), pp. 323–332 (2012)

  2. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)

  3. Amir, A., Farach, M.: Adaptive dictionary matching. In: Proceedings of IEEE Annual Symposium on Foundations of Computer Science (FOCS), pp. 760–766 (1991)

  4. Amir, A., Farach, M., Idury, R.M., Poutré, J.A.L., Schäffer, A.A.: Improved dynamic dictionary matching. Inf. Comput. 119(2), 258–282 (1995)

  5. Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)

  6. Amir, A., Levy, A., Porat, E., Shalom, B.R.: Dictionary matching with one gap. In: Proceedings of Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 11–20 (2014)

  7. Amir, A., Levy, A., Porat, E., Shalom, B.R.: Dictionary matching with a few gaps. Theor. Comput. Sci. 589, 34–46 (2015)

  8. Belazzougui, D.: Succinct dictionary matching with no slowdown. In: Proceedings of Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 88–100 (2010)

  9. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)

  10. Chan, T.M., Larsen, K.G., Patrascu, M.: Orthogonal range searching on the RAM, revisited. In: Proceedings of ACM Symposium on Computational Geometry (SoCG), pp. 1–10 (2011)

  11. Chazelle, B.: Filtering search: a new approach to query-answering. SIAM J. Comput. 15(3), 703–724 (1986)

  12. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of Annual ACM Symposium on Theory of Computing, pp. 91–100 (2004)

  13. Feigenblat, G., Porat, E., Shiftan, A.: An improved query time for succinct dynamic dictionary matching. In: Proceedings of Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 120–129 (2014)

  14. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)

  15. Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Inf. Retr. 11(4), 335–357 (2008)

  16. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)

  17. Haapasalo, T., Silvasti, P., Sippu, S., Soisalon-Soininen, E.: Online dictionary matching with variable-length gaps. In: Proceedings of Symposium on Experimental Algorithms (SEA), pp. 76–87 (2011)

  18. Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999)

  19. Hon, W.K., Ku, T.H., Lam, T.W., Shah, R., Tam, S.L., Thankachan, S.V., Vitter, J.S.: Compressing dictionary matching index via sparsification technique. Algorithmica 72(2), 515–538 (2015)

  20. Hon, W.K., Ku, T.H., Shah, R., Thankachan, S.V., Vitter, J.S.: Compressed dictionary matching with one error. In: Proceedings of IEEE Data Compression Conference (DCC), pp. 113–122 (2011)

  21. Hon, W.K., Ku, T.H., Shah, R., Thankachan, S.V., Vitter, J.S.: Faster compressed dictionary matching. Theor. Comput. Sci. 475, 113–119 (2013)

  22. Hon, W.K., Lam, T.W., Shah, R., Tam, S.L., Vitter, J.S.: Compressed index for dictionary matching. In: Proceedings of IEEE Data Compression Conference (DCC), pp. 23–32 (2008)

  23. Hon, W.K., Lam, T.W., Shah, R., Tam, S.L., Vitter, J.S.: Succinct index for dynamic dictionary matching. In: Proceedings of International Symposium on Algorithms and Computation (ISAAC), pp. 1034–1043 (2009)

  24. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)

  25. Karpinski, M., Nekrich, Y.: Space efficient multi-dimensional range reporting. In: Proceedings of Annual International Conference on Computing and Combinatorics (COCOON), pp. 215–224 (2009)

  26. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

  27. Kucherov, G., Rusinowitch, M.: Matching a set of strings with variable length don’t cares. Theor. Comput. Sci. 178, 129–154 (1997)

  28. Lewenstein, M.: Dictionary matching. In: Encyclopedia of Algorithms, pp. 533–538 (2016)

  29. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

  30. Mehldau, G., Myers, G.: A system for pattern matching applications on biosequences. Comput. Appl. Biosci. 9(3), 299–314 (1993)

  31. Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Biol. 10(6), 903–923 (2003)

  32. Sleator, D.D., Tarjan, R.E.: A data structure for dynamic trees. J. Comput. Syst. Sci. 26(3), 362–391 (1983)

  33. Weiner, P.: Linear pattern matching algorithms. In: Proceedings of Annual Symposium on Switching and Automata, pp. 1–11 (1973)

  34. Zhang, M., Zhang, Y., Hu, L.: A faster algorithm for matching a set of patterns with variable length don’t cares. Inf. Process. Lett. 110(6), 216–220 (2010)

Author information

Corresponding author

Correspondence to Wing-Kai Hon.

Additional information

A preliminary version of these results appeared in the 26th Annual Symposium on Combinatorial Pattern Matching (CPM’15). This research is funded in part by US NSF Grant CCF–1218904, Hong Kong ITF Grant 260900235, and Taiwan MOST Grant 102-2221-E-007-068. Part of this work was done during Y. Yang’s visit to the University of Hong Kong.

About this article

Cite this article

Hon, WK., Lam, TW., Shah, R. et al. Dictionary Matching with a Bounded Gap in Pattern or in Text. Algorithmica 80, 698–713 (2018). https://doi.org/10.1007/s00453-017-0288-2

  • DOI: https://doi.org/10.1007/s00453-017-0288-2
