Dictionary Matching with Uneven Gaps

  • Conference paper
Combinatorial Pattern Matching (CPM 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9133)

Abstract

A gap-pattern is a sequence of sub-patterns separated by bounded sequences of don’t care characters (called gaps). A one-gap-pattern is a pattern of the form \(P[\alpha ,\beta ]Q\), where \(P\) and \(Q\) are strings drawn from alphabet \(\varSigma \) and \(\alpha \) and \(\beta \) are the lower and upper bounds on the gap size \(g\). The gap size \(g\) is the number of don’t care characters between \(P\) and \(Q\). The dictionary matching problem with one gap is to index a collection of one-gap-patterns so as to identify all substrings of a query text \(T\) that match any one-gap-pattern in the collection. Let \({\mathcal D}\) be such a collection of \(d\) patterns, where \({\mathcal D}=\{P_i[\alpha _i,\beta _i]Q_i\mid 1\le i \le d\}\). Let \(n=\sum _{i=1}^d(|P_i|+|Q_i|)\). Let \(\gamma \) and \(\lambda \) be two parameters defined on \({\mathcal D}\) as follows: \(\gamma = |\{j\mid j \in [\alpha _i,\beta _i], 1\le i\le d\}|\) and \(\lambda = |\{\alpha _i,\beta _i \mid 1\le i\le d\}|\). Specifically, \(\gamma \) is the total number of gap lengths possible over all patterns in \({\mathcal D}\), and \(\lambda \) is the number of distinct gap boundaries across all the patterns. We present a linear-space solution (i.e., \(O(n)\) words) for answering a dictionary matching query on \({\mathcal D}\) in time \(O(|T| \gamma \log \lambda \log d+occ)\), where \(occ\) is the output size. The query time can be improved to \(O(|T|\gamma +occ)\) using \(O(n+d^{1+\epsilon })\) space, where \(\epsilon >0\) is an arbitrarily small constant. Additionally, we show a compact/succinct space index offering a space-time trade-off. In the special case where the parameters \(\alpha _i\) and \(\beta _i\) are the same for all the patterns, our results improve upon the work by Amir et al. [CPM, 2014]. We also explore several related cases where gaps can occur at arbitrary locations and where gaps can be induced in the text rather than in the pattern.
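To make the definitions concrete, the following is a minimal Python sketch that computes \(\gamma \) and \(\lambda \) for a small dictionary and reports matches by brute force. It is not the index described in the abstract and does not achieve the stated bounds; it is only a reference baseline. The names `OneGapPattern`, `gap_parameters`, `naive_matches`, and `EXAMPLE_DICTIONARY` are hypothetical and introduced here for illustration, and the sketch assumes non-empty \(P_i\) and \(Q_i\).

```python
from typing import List, NamedTuple, Tuple


class OneGapPattern(NamedTuple):
    """A one-gap-pattern P[alpha, beta]Q (illustrative type, not from the paper)."""
    p: str
    alpha: int
    beta: int
    q: str


def gap_parameters(dictionary: List[OneGapPattern]) -> Tuple[int, int]:
    """Return (gamma, lambda) as defined in the abstract."""
    gap_lengths = set()  # every j lying in some interval [alpha_i, beta_i]
    boundaries = set()   # every distinct alpha_i or beta_i
    for pat in dictionary:
        gap_lengths.update(range(pat.alpha, pat.beta + 1))
        boundaries.update((pat.alpha, pat.beta))
    return len(gap_lengths), len(boundaries)


def naive_matches(text: str, dictionary: List[OneGapPattern]) -> List[Tuple[int, int, int]]:
    """Brute-force baseline: list (pattern index, start of P in text, gap size g)."""
    out = []
    for i, pat in enumerate(dictionary):
        for start in range(len(text) - len(pat.p) + 1):
            if not text.startswith(pat.p, start):
                continue
            q_base = start + len(pat.p)  # first position after P
            for g in range(pat.alpha, pat.beta + 1):
                # Q must begin exactly g don't-care characters after P ends.
                if text.startswith(pat.q, q_base + g):
                    out.append((i, start, g))
    return out


# Hypothetical example dictionary: ab[1,3]cd and x[2,5]y.
EXAMPLE_DICTIONARY = [
    OneGapPattern("ab", 1, 3, "cd"),
    OneGapPattern("x", 2, 5, "y"),
]

print(gap_parameters(EXAMPLE_DICTIONARY))               # (5, 4)
print(naive_matches("abzzcdxqqy", EXAMPLE_DICTIONARY))  # [(0, 0, 2), (1, 6, 2)]
```

In this example the intervals \([1,3]\) and \([2,5]\) cover the gap lengths \(1,\ldots ,5\), so \(\gamma =5\), and the boundaries \(\{1,2,3,5\}\) give \(\lambda =4\).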

This research is funded in part by US National Science Foundation (NSF) Grant CCF–1218904 and Taiwan MOST Grant 102-2221-E-007-068. Part of this work was done during Y. Yang’s visit at the University of Hong Kong. This paper is a merger of two independent similar works.

Notes

  1. This is just for ease of ensuring some properties.

  2. The lower bound \(\alpha _i\) is redundant in this case and is set to zero. Otherwise, we can always omit the first \(\alpha _i\) characters from \(Q_i\), obtaining \(Q_i'\), and work with \(P_i[\beta _i-\alpha _i]Q_i'\).

References

  1. Afshani, P., Arge, L., Larsen, K.G.: Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In: Symposium on Computational Geometry 2012, SoCG 2012, Chapel Hill, NC, USA, pp. 323–332, 17–20 June 2012

  2. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)

  3. Amir, A., Farach, M.: Adaptive dictionary matching. In: 32nd Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, pp. 760–766, 1–4 October 1991

  4. Amir, A., Farach, M., Idury, R.M., Poutré, J.A.L., Schäffer, A.A.: Improved dynamic dictionary matching. Inf. Comput. 119(2), 258–282 (1995)

  5. Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)

  6. Amir, A., Levy, A., Porat, E., Shalom, B.R.: Dictionary matching with one gap. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 11–20. Springer, Heidelberg (2014)

  7. Belazzougui, D.: Succinct dictionary matching with no slowdown. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 88–100. Springer, Heidelberg (2010)

  8. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)

  9. Chan, T.M., Larsen, K.G., Patrascu, M.: Orthogonal range searching on the RAM, revisited. In: Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, pp. 1–10, 13–15 June 2011

  10. Chazelle, B.: Filtering search: a new approach to query-answering. SIAM J. Comput. 15(3), 703–724 (1986)

  11. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, pp. 91–100, 13–16 June 2004

  12. Feigenblat, G., Porat, E., Shiftan, A.: An improved query time for succinct dynamic dictionary matching. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 120–129. Springer, Heidelberg (2014)

  13. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)

  14. Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Inf. Retr. 11(4), 335–357 (2008)

  15. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)

  16. Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999)

  17. Hon, W., Ku, T., Shah, R., Thankachan, S.V., Vitter, J.S.: Compressed dictionary matching with one error. In: 2011 Data Compression Conference (DCC 2011), Snowbird, UT, USA, pp. 113–122, 29–31 March 2011

  18. Hon, W., Ku, T., Shah, R., Thankachan, S.V., Vitter, J.S.: Faster compressed dictionary matching. Theor. Comput. Sci. 475, 113–119 (2013)

  19. Hon, W., Lam, T.W., Shah, R., Tam, S., Vitter, J.S.: Compressed index for dictionary matching. In: 2008 Data Compression Conference (DCC 2008), Snowbird, UT, USA, pp. 23–32, 25–27 March 2008

  20. Hon, W.-K., Lam, T.-W., Shah, R., Tam, S.-L., Vitter, J.S.: Succinct index for dynamic dictionary matching. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 1034–1043. Springer, Heidelberg (2009)

  21. Hon, W.-K., Ku, T.-H., Lam, T.-W., Shah, R., Tam, S.-L., Thankachan, S.V., Vitter, J.S.: Compressing dictionary matching index via sparsification technique. Algorithmica 72(2), 515–538 (2015)

  22. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)

  23. Karpinski, M., Nekrich, Y.: Space efficient multi-dimensional range reporting. In: Ngo, H.Q. (ed.) COCOON 2009. LNCS, vol. 5609, pp. 215–224. Springer, Heidelberg (2009)

  24. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

  25. Lewenstein, M.: Dictionary matching. In: Kao, M.-Y. (ed.) Encyclopedia of Algorithms, pp. 1–6. Springer, US (2015)

  26. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

  27. Mehldau, G., Myers, G.: A system for pattern matching applications on biosequences. Comput. Appl. Biosci. 9(3), 299–314 (1993)

  28. Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Biol. 10(6), 903–923 (2003)

  29. Sleator, D.D., Tarjan, R.E.: A data structure for dynamic trees. J. Comput. Syst. Sci. 26(3), 362–391 (1983)

  30. Weiner, P.: Linear pattern matching algorithms. In: 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, pp. 1–11, 15–17 October 1973

Author information

Corresponding author

Correspondence to Rahul Shah.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Hon, WK., Lam, TW., Shah, R., Thankachan, S.V., Ting, HF., Yang, Y. (2015). Dictionary Matching with Uneven Gaps. In: Cicalese, F., Porat, E., Vaccaro, U. (eds) Combinatorial Pattern Matching. CPM 2015. Lecture Notes in Computer Science, vol 9133. Springer, Cham. https://doi.org/10.1007/978-3-319-19929-0_21

  • DOI: https://doi.org/10.1007/978-3-319-19929-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19928-3

  • Online ISBN: 978-3-319-19929-0

  • eBook Packages: Computer Science, Computer Science (R0)
