Abstract
A gap-pattern is a sequence of sub-patterns separated by bounded sequences of don’t care characters (called gaps). A one-gap-pattern is a pattern of the form \(P[\alpha ,\beta ]Q\), where \(P\) and \(Q\) are strings drawn from an alphabet \(\varSigma \), and \(\alpha \) and \(\beta \) are the lower and upper bounds on the gap size \(g\), that is, the number of don’t care characters between \(P\) and \(Q\). The dictionary matching problem with one gap is to index a collection of one-gap-patterns so as to identify all substrings of a query text \(T\) that match any one-gap-pattern in the collection. Let \({\mathcal D}=\{P_i[\alpha _i,\beta _i]Q_i\mid 1\le i \le d\}\) be such a collection of \(d\) patterns, and let \(n=\sum _{i=1}^d(|P_i|+|Q_i|)\). Let \(\gamma \) and \(\lambda \) be two parameters defined on \({\mathcal D}\) as follows: \(\gamma = |\{j\mid j \in [\alpha _i,\beta _i], 1\le i\le d\}|\) and \(\lambda = |\{\alpha _i,\beta _i \mid 1\le i\le d\}|\). Specifically, \(\gamma \) is the total number of gap lengths possible over all patterns in \({\mathcal D}\), and \(\lambda \) is the number of distinct gap boundaries across all the patterns. We present a linear-space solution (i.e., \(O(n)\) words) for answering a dictionary matching query on \({\mathcal D}\) in time \(O(|T| \gamma \log \lambda \log d+occ)\), where \(occ\) is the output size. The query time can be improved to \(O(|T|\gamma +occ)\) using \(O(n+d^{1+\epsilon })\) space, where \(\epsilon >0\) is an arbitrarily small constant. Additionally, we show a compact/succinct space index offering a space-time trade-off. In the special case where the parameters \(\alpha _i\) and \(\beta _i\) are the same for all patterns, our results improve upon the work by Amir et al. [CPM, 2014]. We also explore several related cases, where gaps can occur at arbitrary locations and where the gap can be induced in the text rather than in the pattern.
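To make the definitions concrete, the following Python sketch computes \(\gamma \) and \(\lambda \) for a small dictionary and reports matches by brute force. It is only an illustration of the problem statement, not the index described in the abstract: the tuple representation `(P, alpha, beta, Q)`, the function names, the toy dictionary, and the choice to report each occurrence as a `(pattern index, start, gap)` triple are all assumptions made here.

```python
from typing import List, Set, Tuple

# Assumed representation of a one-gap pattern P[alpha, beta]Q.
OneGapPattern = Tuple[str, int, int, str]


def gamma(patterns: List[OneGapPattern]) -> int:
    """Number of distinct gap lengths allowed by at least one pattern."""
    lengths: Set[int] = set()
    for _, alpha, beta, _ in patterns:
        lengths.update(range(alpha, beta + 1))
    return len(lengths)


def lam(patterns: List[OneGapPattern]) -> int:
    """Number of distinct gap boundaries (alpha or beta values)."""
    bounds: Set[int] = set()
    for _, alpha, beta, _ in patterns:
        bounds.add(alpha)
        bounds.add(beta)
    return len(bounds)


def naive_matches(text: str, patterns: List[OneGapPattern]) -> List[Tuple[int, int, int]]:
    """Brute-force baseline: try every start position and every gap size."""
    out: List[Tuple[int, int, int]] = []
    for i, (p, alpha, beta, q) in enumerate(patterns):
        for start in range(len(text) - len(p) + 1):
            if not text.startswith(p, start):
                continue
            for g in range(alpha, beta + 1):
                q_start = start + len(p) + g
                # Q must fit entirely inside the text after the gap.
                if q_start + len(q) <= len(text) and text.startswith(q, q_start):
                    out.append((i, start, g))
    return out


# Hypothetical toy dictionary: ab[1,3]c and a[2,5]bc.
toy = [("ab", 1, 3, "c"), ("a", 2, 5, "bc")]
print(gamma(toy))  # gap lengths {1,2,3} and {2,3,4,5} together give 5
print(lam(toy))    # boundaries {1, 3, 2, 5} give 4
print(naive_matches("abxcxxbc", toy))  # [(0, 0, 1), (1, 0, 5)]
```

The brute-force matcher takes time proportional to the text length times the total pattern length times the gap ranges, which is the cost that an index with the stated query bounds is meant to avoid.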
This research is funded in part by the US National Science Foundation (NSF) Grant CCF–1218904 and Taiwan MOST Grant 102-2221-E-007-068. Part of this work was done during Y. Yang’s visit to the University of Hong Kong. This paper is a merger of two independent, similar works.
Notes
1. This is just for ease of ensuring some properties.
2. The lower bound \(\alpha _i\) is redundant in this case and is set to zero. Otherwise, we can always omit the first \(\alpha _i\) characters from \(Q_i\), obtaining \(Q_i'\), and work with \(P_i[\beta _i-\alpha _i]Q_i'\).
References
Afshani, P., Arge, L., Larsen, K.G.: Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In: Symposium on Computational Geometry 2012, SoCG 2012, Chapel Hill, NC, USA, pp. 323–332, 17–20 June 2012
Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)
Amir, A., Farach, M.: Adaptive dictionary matching. In: 32nd Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, pp. 760–766, 1–4 October 1991
Amir, A., Farach, M., Idury, R.M., Poutré, J.A.L., Schäffer, A.A.: Improved dynamic dictionary matching. Inf. Comput. 119(2), 258–282 (1995)
Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)
Amir, A., Levy, A., Porat, E., Shalom, B.R.: Dictionary matching with one gap. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 11–20. Springer, Heidelberg (2014)
Belazzougui, D.: Succinct dictionary matching with no slowdown. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 88–100. Springer, Heidelberg (2010)
Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)
Chan, T.M., Larsen, K.G., Patrascu, M.: Orthogonal range searching on the RAM, revisited. In: Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, pp. 1–10, 13–15 June 2011
Chazelle, B.: Filtering search: a new approach to query-answering. SIAM J. Comput. 15(3), 703–724 (1986)
Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, pp. 91–100, 13–16 June 2004
Feigenblat, G., Porat, E., Shiftan, A.: An improved query time for succinct dynamic dictionary matching. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 120–129. Springer, Heidelberg (2014)
Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)
Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Inf. Retr. 11(4), 335–357 (2008)
Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)
Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999)
Hon, W., Ku, T., Shah, R., Thankachan, S.V., Vitter, J.S.: Compressed dictionary matching with one error. In: 2011 Data Compression Conference (DCC 2011), Snowbird, UT, USA, pp. 113–122, 29–31 March 2011
Hon, W., Ku, T., Shah, R., Thankachan, S.V., Vitter, J.S.: Faster compressed dictionary matching. Theor. Comput. Sci. 475, 113–119 (2013)
Hon, W., Lam, T.W., Shah, R., Tam, S., Vitter, J.S.: Compressed index for dictionary matching. In: 2008 Data Compression Conference (DCC 2008), Snowbird, UT, USA, pp. 23–32, 25–27 March 2008
Hon, W.-K., Lam, T.-W., Shah, R., Tam, S.-L., Vitter, J.S.: Succinct index for dynamic dictionary matching. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 1034–1043. Springer, Heidelberg (2009)
Hon, W.-K., Ku, T.-H., Lam, T.-W., Shah, R., Tam, S.-L., Thankachan, S.V., Vitter, J.S.: Compressing dictionary matching index via sparsification technique. Algorithmica 72(2), 515–538 (2015)
Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)
Karpinski, M., Nekrich, Y.: Space efficient multi-dimensional range reporting. In: Ngo, H.Q. (ed.) COCOON 2009. LNCS, vol. 5609, pp. 215–224. Springer, Heidelberg (2009)
Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
Lewenstein, M.: Dictionary matching. In: Kao, M.-Y. (ed.) Encyclopedia of Algorithms, pp. 1–6. Springer, US (2015)
Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
Mehldau, G., Myers, G.: A system for pattern matching applications on biosequences. Comput. Appl. Biosci. 9(3), 299–314 (1993)
Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Biol. 10(6), 903–923 (2003)
Sleator, D.D., Tarjan, R.E.: A data structure for dynamic trees. J. Comput. Syst. Sci. 26(3), 362–391 (1983)
Weiner, P.: Linear pattern matching algorithms. In: 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, pp. 1–11, 15–17 October 1973
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hon, WK., Lam, TW., Shah, R., Thankachan, S.V., Ting, HF., Yang, Y. (2015). Dictionary Matching with Uneven Gaps. In: Cicalese, F., Porat, E., Vaccaro, U. (eds) Combinatorial Pattern Matching. CPM 2015. Lecture Notes in Computer Science(), vol 9133. Springer, Cham. https://doi.org/10.1007/978-3-319-19929-0_21
DOI: https://doi.org/10.1007/978-3-319-19929-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19928-3
Online ISBN: 978-3-319-19929-0
eBook Packages: Computer Science, Computer Science (R0)