Skip to main content

Less Space: Indexing for Queries with Wildcards

  • Conference paper
Algorithms and Computation (ISAAC 2013)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8283))

Included in the following conference series:

  • 1545 Accesses

Abstract

Text indexing is a fundamental problem in computer science, where the task is to index a given text (string) T[1..n], such that whenever a pattern P[1..p] comes as a query, we can efficiently report all those locations where P occurs as a substring of T. In this paper, we consider the case when P contains wildcard characters (which can match with any other character). The first non-trivial solution for the problem is given by Cole et al. [STOC 2004], where the index space is O(nlogk n) words or O(nlogk + 1 n) bits and the query time is O(p + 2h loglogn + occ), where k is the maximum number of wildcard characters allowed in P, h ≤ k is the number of wildcard characters in P and occ represents the number of occurrences of P in T. Even though many indexes offering different space-time trade-offs were later proposed, a clear improvement on this result is still not known. In this paper, we first propose an O(nlogk + ε n) bits index achieving the same query time as that of Cole et al.’s index, where 0 < ε < 1 is an arbitrary small constant. Then we propose another index of size O(nlogk nlogσ) bits, but with a slightly higher query time of O(p + 2h logn + occ), where σ denotes the alphabet set size.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Alstrup, S., Brodal, G.S., Rauhe, T.: New data structures for orthogonal range searching. In: FOCS, pp. 198–207 (2000)

    Google Scholar 

  2. Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)

    Article  MathSciNet  MATH  Google Scholar 

  3. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Monotone minimal perfect hashing: searching a sorted table with O(1) accesses. In: SODA, pp. 785–794 (2009)

    Google Scholar 

  4. Belazzougui, D., Navarro, G., Valenzuela, D.: Improved compressed indexes for full-text document retrieval. J. Algorithms 18, 3–13 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  5. Bille, P., Gørtz, I.L.: Substring range reporting. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 299–308. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  6. Bille, P., Gørtz, I.L., Vildhøj, H.W., Vind, S.: String indexing for patterns with wildcards. In: Fomin, F.V., Kaski, P. (eds.) SWAT 2012. LNCS, vol. 7357, pp. 283–294. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  7. Bucher, P., Bairoch, A.: A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: ISMB, pp. 53–61 (1994)

    Google Scholar 

  8. Chan, H.-L., Lam, T.-W., Sung, W.-K., Tam, S.-L., Wong, S.-S.: Compressed indexes for approximate string matching. Algorithmica 58(2), 263–281 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  9. Chan, T.M., Larsen, K.G., Patrascu, M.: Orthogonal range searching on the RAM, revisited. In: Symposium on Computational Geometry, pp. 1–10 (2011)

    Google Scholar 

  10. Chien, Y.-F., Hon, W.-K., Shah, R., Thankachan, S.V., Vitter, J.S.: Geometric BWT: Compressed text indexing via sparse suffixes and range searching. Algorithmica (2013)

    Google Scholar 

  11. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC, pp. 91–100 (2004)

    Google Scholar 

  12. Golynski, A., Ian Munro, J., Srinivasa Rao, S.: Rank/select operations on large alphabets: a tool for text indexing. In: SODA, pp. 368–373 (2006)

    Google Scholar 

  13. Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The prosite database, its status in 1999. Nucleic Acids Research 27(1), 215–219 (1999)

    Article  Google Scholar 

  14. Hon, W.-K., Ku, T.-H., Shah, R., Thankachan, S.V., Vitter, J.S.: Compressed text indexing with wildcards. J. Discrete Algorithms 19, 23–29 (2013)

    Article  MathSciNet  Google Scholar 

  15. Huynh, T.N.D., Hon, W.-K., Lam, T.-W., Sung, W.-K.: Approximate string matching using compressed suffix arrays. Theoretical Comp. Science 352(1), 240–249 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  16. Iliopoulos, C.S., Rahman, M.S.: Indexing factors with gaps. Algorithmica 55(1), 60–70 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  17. Kärkkäinen, J., Puglisi, S.J.: Medium-space algorithms for inverse BWT. ESA (1), 451–462 (2010)

    Google Scholar 

  18. Lam, T.-W., Sung, W.-K., Tam, S.-L., Yiu, S.-M.: Space efficient indexes for string matching with don’t cares. In: Tokuyama, T. (ed.) ISAAC 2007. LNCS, vol. 4835, pp. 846–857. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  19. Lewenstein, M.: Indexing with gaps. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 135–143. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  20. Lewenstein, M.: Orthogonal range searching for text indexing. In: Brodnik, A., López-Ortiz, A., Raman, V., Viola, A. (eds.) Ianfest-66. LNCS, vol. 8066, pp. 267–302. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  21. Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

    Article  MathSciNet  MATH  Google Scholar 

  22. Rahman, M.S., Iliopoulos, C.S.: Pattern matching algorithms with don’t cares. In: SOFSEM (2), pp. 116–126 (2007)

    Google Scholar 

  23. Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms 3(4) (2007)

    Google Scholar 

  24. Sadakane, K., Navarro, G.: Fully-functional succinct trees. In: SODA, pp. 134–149 (2010)

    Google Scholar 

  25. Sleator, D.D., Tarjan, R.E.: A data structure for dynamic trees. J. Comput. Syst. Sci. 26(3), 362–391 (1983)

    Article  MathSciNet  MATH  Google Scholar 

  26. Tam, A., Wu, E., Lam, T.-W., Yiu, S.-M.: Succinct text indexing with wildcards. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 39–50. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  27. Thachuk, C.: Compressed indexes for text with wildcards. Theor. Comput. Sci. 483, 22–35 (2013)

    Article  MathSciNet  Google Scholar 

  28. Weiner, P.: Linear pattern matching algorithms. In: SWAT (FOCS), pp. 1–11 (1973)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lewenstein, M., Munro, J.I., Raman, V., Thankachan, S.V. (2013). Less Space: Indexing for Queries with Wildcards. In: Cai, L., Cheng, SW., Lam, TW. (eds) Algorithms and Computation. ISAAC 2013. Lecture Notes in Computer Science, vol 8283. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45030-3_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-45030-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45029-7

  • Online ISBN: 978-3-642-45030-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics