
Efficient Approximate Dictionary Look-Up for Long Words over Small Alphabets

  • Conference paper
LATIN 2006: Theoretical Informatics (LATIN 2006)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3887)


Abstract

Given a dictionary \({\mathcal W}\) consisting of n binary strings, each of length m, a d-query asks whether \({\mathcal W}\) contains a string within Hamming distance d of a given binary query string q. The problem was posed by Minsky and Papert in 1969 as a challenge to data structure design, and answering a d-query involves a tradeoff between time and space. Recently developed time-efficient methods for text indexing with errors can answer a d-query in O(m) time. However, these methods use \(O(n \log^{d} n)\) (or more) additional space, which is not practical for large databases. We present a method for the problem in the standard RAM model of computation. We process the dictionary to construct an edge-labelled tree in which siblings carry distinct labels and both the branching factor and the height are bounded. Storing the resulting tree does not require asymptotically more space than an ordinary trie storing the same dictionary. We present an algorithm for the d-query problem that takes \(O(m(3\log_{4/3} n - 1)^{d}(\log_{2} n)^{d+1})\) time and uses only O(m) additional space. We also generalize the results to larger alphabets and to edit distance. With Hamming distance over an alphabet \(\Sigma\), we achieve \(O(m(2|\Sigma|-1)^{d}(\log_{2|\Sigma|-1} n - 1)^{d}(\log_{2} n)^{d+1})\) time. The time complexity increases by a factor of \(O(d(2|\Sigma|-1)^{d}(\log_{2} n)^{d})\) when edit distance is used. The algorithms are efficient when the approximate dictionary look-up involves long words over small alphabets. The algorithm can be modified to allow dictionary words of different lengths, as well as query strings of different lengths.
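To make the d-query definition concrete, here is a minimal Python sketch. It is not the paper's bounded-branching tree construction, which the abstract does not describe in enough detail to reproduce. Instead it shows two generic baselines: a linear scan over the dictionary, and a depth-first walk over an ordinary trie (the structure the abstract uses as its space reference) that prunes any branch once it has accumulated more than d mismatches. All names (`hamming`, `d_query_naive`, `TrieNode`, `build_trie`, `d_query_trie`) are hypothetical, and the sketch assumes every dictionary word and the query have the same length m.

```python
from typing import Dict, Iterable, List


def hamming(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def d_query_naive(words: Iterable[str], q: str, d: int) -> bool:
    """Baseline d-query: scan all n words, O(n*m) time."""
    return any(hamming(w, q) <= d for w in words)


class TrieNode:
    """Ordinary trie node; edge labels are the dictionary keys."""
    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.terminal = False


def build_trie(words: Iterable[str]) -> TrieNode:
    """Insert every word; siblings automatically get distinct labels."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True
    return root


def d_query_trie(root: TrieNode, q: str, d: int) -> bool:
    """Generic trie search with a mismatch budget (not the paper's method).

    Each stack entry is (node, depth, mismatches so far); a child is only
    explored if following its edge keeps the mismatch count <= d.
    """
    stack: List[tuple] = [(root, 0, 0)]
    while stack:
        node, depth, errs = stack.pop()
        if depth == len(q):
            if node.terminal:
                return True
            continue
        for ch, child in node.children.items():
            e = errs + (ch != q[depth])
            if e <= d:
                stack.append((child, depth + 1, e))
    return False


if __name__ == "__main__":
    words = ["0000", "0110", "1011"]
    root = build_trie(words)
    print(d_query_trie(root, "0111", 1))  # True: "0110" differs in one position
    print(d_query_trie(root, "1100", 1))  # False: every word differs in >= 2 positions
```

The trie walk avoids rescanning shared prefixes, but the number of nodes it visits is bounded only by the number of trie nodes whose prefix lies within distance d of the corresponding prefix of q, which can be far larger than m once d > 0. That growth is the time side of the time-space tradeoff described in the abstract.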


References

  1. Arslan, A.N., Eğecioğlu, Ö.: Dictionary look-up within small edit distance. Inter. J. of Found. of Comp. Sci. 15(1), 57–71 (2004)

  2. Brodal, G.S., Gasieniec, L.: Approximate dictionary queries. In: Hirschberg, D.S., Myers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 65–74. Springer, Heidelberg (1996)

  3. Brodal, G.S., Venkatesh, S.: Improved bounds for dictionary look-up with one error. IPL 75, 57–59 (2000)

  4. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don't cares. In: Proc. 36th ACM STOC, pp. 91–100 (2004)

  5. Dolev, D., Harari, Y., Linial, N., Nisan, N., Parnas, M.: Neighborhood preserving hashing and approximate queries. In: Proc. Fifth ACM SODA (1994)

  6. Dolev, D., Harari, Y., Parnas, M.: Finding the neighborhood of a query in a dictionary. In: Proc. Second Israel Symp. on Theory of Comp. and Sys. (1993)

  7. Elias, P.: Efficient storage and retrieval by content and address of static files. J. ACM 21, 246–260 (1974)

  8. Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge (1997)

  9. Maaß, M.G.: Average-case analysis of approximate trie search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 472–483. Springer, Heidelberg (2004)

  10. Maaß, M.G., Nowak, J.: Text indexing with errors. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 21–32. Springer, Heidelberg (2005)

  11. Manber, U., Wu, S.: An algorithm for approximate membership checking with applications to password security. IPL 50, 191–197 (1994)

  12. Minsky, M., Papert, S.: Perceptrons. MIT Press, Cambridge (1969)

  13. Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing Methods for Approximate String Matching. IEEE Data Engineering Bulletin 24(4), 19–27 (2001)

  14. Shang, H., Merrett, T.H.: Tries for approximate string matching. IEEE Trans. Knowl. Data Eng. 8(4), 540–547 (1996)

  15. Ukkonen, E.: Algorithms for Approximate String Matching. Information and Control 64, 100–118 (1985)

  16. Yao, A.C., Yao, F.F.: Dictionary look-up with one error. J. of Algorithms 25(1), 194–202 (1997)


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Arslan, A.N. (2006). Efficient Approximate Dictionary Look-Up for Long Words over Small Alphabets. In: Correa, J.R., Hevia, A., Kiwi, M. (eds) LATIN 2006: Theoretical Informatics. LATIN 2006. Lecture Notes in Computer Science, vol 3887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11682462_15


  • DOI: https://doi.org/10.1007/11682462_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32755-4

  • Online ISBN: 978-3-540-32756-1

  • eBook Packages: Computer Science, Computer Science (R0)
