Efficient Approximate Dictionary Look-Up for Long Words over Small Alphabets

  • Conference paper
LATIN 2006: Theoretical Informatics (LATIN 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3887)

Abstract

Given a dictionary \({\mathcal W}\) consisting of n binary strings of length m each, a d-query asks whether there exists a string in \({\mathcal W}\) within Hamming distance d of a given binary query string q. The problem was posed by Minsky and Papert in 1969 as a challenge to data structure design. Answering a d-query involves a tradeoff between time and space. Recently developed time-efficient methods for text indexing with errors can be used to answer a d-query in O(m) time. However, these methods use \(O(n\log^{d} n)\) (or more) additional space, which is not practical for large databases. We present a method for the problem assuming the standard RAM model of computation. We process the dictionary to construct an edge-labelled tree with distinct labels on siblings, and with bounded branching factor and height. Storing the resulting tree does not require asymptotically more space than an ordinary trie that stores the given dictionary. We present an algorithm for the d-query problem that takes \(O(m(3\log_{4/3} n - 1)^{d}(\log_{2} n)^{d+1})\) time and uses only O(m) additional space. We also generalize the results to larger alphabets and to edit distance. We achieve \(O(m(2|\Sigma|-1)^{d}(\log_{2|\Sigma|-1} n - 1)^{d}(\log_{2} n)^{d+1})\) time complexity for the problem when Hamming distance is used. The time complexity increases by a factor of \(O(d(2|\Sigma|-1)^{d}(\log_{2} n)^{d})\) when edit distance is used. The algorithms are efficient when the approximate dictionary look-up involves long words defined over small alphabets. The algorithm can be modified to allow dictionary words of different lengths as well as query strings of different lengths.
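To make the d-query concrete, here is a minimal Python sketch assuming a plain binary trie. The names `BinaryTrie`, `build` and `d_query` are hypothetical and do not come from the paper. The search walks the trie while tracking how many mismatches are still allowed, and it succeeds if some stored word of the same length as q is reached. This is only a baseline that illustrates the query itself. It is not the paper's edge-labelled tree with bounded branching factor and height, and its worst-case running time does not match the bounds stated above.

```python
from typing import Dict, Iterable


class BinaryTrie:
    """Plain trie over '0'/'1' strings (a baseline, not the paper's bounded-height tree)."""

    def __init__(self) -> None:
        self.root: Dict[str, dict] = {}

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = {}  # end-of-word marker

    @classmethod
    def build(cls, words: Iterable[str]) -> "BinaryTrie":
        trie = cls()
        for w in words:
            trie.insert(w)
        return trie

    def d_query(self, q: str, d: int) -> bool:
        """True if some stored word of length len(q) is within Hamming distance d of q."""
        # Explicit stack of (trie node, position in q, mismatches still allowed).
        stack = [(self.root, 0, d)]
        while stack:
            node, i, budget = stack.pop()
            if i == len(q):
                if "$" in node:
                    return True
                continue
            for ch, child in node.items():
                if ch == "$":
                    continue  # a shorter word ends here; Hamming distance needs equal lengths
                if ch == q[i]:
                    stack.append((child, i + 1, budget))  # characters agree: no cost
                elif budget > 0:
                    stack.append((child, i + 1, budget - 1))  # spend one mismatch
        return False
```

For example, with `trie = BinaryTrie.build(["0000", "1111", "0101"])`, the call `trie.d_query("0111", 1)` returns `True` (it differs from `1111` in one position), while `trie.d_query("0011", 1)` returns `False` because every stored word is at distance 2. Once mismatches are allowed, the search can branch at every trie level. That growth is the kind of cost the bounds above are designed to control.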

References

  1. Arslan, A.N., Eğecioğlu, Ö.: Dictionary look-up within small edit distance. Inter. J. of Found. of Comp. Sci. 15(1), 57–71 (2004)

  2. Brodal, G.S., Gasieniec, L.: Approximate dictionary queries. In: Hirschberg, D.S., Myers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 65–74. Springer, Heidelberg (1996)

  3. Brodal, G.S., Venkatesh, S.: Improved bounds for dictionary look-up with one error. IPL 75, 57–59 (2000)

  4. Cole, R., Gottlieb, L.-A., Lewenstein, N.: Dictionary matching and indexing with errors and don’t cares. In: Proc. The 36th ACM STOC, pp. 91–100 (2004)

  5. Dolev, D., Harari, Y., Linial, N., Nisan, N., Parnas, M.: Neighborhood preserving hashing and approximate queries. In: Proc. The Fifth ACM SODA (1994)

  6. Dolev, D., Harari, Y., Parnas, M.: Finding the neighborhood of a query in a dictionary. In: Proc. The Second Israel Symp. on Theory of Comp. and Sys. (1993)

  7. Elias, P.: Efficient storage and retrieval by content and address of static files. J. ACM 21, 246–260 (1974)

  8. Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge (1997)

  9. Maaß, M.G.: Average-case analysis of approximate trie search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 472–483. Springer, Heidelberg (2004)

  10. Maaß, M.G., Nowak, J.: Text indexing with errors. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 21–32. Springer, Heidelberg (2005)

  11. Manber, U., Wu, S.: An algorithm for approximate membership checking with applications to password security. IPL 50, 191–197 (1994)

  12. Minsky, M., Papert, S.: Perceptrons. MIT Press, Cambridge (1969)

  13. Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing Methods for Approximate String Matching. IEEE Data Engineering Bulletin 24(4), 19–27 (2001)

  14. Shang, H., Merrett, T.H.: Tries for approximate string matching. IEEE Trans. Knowl. Data Eng. 8(4), 540–547 (1996)

  15. Ukkonen, E.: Algorithms for Approximate String Matching. Information and Control 64, 100–118 (1985)

  16. Yao, A.C., Yao, F.F.: Dictionary look-up with one error. J. of Algorithms 25(1), 194–202 (1997)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Arslan, A.N. (2006). Efficient Approximate Dictionary Look-Up for Long Words over Small Alphabets. In: Correa, J.R., Hevia, A., Kiwi, M. (eds) LATIN 2006: Theoretical Informatics. LATIN 2006. Lecture Notes in Computer Science, vol 3887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11682462_15

  • DOI: https://doi.org/10.1007/11682462_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32755-4

  • Online ISBN: 978-3-540-32756-1

  • eBook Packages: Computer Science, Computer Science (R0)