Abstract
Given a dictionary \({\mathcal W}\) consisting of n binary strings of length m each, a d-query asks whether there exists a string in \({\mathcal W}\) within Hamming distance d of a given binary query string q. The problem was posed by Minsky and Papert in 1969 as a challenge to data structure design. Answering a d-query involves a tradeoff between time and space. Recently developed time-efficient methods for text indexing with errors can be used to answer a d-query in O(m) time. However, these methods use \(O(n\log^{d} n)\) (or more) additional space, which is not practical for large databases. We present a method for the problem assuming the standard RAM model of computation. We process the dictionary to construct an edge-labelled tree in which siblings have distinct labels and both the branching factor and the height are bounded. Storing the resulting tree does not require asymptotically more space than an ordinary trie that stores the given dictionary. We present an algorithm for the d-query problem that takes \(O(m(3\log_{4/3} n - 1)^{d}(\log_{2} n)^{d+1})\) time and uses only O(m) additional space. We also generalize the results to larger alphabets and to edit distance. Over an alphabet \(\Sigma\), we achieve \(O(m(2|\Sigma|-1)^{d}(\log_{(2|\Sigma|-1)} n - 1)^{d}(\log_{2} n)^{d+1})\) time complexity when Hamming distance is used. The time complexity increases by a factor of \(O(d(2|\Sigma|-1)^{d}(\log_{2} n)^{d})\) when edit distance is used. The algorithms are efficient when the approximate dictionary look-up involves long words defined over small alphabets. The algorithm can be modified to allow words of different lengths as well as query strings of different lengths.
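To make the problem statement concrete, the following is a minimal Python sketch of a d-query under Hamming distance. It is not the bounded-height tree construction or the algorithm of the paper. It shows only a brute-force scan and a textbook search over an ordinary trie that spends one unit of an error budget per mismatch. All function names (`hamming`, `d_query_naive`, `build_trie`, `d_query_trie`) are hypothetical, and the sketch assumes every word and the query have the same length m.

```python
from typing import Iterable


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def d_query_naive(words: Iterable[str], q: str, d: int) -> bool:
    """Brute-force baseline: compare q against every word, O(n * m) time."""
    return any(hamming(w, q) <= d for w in words)


def build_trie(words: Iterable[str]) -> dict:
    """Ordinary trie as nested dicts, one edge per character."""
    root: dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
    return root


def d_query_trie(root: dict, q: str, d: int) -> bool:
    """Search the trie, allowing at most d mismatching edges on a path.

    An explicit stack is used instead of recursion so that long words
    do not hit Python's recursion limit.
    """
    stack = [(root, 0, d)]  # (trie node, position in q, remaining budget)
    while stack:
        node, i, budget = stack.pop()
        if i == len(q):
            return True  # reached depth m within the mismatch budget
        for ch, child in node.items():
            cost = 0 if ch == q[i] else 1
            if cost <= budget:
                stack.append((child, i + 1, budget - cost))
    return False


if __name__ == "__main__":
    dictionary = ["0000", "1111", "1010"]
    trie = build_trie(dictionary)
    print(d_query_naive(dictionary, "0001", 1), d_query_trie(trie, "0001", 1))
    print(d_query_naive(dictionary, "0011", 1), d_query_trie(trie, "0011", 1))
```

The trie search can visit many branches when d grows, which is the cost that the bounded branching factor and height in the abstract are meant to control. The sketch makes no claim about matching the stated time bounds.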
References
Arslan, A.N., Eğecioğlu, Ö.: Dictionary look-up within small edit distance. Inter. J. of Found. of Comp. Sci. 15(1), 57–71 (2004)
Brodal, G.S., Gasieniec, L.: Approximate dictionary queries. In: Hirschberg, D.S., Myers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 65–74. Springer, Heidelberg (1996)
Brodal, G.S., Venkatesh, S.: Improved bounds for dictionary look-up with one error. IPL 75, 57–59 (2000)
Cole, R., Gottlieb, L.-A., Lewenstein, N.: Dictionary matching and indexing with errors and don't cares. In: Proc. The 36th ACM STOC, pp. 91–100 (2004)
Dolev, D., Harari, Y., Linial, N., Nisan, N., Parnas, M.: Neighborhood preserving hashing and approximate queries. In: Proc. The Fifth ACM SODA (1994)
Dolev, D., Harari, Y., Parnas, M.: Finding the neighborhood of a query in a dictionary. In: Proc. The Second Israel Symp. on Theory of Comp. and Sys. (1993)
Elias, P.: Efficient storage and retrieval by content and address of static files. J. ACM 21, 246–260 (1974)
Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge (1997)
Maaß, M.G.: Average-case analysis of approximate trie search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 472–483. Springer, Heidelberg (2004)
Maaß, M.G., Nowak, J.: Text indexing with errors. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 21–32. Springer, Heidelberg (2005)
Manber, U., Wu, S.: An algorithm for approximate membership checking with applications to password security. IPL 50, 191–197 (1994)
Minsky, M., Papert, S.: Perceptrons. MIT Press, Cambridge (1969)
Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing Methods for Approximate String Matching. IEEE Data Engineering Bulletin 24(4), 19–27 (2001)
Shang, H., Merrett, T.H.: Tries for approximate string matching. IEEE Trans. Knowl. Data Eng. 8(4), 540–547 (1996)
Ukkonen, E.: Algorithms for Approximate String Matching. Information and Control 64, 100–118 (1985)
Yao, A.C., Yao, F.F.: Dictionary look-up with one error. J. of Algorithms 25(1), 194–202 (1997)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arslan, A.N. (2006). Efficient Approximate Dictionary Look-Up for Long Words over Small Alphabets. In: Correa, J.R., Hevia, A., Kiwi, M. (eds) LATIN 2006: Theoretical Informatics. LATIN 2006. Lecture Notes in Computer Science, vol 3887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11682462_15
DOI: https://doi.org/10.1007/11682462_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32755-4
Online ISBN: 978-3-540-32756-1
eBook Packages: Computer Science (R0)