Efficient Approximate Dictionary Look-Up for Long Words over Small Alphabets

Arslan, Abdullah N.

doi:10.1007/11682462_15


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3887)

Included in the following conference series:

Latin American Symposium on Theoretical Informatics


Abstract

Given a dictionary \({\mathcal W}\) consisting of n binary strings of length m each, a d-query asks whether there exists a string in \({\mathcal W}\) within Hamming distance d of a given binary query string q. The problem was posed by Minsky and Papert in 1969 as a challenge to data structure design. Answering a d-query involves a tradeoff between time and space. Recently developed time-efficient methods for text indexing with errors can answer a d-query in \(O(m)\) time. However, these methods use \(O(n\log^{d} n)\) (or more) additional space, which is not practical for large databases. We present a method for the problem assuming the standard RAM model of computation. We process the dictionary to construct an edge-labelled tree in which sibling edges carry distinct labels and in which the branching factor and the height are bounded. Storing the resulting tree does not require asymptotically more space than an ordinary trie that stores the given dictionary. We present an algorithm for the d-query problem that takes \(O(m(3\log_{4/3} n - 1)^{d}(\log_{2} n)^{d+1})\) time and uses only \(O(m)\) additional space. We also generalize the results to larger alphabets and to edit distance. For an alphabet \(\Sigma\) and Hamming distance, we achieve \(O(m(2|\Sigma|-1)^{d}(\log_{2|\Sigma|-1} n - 1)^{d}(\log_{2} n)^{d+1})\) time complexity. The time complexity increases by a factor of \(O(d(2|\Sigma|-1)^{d}(\log_{2} n)^{d})\) when edit distance is used. The algorithms are efficient when the approximate dictionary look-up involves long words defined over small alphabets. The algorithm can be modified to allow words of different lengths as well as query strings of different lengths.
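To make the d-query definition concrete, the following is a minimal Python sketch of a naive baseline: an ordinary trie searched with a mismatch budget. It is not the edge-labelled tree or the algorithm of this paper and does not attain the time bounds stated above. The names `build_trie` and `d_query` are hypothetical, and the sketch assumes a non-empty dictionary whose words all have the same length m ≥ 1 as the query.

```python
from typing import Dict, Iterable


def build_trie(words: Iterable[str]) -> Dict:
    """Store equal-length words in a nested-dict trie (one edge per symbol)."""
    root: Dict = {}
    for word in words:
        node = root
        for symbol in word:
            node = node.setdefault(symbol, {})
    return root


def d_query(trie: Dict, q: str, d: int) -> bool:
    """Return True if some stored word is within Hamming distance d of q.

    Assumes every stored word has the same length as q (illustrative only).
    """
    if d < 0 or not trie:
        return False
    # Explicit stack instead of recursion, so long words do not hit
    # Python's recursion limit. Each entry: (node, depth, mismatches left).
    stack = [(trie, 0, d)]
    while stack:
        node, depth, budget = stack.pop()
        if depth == len(q):
            return True  # reached the end of q within the mismatch budget
        for symbol, child in node.items():
            cost = 0 if symbol == q[depth] else 1
            if cost <= budget:
                stack.append((child, depth + 1, budget - cost))
    return False


trie = build_trie(["0000", "0110", "1111"])
print(d_query(trie, "0100", 1))  # "0000" differs from the query in one position
print(d_query(trie, "1001", 1))  # every stored word differs in at least two
```

The sketch uses only the stack as working memory beyond the trie itself, but the number of branches it may explore grows quickly with d; reducing that search cost without giving up the small space footprint is the tradeoff the abstract addresses.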


References

Arslan, A.N., Eğecioğlu, Ö.: Dictionary look-up within small edit distance. Inter. J. of Found. of Comp. Sci. 15(1), 57–71 (2004)
Brodal, G.S., Gasieniec, L.: Approximate dictionary queries. In: Hirschberg, D.S., Myers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 65–74. Springer, Heidelberg (1996)
Brodal, G.S., Venkatesh, S.: Improved bounds for dictionary look-up with one error. IPL 75, 57–59 (2000)
Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proc. The 36th ACM STOC, pp. 91–100 (2004)
Dolev, D., Harari, Y., Linial, N., Nisan, N., Parnas, M.: Neighborhood preserving hashing and approximate queries. In: Proc. The Fifth ACM SODA (1994)
Dolev, D., Harari, Y., Parnas, M.: Finding the neighborhood of a query in a dictionary. In: Proc. The Second Israel Symp. on Theory of Comp. and Sys. (1993)
Elias, P.: Efficient storage and retrieval by content and address of static files. J. ACM 21, 246–260 (1974)
Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge (1997)
Maaß, M.G.: Average-case analysis of approximate trie search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 472–483. Springer, Heidelberg (2004)
Maaß, M.G., Nowak, J.: Text indexing with errors. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 21–32. Springer, Heidelberg (2005)
Manber, U., Wu, S.: An algorithm for approximate membership checking with applications to password security. IPL 50, 191–197 (1994)
Minsky, M., Papert, S.: Perceptrons. MIT Press, Cambridge (1969)
Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing Methods for Approximate String Matching. IEEE Data Engineering Bulletin 24(4), 19–27 (2001)
Shang, H., Merrett, T.H.: Tries for approximate string matching. IEEE Trans. Knowl. Data Eng. 8(4), 540–547 (1996)
Ukkonen, E.: Algorithms for Approximate String Matching. Information and Control 64, 100–118 (1985)
Yao, A.C., Yao, F.F.: Dictionary look-up with one error. J. of Algorithms 25(1), 194–202 (1997)


Author information

Authors and Affiliations

Department of Computer Science, University of Vermont, Burlington, VT, 05405, USA
Abdullah N. Arslan


Editor information

Editors and Affiliations

School of Business, Universidad Adolfo Ibáñez, Chile
José R. Correa
Dept. of Computer Science, University of Chile, Blanco Encalada 2120, 3er piso, Santiago, Chile
Alejandro Hevia
Dept. Ing. Matemática & Ctr. de Modelamiento Matemático, UMI 2807 U. Chile–CNRS, Chile
Marcos Kiwi


About this paper

Cite this paper

Arslan, A.N. (2006). Efficient Approximate Dictionary Look-Up for Long Words over Small Alphabets. In: Correa, J.R., Hevia, A., Kiwi, M. (eds) LATIN 2006: Theoretical Informatics. LATIN 2006. Lecture Notes in Computer Science, vol 3887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11682462_15


DOI: https://doi.org/10.1007/11682462_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32755-4
Online ISBN: 978-3-540-32756-1
eBook Packages: Computer Science, Computer Science (R0)
