Abstract
In this work, the problem we address is text indexing for approximate matching. Given a text \(\mathcal{T}\) which undergoes some preprocessing to generate an index, we can later query this index to identify the places where a string occurs up to a certain number of errors k (edition distance). The indexing structure occupies space \(\mathcal{O}(n\log^kn)\) in the average case, independent of alphabet size. This structure can be used to report the existence of a match with k errors in \(\mathcal{O}(3^k m^{k+1})\) and to report the occurrences in \(\mathcal{O}(3^k m^{k+1} + \mbox{\it ed})\) time, where m is the length of the pattern and ed and the number of matching edit scripts. The construction of the structure has time bound by \(\mathcal{O}(kN|\Sigma|)\), where N is the number of nodes in the index and |Σ| the alphabet size.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Weiner, P.: Linear pattern matching algorithms. In: FOCS, pp. 1–11. IEEE, Los Alamitos (1973)
Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33 (2001)
Maaß, M.G., Nowak, J.: Text indexing with errors. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 21–32. Springer, Heidelberg (2005)
Chattaraj, A., Parida, L.: An inexact-suffix-tree-based algorithm for detecting extensible patterns. Theor. Comput. Sci. 335, 3–14 (2005)
Cole, R., Gottlieb, L.A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC, pp. 91–100 (2004)
McCreight, E.: A space-economical suffix tree construction algorithm. J. ACM 23, 262–272 (1976)
Apostolico, A., Szpankowski, W.: Self-alignments in words and their applications. J. Algorithms 13, 446–467 (1992)
Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, New York (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Coelho, L.P., Oliveira, A.L. (2006). Dotted Suffix Trees A Structure for Approximate Text Indexing. In: Crestani, F., Ferragina, P., Sanderson, M. (eds) String Processing and Information Retrieval. SPIRE 2006. Lecture Notes in Computer Science, vol 4209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880561_27
Download citation
DOI: https://doi.org/10.1007/11880561_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45774-9
Online ISBN: 978-3-540-45775-6
eBook Packages: Computer ScienceComputer Science (R0)