Abstract
We present an efficient algorithm for testing approximate DTD validity modulo the strong tree edit distance. The algorithm inspects XML documents probabilistically: with high probability, it detects the nonvalidity of XML documents that contain a large fraction of errors, measured by the strong tree edit distance from the DTD. The running time depends polynomially on the depth of the XML document tree but not on its size, so it is sublinear in most cases, because XML documents tend to be shallow in practice. Our algorithm can therefore be used to speed up exact DTD validators that run in linear time.
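The guarantee described above can be read as a standard property-testing condition. The following is only a sketch under assumed notation, none of which appears in the abstract: \(L(D)\) for the set of trees valid for a DTD \(D\), \(d\) for the strong tree edit distance, \(|t|\) for the number of nodes of \(t\), an error parameter \(\varepsilon\), and the conventional success threshold \(2/3\). The paper's own definitions may normalize the distance or choose the constants differently.

```latex
% Assumed notation (not taken from the abstract):
%   L(D)      set of trees valid for the DTD D
%   d(t, t')  strong tree edit distance between trees t and t'
%   |t|       number of nodes of t
%   2/3       conventional success threshold in property testing
\[
\begin{aligned}
t \in L(D) \;&\Longrightarrow\; \Pr[\text{tester accepts } t] \ge 2/3,\\
\min_{t' \in L(D)} d(t, t') > \varepsilon\,|t| \;&\Longrightarrow\; \Pr[\text{tester rejects } t] \ge 2/3.
\end{aligned}
\]
```

Under this reading, a document that is neither valid nor far from valid may be answered either way, which is why such a tester complements an exact linear-time validator rather than replacing it.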
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ndione, A., Lemay, A., Niehren, J. (2015). Sublinear DTD Validity. In: Dediu, AH., Formenti, E., Martín-Vide, C., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2015. Lecture Notes in Computer Science, vol 8977. Springer, Cham. https://doi.org/10.1007/978-3-319-15579-1_58
DOI: https://doi.org/10.1007/978-3-319-15579-1_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15578-4
Online ISBN: 978-3-319-15579-1
eBook Packages: Computer Science, Computer Science (R0)