
Sublinear DTD Validity

  • Conference paper
Language and Automata Theory and Applications (LATA 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8977)


Abstract

We present an efficient algorithm for testing approximate DTD validity modulo the strong tree edit distance. The algorithm inspects XML documents probabilistically: it detects, with high probability, the nonvalidity of XML documents that contain a large fraction of errors, where errors are measured by the strong tree edit distance from the DTD. Its running time depends polynomially on the depth of the XML document tree but not on its size, so it is sublinear in most cases, because XML documents tend to be shallow in practice. The algorithm can therefore be used to speed up exact DTD validators that run in linear time.
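The last point, running a cheap probabilistic check before an exact linear-time validator, can be illustrated with a toy sketch. This is not the algorithm of the paper: the tree type, the `TOY_DTD` content models, and the uniform node sampling are all assumptions made for illustration, and the sketch claims no guarantee with respect to the strong tree edit distance. It only shows the general pattern: sample a few nodes, reject as soon as one violates its content model, and otherwise fall back to the exact check.

```python
import random
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A minimal element tree: a label and an ordered list of children."""
    label: str
    children: list = field(default_factory=list)


# Hypothetical toy DTD: each element label maps to a regular expression
# over the labels of its children, each label followed by one space.
TOY_DTD = {
    "library": r"(book )*",
    "book": r"title (author )+",
    "title": r"",
    "author": r"",
}


def node_is_locally_valid(node, dtd):
    """Check one node: its children's labels must match its content model."""
    rule = dtd.get(node.label)
    if rule is None:
        return False  # undeclared element
    word = "".join(child.label + " " for child in node.children)
    return re.fullmatch(rule, word) is not None


def collect_nodes(root):
    """List all nodes. This is linear in the document size; a genuinely
    sublinear tester would need random access to the document instead."""
    nodes, stack = [], [root]
    while stack:
        node = stack.pop()
        nodes.append(node)
        stack.extend(node.children)
    return nodes


def exact_validate(root, dtd):
    """Exact check in this toy model: every node satisfies its rule."""
    return all(node_is_locally_valid(n, dtd) for n in collect_nodes(root))


def sampled_reject(nodes, dtd, samples, rng):
    """Return True if some uniformly sampled node violates its rule.
    A valid document is never rejected, since all of its nodes pass."""
    return any(
        not node_is_locally_valid(rng.choice(nodes), dtd)
        for _ in range(samples)
    )


def validate_with_prefilter(root, dtd, samples=20, seed=0):
    """Run the cheap sampled check first, then the exact validator."""
    nodes = collect_nodes(root)
    if sampled_reject(nodes, dtd, samples, random.Random(seed)):
        return False
    return exact_validate(root, dtd)
```

Because the sampled check only rejects on an observed violation, the combined procedure returns the same answer as the exact validator; the sampling step merely lets documents with many errors be rejected early.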




Corresponding author

Correspondence to Antoine Ndione.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ndione, A., Lemay, A., Niehren, J. (2015). Sublinear DTD Validity. In: Dediu, AH., Formenti, E., Martín-Vide, C., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2015. Lecture Notes in Computer Science, vol 8977. Springer, Cham. https://doi.org/10.1007/978-3-319-15579-1_58


  • DOI: https://doi.org/10.1007/978-3-319-15579-1_58


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15578-4

  • Online ISBN: 978-3-319-15579-1

  • eBook Packages: Computer Science, Computer Science (R0)
