Optimizing Schema Languages for XML: Numerical Constraints and Interleaving

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4353)

Abstract

The presence of a schema offers many advantages in processing, translating, querying, and storing XML data. Basic decision problems such as equivalence, inclusion, and non-emptiness of intersection of schemas form the building blocks for schema optimization and integration, and for algorithms for the static analysis of transformations. It is therefore paramount to establish the exact complexity of these problems. Most common schema languages for XML can be adequately modeled by some kind of grammar with regular expressions at the right-hand sides. In this paper, we observe that apart from the usual regular operators of union, concatenation, and Kleene star, schema languages also allow numerical occurrence constraints and interleaving operators. Although expressions using these operators still define only regular languages, the presence or absence of these operators has a significant impact on the complexity of the basic decision problems. We present a complete overview of the complexity of the basic decision problems for DTDs, XSDs, and Relax NG with regular expressions incorporating numerical occurrence constraints and interleaving. We also discuss chain regular expressions and the complexity of the schema simplification problem in the presence of the new operators.
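The abstract's central observation is that counting and interleaving add no expressive power beyond the regular languages, yet change the cost of reasoning about schemas. The following minimal Python sketch is not taken from the paper; it illustrates both halves under stated assumptions. `matches` decides whether a sequence of child element names fits a content model using Brzozowski derivatives, extended here (as our own illustration) with rules for a counter `e{lo,hi}` and for interleaving `e & f`. `expand` rewrites a counter into plain concatenation and union, showing that a bound written with a few digits becomes an expression whose size is linear in the bound's value, that is, exponential in its written length. All names (`Count`, `Inter`, `matches`, `expand`, `size`, the `book` content model) are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce

# Hypothetical AST: regular expressions over element names, extended with
# numerical occurrence constraints e{lo,hi} and interleaving e & f.
@dataclass(frozen=True)
class Empty: pass                      # the empty language
@dataclass(frozen=True)
class Eps: pass                        # only the empty word
@dataclass(frozen=True)
class Sym:
    a: str                             # one element name
@dataclass(frozen=True)
class Cat:
    l: object
    r: object
@dataclass(frozen=True)
class Alt:
    l: object
    r: object
@dataclass(frozen=True)
class Star:
    e: object
@dataclass(frozen=True)
class Count:                           # e{lo,hi} with 0 <= lo <= hi, hi finite
    e: object
    lo: int
    hi: int
@dataclass(frozen=True)
class Inter:                           # all interleavings of a word of l with a word of r
    l: object
    r: object

# Smart constructors that drop trivial Empty/Eps nodes to keep derivatives small.
def cat(l, r):
    if isinstance(l, Empty) or isinstance(r, Empty): return Empty()
    if isinstance(l, Eps): return r
    return l if isinstance(r, Eps) else Cat(l, r)

def alt(l, r):
    if isinstance(l, Empty): return r
    if isinstance(r, Empty) or l == r: return l
    return Alt(l, r)

def inter(l, r):
    if isinstance(l, Empty) or isinstance(r, Empty): return Empty()
    if isinstance(l, Eps): return r
    return l if isinstance(r, Eps) else Inter(l, r)

def nullable(e):
    """True iff the empty word belongs to L(e)."""
    if isinstance(e, (Eps, Star)): return True
    if isinstance(e, (Empty, Sym)): return False
    if isinstance(e, (Cat, Inter)): return nullable(e.l) and nullable(e.r)
    if isinstance(e, Alt): return nullable(e.l) or nullable(e.r)
    if isinstance(e, Count): return e.lo == 0 or nullable(e.e)
    raise TypeError(e)

def deriv(e, a):
    """Brzozowski derivative: the words w such that a followed by w is in L(e)."""
    if isinstance(e, (Empty, Eps)): return Empty()
    if isinstance(e, Sym): return Eps() if e.a == a else Empty()
    if isinstance(e, Alt): return alt(deriv(e.l, a), deriv(e.r, a))
    if isinstance(e, Star): return cat(deriv(e.e, a), e)
    if isinstance(e, Cat):
        d = cat(deriv(e.l, a), e.r)
        return alt(d, deriv(e.r, a)) if nullable(e.l) else d
    if isinstance(e, Count):
        if e.hi == 0: return Empty()   # e{0,0} denotes only the empty word
        # a starts one copy of e; between lo-1 and hi-1 further copies follow.
        rest = Eps() if e.hi == 1 else Count(e.e, max(e.lo - 1, 0), e.hi - 1)
        return cat(deriv(e.e, a), rest)
    if isinstance(e, Inter):
        # The first letter is consumed by either side of the interleaving.
        return alt(inter(deriv(e.l, a), e.r), inter(e.l, deriv(e.r, a)))
    raise TypeError(e)

def matches(e, word):
    for a in word:
        e = deriv(e, a)
    return nullable(e)

def expand(e):
    """Eliminate counters: e{lo,hi} becomes lo copies of e, then hi-lo copies of (e|eps)."""
    if isinstance(e, Count):
        body = expand(e.e)
        parts = [body] * e.lo + [Alt(body, Eps())] * (e.hi - e.lo)
        return reduce(Cat, parts) if parts else Eps()
    if isinstance(e, (Cat, Alt, Inter)): return type(e)(expand(e.l), expand(e.r))
    if isinstance(e, Star): return Star(expand(e.e))
    return e

def size(e):
    # Number of AST nodes; a counter's bounds are stored inside a single node.
    if isinstance(e, (Cat, Alt, Inter)): return 1 + size(e.l) + size(e.r)
    if isinstance(e, (Star, Count)): return 1 + size(e.e)
    return 1

# Hypothetical content model for a <book> element: (title & author{1,3}) & year{0,1}
book = Inter(Inter(Sym("title"), Count(Sym("author"), 1, 3)), Count(Sym("year"), 0, 1))
print(matches(book, ["author", "title", "author"]))  # True
print(matches(book, ["title"]))                      # False: at least one author is required
print(size(Count(Sym("a"), 0, 100)), size(expand(Count(Sym("a"), 0, 100))))  # 2 399
```

Membership of a single word is the easy case. The problems studied in the paper (equivalence, inclusion, non-emptiness of intersection) compare whole languages, and the size gap that `expand` exposes is one informal intuition for why compiling counters or interleaving away into plain regular expressions does not come for free.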

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gelade, W., Martens, W., Neven, F. (2006). Optimizing Schema Languages for XML: Numerical Constraints and Interleaving. In: Schwentick, T., Suciu, D. (eds) Database Theory – ICDT 2007. ICDT 2007. Lecture Notes in Computer Science, vol 4353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965893_19

  • DOI: https://doi.org/10.1007/11965893_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69269-0

  • Online ISBN: 978-3-540-69270-6

  • eBook Packages: Computer Science, Computer Science (R0)
