Abstract
The presence of a schema offers many advantages in processing, translating, querying, and storing XML data. Basic decision problems like equivalence, inclusion, and non-emptiness of intersection of schemas form the building blocks for schema optimization and integration, and for algorithms for static analysis of transformations. It is therefore paramount to establish the exact complexity of these problems. Most common schema languages for XML can be adequately modeled by some kind of grammar with regular expressions on the right-hand sides. In this paper, we observe that apart from the usual regular operators of union, concatenation and Kleene-star, schema languages also allow numerical occurrence constraints and interleaving operators. Although the expressiveness of these operators remains within the regular languages, their presence or absence has a significant impact on the complexity of the basic decision problems. We present a complete overview of the complexity of the basic decision problems for DTDs, XSDs and Relax NG with regular expressions incorporating numerical occurrence constraints and interleaving. We also discuss chain regular expressions and the complexity of the schema simplification problem incorporating the new operators.
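As a rough illustration of why these operators stay within the regular languages yet can still affect complexity, the following sketch rewrites a numerical occurrence constraint and an interleaving into plain regular expressions. It is not taken from the paper: the function names `expand_counter` and `expand_interleave`, the restriction to single-character symbols, and the notations `a{2,4}` and `a & b & c` are assumptions made only for this example.

```python
import re
from itertools import permutations


def expand_counter(symbol: str, low: int, high: int) -> str:
    """Rewrite symbol{low,high} using only concatenation and optional groups.

    Assumes `symbol` is a single character and 0 <= low <= high.
    """
    required = symbol * low
    optional = ""
    # Nest one optional group per extra allowed occurrence.
    for _ in range(high - low):
        optional = f"({symbol}{optional})?"
    return required + optional


def expand_interleave(symbols: list[str]) -> str:
    """Rewrite s1 & s2 & ... (each symbol once, in any order) as a union of orderings.

    Assumes the symbols are distinct single characters.
    """
    return "|".join("".join(order) for order in permutations(symbols))


# a{2,4} becomes an ordinary regular expression whose length grows with the
# bound itself, i.e. exponentially in the number of digits used to write it.
counter_regex = expand_counter("a", 2, 4)
print(counter_regex)
print(bool(re.fullmatch(counter_regex, "aaa")))

# Interleaving n distinct symbols becomes a union of n! words.
print(len(expand_interleave(list("abcdef")).split("|")))
```

Both rewritings produce equivalent standard regular expressions, but their size grows quickly with the counter bound or with the number of interleaved symbols, which gives some intuition for why the succinct operators can change the cost of deciding inclusion or equivalence.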
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gelade, W., Martens, W., Neven, F. (2006). Optimizing Schema Languages for XML: Numerical Constraints and Interleaving. In: Schwentick, T., Suciu, D. (eds) Database Theory – ICDT 2007. ICDT 2007. Lecture Notes in Computer Science, vol 4353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965893_19
DOI: https://doi.org/10.1007/11965893_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69269-0
Online ISBN: 978-3-540-69270-6
eBook Packages: Computer Science, Computer Science (R0)