DOI: 10.1145/1807085.1807118
research-article

Simplifying XML schema: single-type approximations of regular tree languages

Published: 06 June 2010

Abstract

XML Schema Definitions (XSDs) can be adequately abstracted by the single-type regular tree languages. It is well known that these form a strict subclass of the robust class of regular unranked tree languages. Unfortunately, XSDs are therefore not closed under the basic operations of union and set difference, which complicates important tasks in schema integration and evolution. The purpose of this paper is to investigate how the union and difference of two XSDs can be approximated within the framework of single-type regular tree languages. We consider both optimal lower and upper approximations. We also address the more general question of how to approximate an arbitrary regular tree language by an XSD and consider the complexity of associated decision problems.
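
To make the non-closure under union concrete, the following minimal Python sketch may help. It is not taken from the paper. It relies on a characterization known from the literature on XML Schema expressiveness (an assumption here, since the abstract does not state it): a regular tree language is single-type exactly when it is closed under ancestor-guarded subtree exchange, that is, swapping subtrees rooted at nodes whose root-to-node label paths coincide. The tree encoding, the helper names (tree, nodes, replace, exchange_closed) and the two one-document languages are all hypothetical illustrations.

```python
from itertools import product

def tree(label, *children):
    """Build a tree as (label, (child, child, ...))."""
    return (label, tuple(children))

def nodes(t, path=(), anc=()):
    """Yield (position, ancestor string, subtree) for every node of t."""
    label, children = t
    anc = anc + (label,)
    yield path, anc, t
    for i, child in enumerate(children):
        yield from nodes(child, path + (i,), anc)

def replace(t, path, new_subtree):
    """Return a copy of t with the subtree at position path replaced."""
    if not path:
        return new_subtree
    label, children = t
    i = path[0]
    patched = replace(children[i], path[1:], new_subtree)
    return (label, children[:i] + (patched,) + children[i + 1:])

def exchange_closed(language):
    """Check a finite tree language for closure under
    ancestor-guarded subtree exchange."""
    for t1, t2 in product(language, repeat=2):
        for p1, anc1, _ in nodes(t1):
            for _, anc2, s2 in nodes(t2):
                if anc1 == anc2 and replace(t1, p1, s2) not in language:
                    return False
    return True

# Two toy languages, each containing a single document.
xsd1 = {tree("r", tree("a", tree("b")), tree("a", tree("b")))}
xsd2 = {tree("r", tree("a", tree("c")), tree("a", tree("c")))}

print(exchange_closed(xsd1))         # True
print(exchange_closed(xsd2))         # True
print(exchange_closed(xsd1 | xsd2))  # False
```

Each toy language passes the check, but their union does not: exchanging the first a-subtree of one document with an a-subtree of the other yields r(a(c), a(b)), which belongs to neither language. Under the assumed characterization, any single-type language containing the union must also contain such mixed documents, which is why an upper approximation is generally a strict superset of the union.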


Published In

PODS '10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2010
350 pages
ISBN:9781450300339
DOI:10.1145/1807085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. approximation
  2. complexity
  3. xml
  4. xml schema

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '10: International Conference on Management of Data
June 6 - 11, 2010
Indianapolis, Indiana, USA

Acceptance Rates

PODS '10 Paper Acceptance Rate 27 of 113 submissions, 24%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%

Cited By

  • (2015) Deciding Twig-definability of Node Selecting Tree Automata. Theory of Computing Systems, 57(4):967--1007. DOI: 10.1007/s00224-015-9623-7. Online publication date: 1-Nov-2015
  • (2015) Optimal Probabilistic Generation of XML Documents. Theory of Computing Systems, 57(4):806--842. DOI: 10.1007/s00224-014-9581-5. Online publication date: 1-Nov-2015
  • (2013) Validity of Tree Pattern Queries with Respect to Schema Information. Mathematical Foundations of Computer Science 2013, pages 171--182. DOI: 10.1007/978-3-642-40313-2_17. Online publication date: 2013
  • (2013) Conservative Type Extensions for XML Data. Transactions on Large-Scale Data- and Knowledge-Centered Systems IX, pages 65--94. DOI: 10.1007/978-3-642-40069-8_4. Online publication date: 2013
