Filtering unsatisfiable XPath queries

https://doi.org/10.1016/j.datak.2007.06.018

Abstract

The satisfiability test checks whether or not the evaluation of a query returns the empty set for every input document, and can be used in query optimization to avoid the submission and computation of unsatisfiable queries. Thus, applying the satisfiability test before executing a query can save processing time and query costs. We focus on the satisfiability problem for queries formulated in the XML query language XPath, and propose a schema-based approach to the satisfiability test of XPath queries, which checks whether or not an XPath query conforms to the constraints in a given schema. If an XPath query does not conform to the constraints given in the schema, the evaluation of the query will return an empty result for any valid XML document; thus, the XPath query is unsatisfiable. We present a complexity analysis of our approach, which proves that our approach is efficient for typical cases. We also present an experimental analysis of our prototype, which shows the optimization potential of avoiding the evaluation of unsatisfiable queries.

Introduction

XPath (see [29], [30]) is used either as a standalone XML query language or embedded in other XML languages (e.g. XSLT, XQuery, XLink and XPointer) for specifying node sets in XML documents. An important issue in XPath evaluation is the satisfiability problem of XPath queries. An XPath query Q is unsatisfiable if the evaluation of Q returns an empty result for every XML document. The satisfiability test of XPath queries therefore plays a critical role in query optimization: it can avoid the submission and the unnecessary evaluation of unsatisfiable queries, thus saving processing time and query costs. Besides query optimization, the XPath satisfiability test is also important for consistency problems, e.g. XML access control [6] and type-checking of transformations [19]. Therefore, many research efforts focus on the satisfiability test of XPath queries with or without respect to schemas, e.g. [2], [10], [11], [14], [17], [18].

In the absence of schemas, the satisfiability test can detect that the structural properties of XPath queries are inconsistent with the XML data model (e.g. [14]). For example, the XPath query Q1 = /parent::a is unsatisfiable, because according to the XML data model the document node has no parent node. The query Q2 = //regions/america is satisfiable when no schema is considered. However, according to a given schema, e.g. the schema given in [8], the element regions can have children named namerica and samerica, but no children named america. Therefore, Q2 is unsatisfiable with respect to the given schema. Considering schema information thus allows us to detect more errors in XPath queries, which is why we focus on the satisfiability test of XPath in the presence of schemas.
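The idea of evaluating a query on a schema rather than on documents can be illustrated with a small Python sketch restricted to queries of the form //a/b/c. The schema representation (a dictionary from element names to permitted child names), the names `SCHEMA`, `reachable` and `evaluate_on_schema`, and all element names other than regions, namerica and samerica are assumptions for illustration; this is not the data model or evaluator of the paper.

```python
from typing import Dict, List, Set

# Toy schema graph, loosely modelled on the regions example: each element
# name maps to the names of the child elements its content model permits.
# Apart from regions, namerica and samerica, all names are hypothetical.
SCHEMA: Dict[str, Set[str]] = {
    "site": {"regions"},
    "regions": {"namerica", "samerica"},
    "namerica": {"item"},
    "samerica": {"item"},
    "item": set(),
}
ROOT = "site"


def reachable(schema: Dict[str, Set[str]], start: str) -> Set[str]:
    """All element names that can occur at or below `start` in a valid document."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(schema.get(name, ()))
    return seen


def evaluate_on_schema(schema: Dict[str, Set[str]], root: str,
                       first: str, child_steps: List[str]) -> Set[str]:
    """Evaluate //first/c1/c2/... on the schema instead of on a document.

    Returns the element names the query can end on; an empty set means that
    no valid document can contain a match.
    """
    # //first matches the root element or any element reachable from it.
    current = {first} & reachable(schema, root)
    for step in child_steps:
        # Keep only permitted children of the current elements that carry the step's name.
        current = {c for e in current for c in schema.get(e, ()) if c == step}
    return current
```

Here `evaluate_on_schema(SCHEMA, ROOT, "regions", ["america"])` returns an empty set, mirroring Q2, while `["namerica"]` as the child steps returns `{"namerica"}`.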

The most widely used schema languages are XML Schema (see [31], [32]) and DTD (see [28]). In this paper, we focus on XML Schema for the definition of schemas. Besides imposing structural and semantic constraints on XML documents as DTDs do, the XML Schema language provides powerful capabilities for specifying data types of elements and attributes, most of which are not expressible in DTDs. The XML Schema language provides a large number of built-in simple types and allows deriving new types for the values of elements and attributes, which DTDs only specify as character data. Thus, if the types of values of elements or attributes in an XPath query do not conform to the constraints specified in the XML Schema definition, the XPath query selects an empty set of nodes for any valid XML document. For example, the query meeting[@date = '01-05-06'] does not retrieve anything if the attribute date is declared to have the format DD-MM-YYYY. The powerful data-typing facilities of XML Schema therefore provide another dimension for the satisfiability test of XPath queries. Since XML Schema can express more restrictions than a DTD, a DTD can easily be transformed into an XML Schema representation, but in general an XML Schema definition cannot be transformed into a DTD without losing information. To the best of our knowledge, apart from our previous contributions (see [10], [11]), existing work only deals with DTDs.
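The data-type check in the meeting example can be sketched in Python as follows. The sketch assumes that the date attribute is declared as a string type restricted by a pattern facet for DD-MM-YYYY, and it only handles string equality with a literal; for typed values such as xs:date the comparison semantics differ and this check would not apply as-is. `ATTRIBUTE_PATTERNS` and `equality_may_hold` are hypothetical names.

```python
import re
from typing import Dict, Tuple

# Hypothetical lexical constraints: (element, attribute) -> pattern facet of the
# attribute's derived simple type. XML Schema patterns are implicitly anchored,
# hence re.fullmatch below.
ATTRIBUTE_PATTERNS: Dict[Tuple[str, str], str] = {
    ("meeting", "date"): r"\d{2}-\d{2}-\d{4}",  # DD-MM-YYYY
}


def equality_may_hold(element: str, attribute: str, literal: str) -> bool:
    """Can element[@attribute = 'literal'] be true in some valid document?

    Returns False only if the literal lies outside the lexical space allowed
    by the pattern; without type information the predicate cannot be ruled out.
    """
    pattern = ATTRIBUTE_PATTERNS.get((element, attribute))
    if pattern is None:
        return True
    return re.fullmatch(pattern, literal) is not None
```

With these assumptions, `equality_may_hold("meeting", "date", "01-05-06")` is `False`, so the predicate and hence the query are unsatisfiable, whereas `"01-05-2006"` passes the check.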

Our schema-based approach checks whether or not an XPath query Q conforms to the structure, semantics, data type and occurrence constraints in a given XML Schema definition S by evaluating Q on S rather than on the instance documents of S. If Q does not conform to the constraints of S, Q cannot be evaluated completely on S, and thus Q is unsatisfiable. Regarding schemas, our approach supports recursive as well as non-recursive schemas, considers a significant part of the XML Schema language and allows arbitrary nesting and references of model groups. Regarding XPath, our approach allows all XPath axes and negation operations in predicates. The satisfiability test for the XPath subset supported by our approach, in the presence of the schemas supported by our approach, is undecidable (see [2]). Therefore, we present an incomplete but fast satisfiability tester: if our tester returns unsatisfiable, then the XPath query is certainly unsatisfiable, but if our tester returns maybe satisfiable, then the XPath query may be either satisfiable or unsatisfiable. Note that using an incomplete tester does not compromise correctness in the proposed application scenarios of our satisfiability tester.
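Occurrence constraints can be checked in the same spirit, and the check also shows the two answers of an incomplete tester. The Python sketch below rules out a positional predicate such as meeting/chair[2] when the chair particle has maxOccurs 1. The meeting content model and the function name are hypothetical, and the sketch assumes a flat sequence in which each child name appears in exactly one particle; nested or repeated model groups would require combining the bounds over the group structure, which is not done here. A result of `False` is definite, while `True` only means maybe satisfiable.

```python
from typing import Dict, Optional, Tuple

# Hypothetical content model of <meeting>: a flat sequence in which each child
# name occurs in exactly one particle. Values are (minOccurs, maxOccurs), with
# None standing for maxOccurs="unbounded".
OCCURS: Dict[Tuple[str, str], Tuple[int, Optional[int]]] = {
    ("meeting", "chair"): (1, 1),
    ("meeting", "participant"): (1, None),
    ("meeting", "minutes"): (0, 2),
}


def positional_step_may_match(parent: str, child: str, position: int) -> bool:
    """Can parent/child[position] select a node in some valid document?

    False is a definite answer (unsatisfiable); True only means "maybe".
    """
    if (parent, child) not in OCCURS:
        return False  # child is not declared in the parent's content model
    if position < 1:
        return False  # XPath positions start at 1
    _, max_occurs = OCCURS[(parent, child)]
    return max_occurs is None or position <= max_occurs
```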

This paper is an extended version of [10], [11]. We extend the contributions of [10], [11] by significantly extending the supported subset (see Section 2.2) of the XML Schema language, allowing various content models of elements and arbitrary nesting of model groups; by supporting the type-checking of values of elements and attributes (see Section 4.5) and the checking of occurrence constraints (see Section 4.6); by integrating all new contributions into the prototype of [11] and by additional experiments (see Section 6).

The rest of the paper is organized as follows: Section 2 describes the supported subsets of XPath and XML Schema. Section 3 develops a data model for XML Schema. This data model is the basis for our XPath–XSchema evaluator (see Section 4), which evaluates XPath queries on XML Schema definitions in order to compute the schema paths of the queries. Section 4 also includes a complexity analysis of the approach. Section 5 discusses the satisfiability test of XPath based on the schema paths. We present a comprehensive performance analysis in Section 6. Section 7 deals with further related work. We conclude with a summary and conclusions in Section 8.

Section snippets

XPath and XML Schema

In this section, we present the subset of the XPath language and the subset of XML Schema language supported in this work.

Data model for the XML Schema language

Based on the data model for the XML language given by Wadler [24], we develop a data model for XML Schema that identifies the navigation paths of XPath queries on an XML Schema definition.

XPath–XSchema evaluator

A common XPath evaluator is constructed to evaluate XPath queries on XML instance documents. In order to test the satisfiability of XPath with respect to schemas, our approach instead evaluates XPath queries on XML Schema definitions rather than on their instance documents. We therefore call our XPath evaluator the XPath–XSchema evaluator.
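To make evaluation on a schema concrete, the following Python sketch computes schema paths (sequences of element names from the root) for absolute location paths built from child and descendant steps, on a recursive toy schema. It is a hypothetical illustration, not the paper's XPath–XSchema evaluator: it assumes that an element name uniquely determines its declaration, and it ignores all other axes, predicates, attributes and types. Because later steps depend only on the last element of a schema path, the sketch keeps one path per last element, and its breadth-first search expands each element at most once per step, so it terminates on recursive schemas.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Path = Tuple[str, ...]

# Hypothetical recursive schema: sections may contain nested sections.
SCHEMA: Dict[str, Set[str]] = {
    "book": {"title", "section"},
    "section": {"title", "para", "section"},
    "title": set(),
    "para": set(),
}
ROOT = "book"


def children_of(schema: Dict[str, Set[str]], root: str, path: Path) -> Set[str]:
    # The empty path stands for the document node, whose only child is the root.
    return {root} if not path else schema.get(path[-1], set())


def dedupe(paths: List[Path]) -> List[Path]:
    # Later steps only look at the last element, so one path per last element suffices.
    by_last: Dict[str, Path] = {}
    for p in paths:
        by_last.setdefault(p[-1], p)
    return list(by_last.values())


def child_step(schema, root, paths: List[Path], name: str) -> List[Path]:
    return dedupe([p + (name,) for p in paths if name in children_of(schema, root, p)])


def descendant_step(schema, root, paths: List[Path], name: str) -> List[Path]:
    """descendant::name via breadth-first search over the schema graph."""
    found: List[Path] = []
    for start in paths:
        seen: Set[str] = set()
        queue: Deque[Path] = deque()
        for c in sorted(children_of(schema, root, start)):
            seen.add(c)
            queue.append(start + (c,))
        while queue:
            path = queue.popleft()
            if path[-1] == name:
                found.append(path)
            for c in sorted(schema.get(path[-1], set())):
                if c not in seen:  # expand each element once, so recursion terminates
                    seen.add(c)
                    queue.append(path + (c,))
    return dedupe(found)


def schema_paths(schema, root, steps: List[Tuple[str, str]]) -> List[Path]:
    """Evaluate an absolute location path given as (axis, name) steps."""
    axes = {"child": child_step, "descendant": descendant_step}
    paths: List[Path] = [()]
    for axis, name in steps:
        paths = axes[axis](schema, root, paths, name)
    return paths
```

For example, `schema_paths(SCHEMA, ROOT, [("descendant", "section"), ("child", "section")])` returns `[("book", "section", "section")]`, so the recursion is followed, whereas /book/para, written as `[("child", "book"), ("child", "para")]`, yields an empty list.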

Satisfiability tester

Definition 10 Satisfiability of XPath queries

A given XPath query Q is satisfiable according to a given XML Schema definition XSD if there exists an XML document D that is valid according to XSD and for which the evaluation of Q on D returns a non-empty result. Otherwise, Q is unsatisfiable according to XSD.

Proposition 1 Unsatisfiable XPath queries

If the evaluation of an XPath query Q on a given XML Schema definition XSD by the XPath–XSchema evaluator generates an empty set of schema paths, then Q is unsatisfiable according to XSD.
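Proposition 1 suggests how the tester can be used as a filter in front of an ordinary XPath processor. In the Python sketch below, `schema_paths` is a deliberately tiny stand-in for the XPath–XSchema evaluator (absolute child-only paths over a hypothetical schema), and `run` is a placeholder for whatever processor actually evaluates queries; all names are assumptions. An empty set of schema paths yields a definite unsatisfiable verdict, and anything else is only maybe satisfiable.

```python
from enum import Enum
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical schema: element name -> permitted child element names.
SCHEMA: Dict[str, Set[str]] = {
    "site": {"regions"},
    "regions": {"namerica", "samerica"},
    "namerica": set(),
    "samerica": set(),
}
ROOT = "site"


class Verdict(Enum):
    UNSATISFIABLE = "unsatisfiable"
    MAYBE_SATISFIABLE = "maybe satisfiable"


def schema_paths(query: str) -> List[Tuple[str, ...]]:
    """Evaluate an absolute child-only path such as /site/regions/namerica on SCHEMA."""
    paths: List[Tuple[str, ...]] = [()]
    for name in (s for s in query.split("/") if s):
        # The empty path is the document node, whose only child is ROOT.
        paths = [p + (name,) for p in paths
                 if name in ({ROOT} if not p else SCHEMA.get(p[-1], set()))]
    return paths


def verdict(query: str) -> Verdict:
    # Proposition 1: no schema paths means no valid document yields a match.
    return Verdict.UNSATISFIABLE if not schema_paths(query) else Verdict.MAYBE_SATISFIABLE


def evaluate_filtered(queries: List[str], run: Callable[[str], list]) -> Dict[str, list]:
    """Pass only queries that are not provably unsatisfiable to the real XPath processor."""
    results: Dict[str, list] = {}
    for q in queries:
        results[q] = [] if verdict(q) is Verdict.UNSATISFIABLE else run(q)
    return results
```

Answering an unsatisfiable query with an empty result without evaluating it is only safe for documents that are valid according to the schema, which is exactly the setting of Definition 10.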

Proof

The XPath–XSchema evaluator is constructed in such a

Performance analysis

We have implemented a prototype of our approach in order to verify the correctness of our approach and to demonstrate the optimization potential for avoiding the evaluation of unsatisfiable XPath queries. The performance analysis focuses on the detection of unsatisfiable XPath queries by our approach and the evaluation of these unsatisfiable queries by common XPath evaluators. We also study the overhead of evaluating satisfiable XPath queries by our approach, where we compare the time of

Further related work

Many research efforts are dedicated to the satisfiability problem of XPath queries. Benedikt et al. [2] theoretically study the complexity of XPath satisfiability in the presence of DTDs, and show that it depends on the considered subsets of XPath queries and DTDs. In contrast, we present a practical algorithm for testing the satisfiability of XPath queries. Hidders [14] investigates the problem of XPath satisfiability in the absence of schemas. Lakshmanan et

Summary and conclusions

We have proposed a data model for the XML Schema language, which identifies the navigation paths of XPath queries on XML Schema definitions. Based on the data model, we have developed an XPath–XSchema evaluator, which evaluates XPath queries on an XML Schema definition in order to check whether or not the queries conform to the constraints imposed by the schema definition, where we also consider the powerful data typing capabilities of XML Schema. When an XPath query does not conform to the


References (32)

  • S. Amer-Yahia, S. Cho, L.V.S. Lakshmanan, D. Srivastava, Minimization of tree pattern queries, in: SIGMOD...
  • M. Benedikt, W. Fan, F. Geerts, XPath Satisfiability in the presence of DTDs, in: PODS...
  • M. Benedikt, W. Fan, G.M. Kuper, Structural properties of XPath fragments, in: ICDT...
  • N. Bruno, N. Koudas, D. Srivastava, Holistic twig joins: optimal XML pattern matching, in: SIGMOD...
  • C.Y. Chan, W. Fan, Y. Zeng, Taming XPath Queries by Minimizing Wildcard Steps, in: VLDB...
  • W. Fan, C. Chan, M. Garofalakis, Secure XML querying with security views, in: SIGMOD...
  • X. Franc, Qizx/open version 0.4p1, http://www.xfra.net/qizxopen/,...
  • M. Franceschet, XPathMark – An XPath benchmark for XMark. Research Report PP-2005-04, University of Amsterdam, the...
  • S. Groppe, XML Query Reformulation for XPath, XSLT and XQuery (2005)
  • J. Groppe, S. Groppe, Filtering Unsatisfiable XPath Queries, in: ICEIS...
  • J. Groppe, S. Groppe, A prototype of a schema-based XPath satisfiability tester, in: DEXA...
  • S. Groppe, S. Böttcher, J. Groppe, XPath Query simplification with regard to the elimination of intersect and except...
  • G. Gottlob, C. Koch, R. Pichler, Efficient algorithms for processing XPath queries, in: VLDB...
  • J. Hidders, Satisfiability of XPath expressions, DBPL 2003, LNCS, vol. 2921, pp....
  • H. Jiang, W. Wang, H. Lu, J.X. Yu, 2003. Holistic twig joins on indexed XML documents, in: VLDB...
  • M.H. Kay, Saxon – The XSLT and XQuery Processor, http://saxon.sourceforge.net,...

Jinghua Groppe earned her Bachelor's degree in Computer Science and Applications from Beijing Polytechnic University in 1989 and her Master's degree in Computer Science from the University of Amsterdam in 2001. She worked as a software engineer at the Chinese Academy of Launch Vehicle Technology/China Aerospace Corporation from 1989 to 1999. She was a scientific employee in the Department of Computer Science at the University of Paderborn from 2001 to 2005 and in the Institute of Computer Science at the University of Innsbruck from 2005 to 2006. She is currently working on her doctoral thesis. She worked in the projects EUQOS, UBISEC, E-Colleg, VHE and ASG, all of which were funded by the European Union. Her research interests include XML and semi-structured data, satisfiability testers, containment testers, profiles, Semantic Web, caching and mobile devices.

Sven Groppe earned his diploma degree in Informatik (Computer Science) from the University of Paderborn in 2002 and his doctoral degree from the University of Paderborn in 2005. From 2005 to 2007, he worked as a postdoc at the University of Innsbruck. He is currently working as a postdoc at the University of Lübeck. In 2001/2002, he worked in the project B2B-ECOM, which dealt with distributed Internet marketplaces for the electrical industry. From 2002 to 2004, he worked in the project MEMPHIS in the area of premium services. From 2005 to 2006, he worked in the projects ASG and TripCom in the area of Semantic Web Services. All projects were funded by the European Union. His research interests include XML and semi-structured data, query reformulation, data integration in heterogeneous environments, Semantic Web, SPARQL, distributed systems, electronic marketplaces, web services and mobile devices.
