Model checking XSL transformations
Introduction
The eXtensible Markup Language (XML) [1] is a flexible tagged text format derived from the Standard Generalised Markup Language (SGML) [2], proposed in 1996 by the World Wide Web Consortium (W3C) [3] as a tool to standardise the format of all the documents used on the Web and to meet the challenges of large-scale electronic publishing. The XML language is extremely simple and versatile: the information is organised in a tree-structured fashion, which is both human-readable and machine-processable.
This W3C proposal had immediate success, becoming a de facto standard, and it is now widely adopted for the description and manipulation of data. Indeed, XML documents represent data structures in a clear, cross-platform and open format, which is ideal for the Web. This trend has been further encouraged by the fact that applications can easily manipulate XML-structured data using specialised tools and languages such as XSL Transformations (XSLT) [4].
XSLT is a key technology for accessing XML data: indeed, by using XSLT it is possible to
- Perform complex data manipulation that would traditionally require server-side programming.
- Transform XML data into any text-based or XML-based format.
- Achieve a strong separation between the data and its representation, which could otherwise be obtained only by using methodologies such as HTML templates [5], [6].
These features make XSLT suitable for many scenarios, such as data manipulation and presentation in XML databases [7], [8], [9], [10], [11], [12], publishing of XML data through XHTML [13] or XSL-FO [14], [15] and conversion between XML data structures [7], [16], [17], [18]. Moreover, the introduction of XML-based standards in many research communities led to the adoption of XSLT for data manipulation tasks that were previously performed using traditional programming languages: for example, the XMI [19] standard made XSLT transformations a powerful tool in the software engineering field [20], [21], [22].
More recently, in XML databases, new languages such as XQuery [23] are being used to perform XML data manipulation and transformation. XQuery and XSLT have the same expressive power, so the two formalisms can be used interchangeably [24].
XSLT is a powerful and widely used XML technology, and this motivates the adoption of the highest quality standards in the design and development of XSL transformations. This, in turn, implies significant effort to optimise XSLT processing performance [25] and to check the correctness of XSLT transformations, i.e., to verify that the transformation output has the expected format and properties.
In general, certifying the correctness of a system requires some kind of formal proof. Indeed, manually checking a system against a set of test cases, as is usual in common programming practice, does not ensure that it will perform as expected in every situation. On the other hand, a formal proof guarantees that the system will always behave as specified.
The formal checking of XSLT has already been addressed in the literature: most current works (see Section 8) validate XSL transformations using static approaches such as type checking or flow analysis, where the code is analysed without executing it. In software and hardware programming practice, such approaches are the best choice whenever they are applicable, since they are usually very fast and effective: for instance, strictly typed programming languages prevent the user from making many common mistakes.
However, code complexity often makes such techniques unsuitable for the complete formal validation of an application's behaviour. Indeed, when applied to complex code structures, static source analysis often requires too many resources or turns out to be very approximate. These considerations also hold for XSLT validation, due to the complexity of the XSLT/XPath constructs. Indeed, XSLT is Turing complete [26], so we know that its static validation is undecidable in general.
An answer to the static validation problems above comes from model checking: if we cannot certify that a given piece of code is correct by statically reasoning on its structure, we can run (a suitably simplified model of) it and dynamically analyse its behaviour.
In particular, model checking techniques are used to exhaustively verify a system by automatically checking all its possible states against a set of user-defined assertions. The result of this verification is a certification that has the same value as a formal proof of correctness.
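To make the idea of exhaustive state checking concrete, the following is a minimal sketch of explicit-state verification in Python. It is not the CMurphi implementation: the function `check_invariant`, the toy transition function `step` and the two assertions are hypothetical names introduced only for illustration. The sketch visits every reachable state of a small model breadth-first and, if an assertion fails, returns the shortest sequence of states leading to the violation.

```python
from collections import deque


def check_invariant(initial_states, successors, invariant):
    """Explore every reachable state breadth-first.

    Returns None if the invariant holds in all reachable states,
    otherwise the shortest list of states from an initial state
    to a state that violates the invariant.
    """
    # parent doubles as the visited set and as the back-pointer table.
    parent = {s: None for s in initial_states}
    queue = deque(initial_states)
    while queue:
        state = queue.popleft()
        if not invariant(state):
            # Rebuild the counterexample by following parent links back.
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Hypothetical toy model: a counter that starts at 0 and, at each
# step, advances by 1 or by 2 modulo 6.
def step(n):
    return [(n + 1) % 6, (n + 2) % 6]


# Assertion that holds in every reachable state.
ok = check_invariant([0], step, lambda n: n < 6)

# Assertion that fails: the state 5 is reachable.
bad = check_invariant([0], step, lambda n: n != 5)

print(ok)
print(bad)
```

When the state space is finite and fully explored, a `None` result means the assertion holds in every reachable state of the model, which is what gives the verification the value of a proof for that model. A non-empty result plays the role of an error trace.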
Model checking is widely adopted in the development of critical systems, such as embedded hardware [27], as well as in the verification of code written in statically typed programming languages, such as Java [28], [29], [30]. This suggests that an effective framework for complete, in-depth code verification requires the integration of different static and dynamic techniques, such as type checking and model checking.
To this end, in this paper we show how model checking can be applied to XSLT validation. In particular, we describe the design and implementation of a complete XSLT verification process based on model checking. The core of this process is the XSLToMurphi algorithm, which translates XSLT stylesheets into models that can be verified through the CMurphi tool [31].
By exploiting techniques designed to handle large-scale software systems, our model checking approach can better deal with the complexity of the XSLT formalism, thus achieving a more precise and complete validation while keeping the whole process under reasonable complexity limitations.
Moreover, our approach is also able to check the robustness of a transformation, i.e., to verify whether it correctly handles (small) deviations from the expected input structure, and to generate detailed descriptions of the input sequences that lead to transformation errors, so that they can possibly be detected and blocked at runtime.
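The robustness check described above can be pictured with a small bounded enumeration, sketched here in Python. This is only an illustration under stated assumptions and not the XSLToMurphi algorithm: the function `transform` is a hypothetical stand-in for a stylesheet, `output_ok` is a hypothetical output constraint, and the "deviation" considered is an `item` element that lacks its `name` attribute.

```python
import itertools
import xml.etree.ElementTree as ET


def transform(doc):
    """Hypothetical stand-in for a stylesheet: maps each <item> of an
    <order> document to an <li> inside a <ul>."""
    out = ET.Element("ul")
    for item in doc.findall("item"):
        li = ET.SubElement(out, "li")
        # item.get returns None when the attribute is missing.
        li.text = item.get("name")
    return out


def output_ok(out):
    """Hypothetical output constraint: the list must be non-empty and
    every <li> must carry some text."""
    items = out.findall("li")
    return len(items) > 0 and all(li.text for li in items)


def inputs(max_items):
    """Enumerate every <order> with up to max_items items, where each
    item either has a name (expected) or lacks it (small deviation)."""
    for n in range(max_items + 1):
        for named in itertools.product([True, False], repeat=n):
            doc = ET.Element("order")
            for i, has_name in enumerate(named):
                item = ET.SubElement(doc, "item")
                if has_name:
                    item.set("name", f"item{i}")
            yield doc


# Collect a readable description of every input that breaks the constraint.
failures = [
    ET.tostring(doc, encoding="unicode")
    for doc in inputs(2)
    if not output_ok(transform(doc))
]

for description in failures:
    print(description)
```

Each reported document is a concrete input on which the stand-in transformation violates the constraint, which is the kind of description that could be used to reject such inputs before the transformation runs.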
Finally, to make our approach suitable for real-world development processes, we designed it to be completely automatic and accessible to non-experts as well. Indeed, the underlying methodology is completely hidden from the user, and the validation outcome is either a success message or a set of clear and detailed “debugger-style” error messages that can be effectively used to locate and fix the transformation errors.
The paper is organised as follows. Section 2 gives a brief introduction to the main elements of XSLT and their semantics, whereas Section 3 introduces model checking techniques and, in particular, the CMurphi model checker. Section 4 describes how model checking can be adapted to validate XSLT stylesheets and Section 5 presents the XSLToMurphi algorithm that implements the core of XSLT model checking. Section 6 shows our XSLT model checking process at work on a case study, whereas Section 7 illustrates and discusses the experimentation with the tool. Finally, Section 8 offers an overview of the XSLT validation works in the current literature and Section 9 draws the conclusions of the paper and outlines our planned future work on XSLToMurphi.
Section snippets
The eXtensible Stylesheet Language Transformations
The eXtensible Stylesheet Language Transformations (XSLT) is a W3C recommendation [4] and, in general, is used to transform an XML document into another XML document. This is done by defining transformation rules that describe the XML markup to be generated when certain elements are found in the transformation input. XSLT not only has a functional flavour, but also contains many common programming constructs like loops, conditional expressions and parametric function calls. All these statements
Model checking techniques
Generally speaking, model checking [34], [35], [36], [37], [38], [39] can be defined as the formal process of verifying the validity of a set of assertions on the (formal) model of a (software) system. Model checking is applied in a scenario like the one depicted in Fig. 2.
Given a software or a hardware system, we derive from it a system model, i.e., a suitable abstraction of the system functionalities that we want to verify. This is the most critical step of the whole process: indeed, the
Model checking applied to XSLT
In this section we illustrate a technique and a tool that make it possible to perform model checking on XSL transformations.
Model checking is usually applied to verify that a system always behaves as specified. In our case, the system is coded in the XSLT language. We may also have a formal description of the input the transformation is supposed to work on, given by a schema that defines the class of valid input documents. In this situation, we may want to embed the schema
The XSLToMurphi algorithm
Once we have defined a suitable way to model an XSL transformation, we can describe the algorithm that, given an XSLT stylesheet, creates the corresponding abstraction and formally encodes it in order to apply model checking. Moreover, we should define a methodology to create a set of properties (to be verified) from the output constraints illustrated in Section 4.2. Since our target verifier is CMurphi, the stylesheet and constraints encoding should produce a valid program written in the
The verification process
In this section we show the complete verification process involving the XSLToMurphi algorithm and the CMurphi verifier. To this aim, we reuse the running example of Section 5, namely the transformation in Fig. 1 and the XML Schema shown in Fig. 4.
To perform the verification, the user simply runs a shell script provided with XSLToMurphi, having set the appropriate environment variables to point to the CMurphi and XSLToMurphi distribution directories. Of course both tools must have been
Experimentation
The current prototype of the XSLToMurphi tool was first tested on several “malicious” stylesheets created ad hoc, such as the running example of this paper (Fig. 1), to stress the most complex XSLT aspects and to embed both typical and hard-to-catch output errors. The output languages for these stylesheets were XHTML, WSDL [45], SVG [46] as well as several custom XML languages. These experiments were mainly used to debug the automatic XSLT modelling and abstraction algorithm shown in Section 5.1
Related work
In this section we survey related work in the field of XSLT validation. For information about model checking techniques, the reader may refer to Section 3.
The validation of XSL transformations is addressed by several works in the literature. This task is generally considered very complex: indeed (see [47], [48]), as non-trivial subsets of XSLT are addressed, the validation problem quickly becomes EXPTIME-hard.
Model checking and XML have, in fact, already been brought together in earlier work.
Conclusions
The benefits deriving from XSLT validation are widely attested by the number of research works and tools in this field (see Section 8). In this paper we described a new technique to validate XSL transformations by means of a standard technology, i.e., model checking. In particular, our work focused on the full automation of the XSLT model checking process through an algorithm called XSLToMurphi, which completely hides the complexity of the model checking technology from the user.
Indeed, we
References (70)
- et al. Transforming XSLT stylesheets into XQuery expressions and vice versa. Comput Lang Syst Struct (2011)
- et al. Symbolic model checking: 10^20 states and beyond. Inf Comput (1992)
(1992) - W3C. Extensible markup language (XML), 〈http://www.w3.org/XML〉;...
- ISO. Information processing—text and office systems—standard generalized markup language (SGML, ISO 8879:1986),...
- W3C. World Wide Web consortium website, 〈http://www.w3.org〉;...
- W3C. XSL transformations version 1.0. W3C recommendation, 〈http://www.w3.org/TR/xslt〉;...
- Parr TJ. Enforcing strict model-view separation in template engines. In: WWW '04: Proceedings of the 13th international...
- Mohr M. Smarty template engine, 〈http://www.smarty.net/〉;...
- Bailey J. Transformation and reaction rules for data on the web. In: ADC '05: Proceedings of the 16th Australasian...
- Groppe S, Buttcher S. Xpath query transformation based on XSLT stylesheets. In: WIDM '03: Proceedings of the fifth ACM...