
Patterns and Types for Querying XML Documents

Conference paper, Database and XML Technologies (XSym 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3671)

Abstract

To manipulate XML data, a programming or query language should provide primitives to deconstruct it, in particular to pinpoint and capture subparts of the data.

Among the various proposals for primitives that deconstruct XML data, two different and complementary approaches clearly stem from practice: path expressions (usually XPath paths [7], but also the “dot” navigation of Cω [3]) and regular expression patterns [13].

Path expressions are navigational primitives that point out where to capture data substructures. They (and those of Cω in particular) closely resemble the homonymous primitives used by OQL [9] in the context of OODB query languages, with the difference that instead of sets of objects they return sets or sequences of elements, namely all the elements that can be reached by following the path at issue. These primitives are at the basis of standard languages such as XSLT [8] and XQuery [4].
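To make the “vertical” reading of path expressions concrete, here is a minimal Python sketch, assuming a toy path language with only child and descendant steps, written as hypothetical `("child", tag)` and `("descendant", tag)` pairs. It is not XPath, OQL or Cω; it only illustrates that evaluating a path returns all the elements reachable along it. It relies on the standard-library `xml.etree.ElementTree` module; `eval_path` and the sample document are illustrative names, not part of any of the cited languages.

```python
import xml.etree.ElementTree as ET


def eval_path(root, steps):
    """Evaluate a toy path (a list of (axis, tag) steps) starting from `root`.

    Hypothetical mini-language: "child" selects direct children with the given
    tag, "descendant" selects elements with that tag at any depth below.
    Returns the reached elements in the order they are found, without duplicates.
    """
    nodes = [root]
    for axis, tag in steps:
        reached = []
        for node in nodes:
            if axis == "child":
                reached.extend(c for c in node if c.tag == tag)
            elif axis == "descendant":
                # Element.iter(tag) also yields `node` itself when its tag matches,
                # so skip it to keep strict descendants only.
                reached.extend(d for d in node.iter(tag) if d is not node)
            else:
                raise ValueError(f"unknown axis: {axis}")
        # A node may be reached from several ancestors: keep its first occurrence.
        seen = set()
        nodes = []
        for n in reached:
            if id(n) not in seen:
                seen.add(id(n))
                nodes.append(n)
    return nodes


BIB = ET.fromstring(
    "<bib>"
    "<book><title>A</title><author>Ann</author><author>Bob</author></book>"
    "<article><title>B</title><author>Cy</author></article>"
    "</bib>"
)

# All authors, whatever their depth ("vertical" exploration).
print([e.text for e in eval_path(BIB, [("descendant", "author")])])  # ['Ann', 'Bob', 'Cy']
# Titles of books only, in the style of bib/book/title.
print([e.text for e in eval_path(BIB, [("child", "book"), ("child", "title")])])  # ['A']
```

The single descendant step collects authors of both the book and the article, whereas the child steps stay on the book branch; neither says anything about how the collected elements are arranged side by side.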

More recently, a new kind of deconstructing primitive has been proposed: regular expression patterns [13], which extend with regular expressions the pattern-matching primitive popularised by functional languages such as ML and Haskell. Regular expression patterns were first introduced in the XDuce [12] programming language and are becoming more and more popular, being adopted by languages as different as ℂDuce [1] (a general purpose extension of the XDuce language) and its query language ℂQL [2], Xtatic [10] (an extension of C#), Scala [15] (a general purpose Java-like object-oriented language that compiles to Java bytecode), XHaskell [14], and the extension of Haskell proposed in [5].
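The “horizontal” decomposition performed by regular expression patterns can be approximated by a small backtracking matcher over a sequence of elements. This is a minimal Python sketch under stated assumptions: the pattern constructors (`tag`, `seq`, `alt`, `star`, `plus`, `bind`) and `match_seq` are hypothetical, matching is first-match with greedy stars, and a capture variable binds the (possibly empty) subsequence its sub-pattern consumed. It does not reflect how XDuce or ℂDuce actually compile or type-check patterns.

```python
import xml.etree.ElementTree as ET

# Hypothetical pattern constructors; patterns are plain tuples.
def tag(name): return ("tag", name)      # exactly one element with this tag
def seq(*ps): return ("seq", ps)         # p1 followed by p2 followed by ...
def alt(*ps): return ("alt", ps)         # first alternative leading to success
def star(p): return ("star", p)          # zero or more repetitions, greedy
def plus(p): return seq(p, star(p))      # one or more repetitions
def bind(var, p): return ("bind", var, p)  # capture the subsequence p consumed


def _m(p, items, i, env, k):
    """Match p against items[i:], then call the continuation k(next_index, env).

    Returns whatever k returns, or None if no way of matching p succeeds."""
    kind = p[0]
    if kind == "tag":
        if i < len(items) and items[i].tag == p[1]:
            return k(i + 1, env)
        return None
    if kind == "seq":
        def step(j, idx, e):
            if j == len(p[1]):
                return k(idx, e)
            return _m(p[1][j], items, idx, e, lambda idx2, e2: step(j + 1, idx2, e2))
        return step(0, i, env)
    if kind == "alt":
        for q in p[1]:
            r = _m(q, items, i, env, k)
            if r is not None:
                return r
        return None
    if kind == "star":
        def loop(idx, e):
            # Greedy: first try one more iteration (it must consume something),
            # and only if that fails stop repeating here.
            r = _m(p[1], items, idx, e,
                   lambda idx2, e2: loop(idx2, e2) if idx2 > idx else None)
            return r if r is not None else k(idx, e)
        return loop(i, env)
    if kind == "bind":
        return _m(p[2], items, i, env,
                  lambda idx2, e2: k(idx2, {**e2, p[1]: items[i:idx2]}))
    raise ValueError(f"unknown pattern: {p!r}")


def match_seq(p, items):
    """Match p against the whole sequence; return the bindings or None."""
    items = list(items)
    return _m(p, items, 0, {}, lambda i, env: env if i == len(items) else None)


book = ET.fromstring(
    "<book><title>A</title><author>Ann</author><author>Bob</author></book>")

# A title followed by one or more authors, capturing both parts.
pattern = seq(bind("t", tag("title")), bind("a", plus(tag("author"))))
env = match_seq(pattern, book)
print(env["t"][0].text, [a.text for a in env["a"]])  # A ['Ann', 'Bob']
```

Because stars backtrack, a pattern such as `seq(bind("front", star(tag("author"))), bind("last", tag("author")))` still matches a sequence of authors by giving the final author to `last`.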

The two kinds of primitives are not antagonistic but orthogonal and complementary. Path expressions implement a “vertical” exploration of data, as they capture elements that may be at different depths, while patterns perform a “horizontal” exploration, since they can decompose sequences of elements at a finer grain. Both kinds of primitives are useful and they complement each other nicely, so it would seem natural to integrate both in a query or programming language for XML. Despite this, we are aware of just two works in which both primitives are embedded (and, even then, only loosely coupled): in ℂQL it is possible to write select-from-where expressions in which regular expression patterns are applied in the from clause to sequences returned by XPath-like expressions; Gapeyev and Pierce [11] show how regular expression patterns with an all-match semantics can encode a subset of XPath, and plan to use this encoding to add XPath to the Xtatic programming language.
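In the spirit of the ℂQL combination just described (patterns applied in the from clause to sequences returned by path-like expressions), here is a deliberately small Python sketch. Assumptions: the path part is reduced to “every element with a given tag”, using ElementTree's `Element.iter`; the pattern part is swapped for Python's `re` module run over the child tags encoded as space-terminated tokens, which only approximates regular expression patterns; `match_children`, `select_from_where` and the sample data are hypothetical and reproduce neither ℂQL syntax nor its semantics.

```python
import re
import xml.etree.ElementTree as ET


def match_children(elem, regex):
    """Match the child-tag sequence of `elem` against `regex` (a Python regex).

    Shortcut: each child is encoded as its tag plus one space, so patterns must be
    written token by token, e.g. r"(?P<t>title )(?P<a>(?:author )+)".
    Each named group is mapped back to the slice of children it spans."""
    kids = list(elem)
    word = "".join(k.tag + " " for k in kids)
    m = re.fullmatch(regex, word)
    if m is None:
        return None
    env = {}
    for name in m.re.groupindex:
        start, end = m.span(name)
        if start == -1:   # the group did not take part in the match
            env[name] = []
        else:             # count the tokens before start/end to get child indices
            env[name] = kids[word.count(" ", 0, start):word.count(" ", 0, end)]
    return env


def select_from_where(root, path_tag, regex, where, select):
    """Toy select-from-where: 'from' ranges over every element tagged `path_tag`
    at or below `root` and matches its children against `regex`; `where` filters
    the resulting bindings and `select` projects them."""
    for node in root.iter(path_tag):
        env = match_children(node, regex)
        if env is not None and where(env):
            yield select(env)


BIB = ET.fromstring(
    "<bib>"
    "<book><title>A</title><author>Ann</author><author>Bob</author></book>"
    "<book><title>B</title><author>Cy</author></book>"
    "<book><title>C</title><editor>Di</editor></book>"
    "</bib>")

# Titles of books that have at least two authors.
titles = list(select_from_where(
    BIB, "book",
    r"(?P<t>title )(?P<a>(?:author )+)",
    where=lambda env: len(env["a"]) >= 2,
    select=lambda env: env["t"][0].text))
print(titles)  # ['A']
```

Book C is discarded by the pattern (it has an editor instead of authors) and book B by the where clause, which shows the division of labour: the path picks elements at some depth, the pattern decomposes each one's content.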

The integration of these two primitives may have received little study because each of them is adopted by a different community: regular expression patterns are almost confined to the programming language community, while XPath expressions are pervasive in the database community.

The goal of this lecture is to give a brief presentation of the regular expression pattern style, together with the type systems to which it is tightly connected, namely semantic subtyping based type systems [6]. We are not promoting the use of patterns to the detriment of path expressions: we think that the two approaches should be integrated in the same language, and we see in this a great opportunity for collaboration between the database and the programming language communities. Since the author belongs to the latter, this lecture tries to describe the pattern approach by addressing some points that should also be of interest to the database community. In particular, after a general overview of regular expression patterns and types, in which we show how to embed patterns in a select-from-where expression, we discuss several uses of these patterns and types, ranging from the classic uses for partial correctness and schema specification to the definition of new data iterators, and from the specification of an efficient run-time to the definition of logical, pattern-specific query optimisations.
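For readers coming from the database side, the core idea behind semantic subtyping [6] can be stated in one line: a type denotes a set of values, and subtyping is set containment. Writing ⟦t⟧ for the set of values of type t (notation assumed here) and using the intersection (∧) and negation (¬) type connectives that such systems provide, a subtyping check reduces to an emptiness check:

$$
t \leq s \iff [\![t]\!] \subseteq [\![s]\!] \iff [\![\, t \wedge \neg s \,]\!] = \varnothing
$$

In other words, deciding whether one type is a subtype of another amounts to deciding whether a single type denotes the empty set of values.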

Joint talk with DBPL 2005. Full version available in the Proc. of the 10th Intl. Symp. on Database Programming Languages, G. Bierman and C. Koch eds., LNCS, Springer, 2005.

References

  1. Benzaken, V., Castagna, G., Frisch, A.: CDuce: an XML-friendly general purpose language. In: ICFP 2003, 8th ACM International Conference on Functional Programming, Uppsala, Sweden, pp. 51–63. ACM Press, New York (2003)

  2. Benzaken, V., Castagna, G., Miachon, C.: A full pattern-based paradigm for XML query processing. In: Hermenegildo, M.V., Cabeza, D. (eds.) PADL 2004. LNCS, vol. 3350, pp. 235–252. Springer, Heidelberg (2005)

  3. Bierman, G., Meijer, E., Schulte, W.: The essence of data access in Cω. In: Black, A.P. (ed.) ECOOP 2005. LNCS, vol. 3586, pp. 287–311. Springer, Heidelberg (2005)

  4. Boag, S., Chamberlin, D., Fernandez, M., Florescu, D., Robie, J., Siméon, J., Stefanescu, M.: XQuery 1.0: An XML Query Language. W3C Working Draft (May 2003), http://www.w3.org/TR/xquery/

  5. Broberg, N., Farre, A., Svenningsson, J.: Regular expression patterns. In: ICFP 2004: Proceedings of the ninth ACM SIGPLAN international conference on Functional programming, New York, NY, USA, pp. 67–78. ACM Press, New York (2004)

  6. Castagna, G., Frisch, A.: A gentle introduction to semantic subtyping. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 30–34. Springer, Heidelberg (2005); Joint ICALP-PPDP keynote talk

  7. Clark, J., DeRose, S.: XML Path Language (XPath). W3C Recommendation (November 1999), http://www.w3.org/TR/xpath/

  8. Clark, J.: XSL Transformations (XSLT). W3C Recommendation (November 1999), http://www.w3.org/TR/xslt/

  9. Cluet, S.: Designing OQL: allowing objects to be queried. Inf. Syst. 23(5), 279–305 (1998)

  10. Gapeyev, V., Pierce, B.C.: Regular object types. In: Cardelli, L. (ed.) ECOOP 2003. LNCS, vol. 2743. Springer, Heidelberg (2003)

  11. Gapeyev, V., Pierce, B.C.: Paths into patterns. Technical Report MS-CIS-04-25, University of Pennsylvania (October 2004)

  12. Hosoya, H., Pierce, B.C.: XDuce: A typed XML processing language. In: Suciu, D., Vossen, G. (eds.) WebDB 2000. LNCS, vol. 1997, p. 226. Springer, Heidelberg (2001)

  13. Hosoya, H., Pierce, B.C.: Regular expression pattern matching for XML. In: The 25th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (2001)

  14. Zhuo Ming Lu, K., Sulzmann, M.: An implementation of subtyping among regular expression types. In: Chin, W.-N. (ed.) APLAS 2004. LNCS, vol. 3302, pp. 57–73. Springer, Heidelberg (2004)

  15. Odersky, M., Altherr, P., Cremet, V., Emir, B., Maneth, S., Micheloud, S., Mihaylov, N., Schinz, M., Stenman, E., Zenger, M.: An overview of the Scala programming language. Technical Report IC/2004/64, École Polytechnique Fédérale de Lausanne (2004), Latest version at http://scala.epfl.ch

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Castagna, G. (2005). Patterns and Types for Querying XML Documents. In: Bressan, S., et al. Database and XML Technologies. XSym 2005. Lecture Notes in Computer Science, vol 3671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11547273_1

  • DOI: https://doi.org/10.1007/11547273_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28583-0

  • Online ISBN: 978-3-540-31968-9

  • eBook Packages: Computer Science (R0)
