Abstract
The XQuery language was initially developed as an SQL equivalent for XML data, but its roots in functional programming also make it a perfect choice for processing almost any kind of structured and semi-structured data. Beyond standard XML processing, however, advanced language features make it hard to implement the complete language efficiently for large data volumes. This work proposes a novel compilation strategy that provides both flexibility and efficiency to unleash XQuery's potential as a data programming language. It combines the simplicity and versatility of a storage-independent data abstraction with the scalability advantages of set-oriented processing. Expensive iterative sections of a query are unrolled into a pipeline of relational-style operators, which is open to optimized join processing, index use, and parallelization. The remaining aspects of the language are processed in a standard fashion, yet can be compiled at any time into more efficient native operations of the actual runtime environment. This hybrid compilation mechanism yields an efficient and highly flexible query engine that is able to drive any computation, from simple XML transformation to complex data analysis, even on non-XML data. Experiments with our prototype and state-of-the-art competitors in classic XML query processing and business analytics over relational data attest to the generality and efficiency of the design.
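The unrolling idea from the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the operator names (`start`, `for_bind`, `select`, `emit`) and the dictionary-based tuple layout are assumptions made for illustration only. A nested for/where/return loop is expressed as a chain of generator-based operators over a stream of context tuples, which is the shape that makes join rewriting or parallelization possible in principle.

```python
# Hypothetical sketch: a FLWOR-style loop unrolled into a pipeline of
# relational-style operators. Each operator consumes and produces a stream
# of context tuples (dicts mapping variable names to bound values).

def start():
    """Seed the pipeline with a single empty context tuple."""
    yield {}


def for_bind(inp, var, seq_fn):
    """For each input tuple, bind `var` to every item of the sequence
    computed by seq_fn(tuple), emitting one extended tuple per item."""
    for t in inp:
        for item in seq_fn(t):
            yield {**t, var: item}


def select(inp, pred):
    """Keep only the tuples that satisfy the predicate (the where clause)."""
    return (t for t in inp if pred(t))


def emit(inp, expr):
    """Evaluate the return expression once per surviving tuple."""
    for t in inp:
        yield expr(t)


def run_query(orders, customers):
    # Roughly: for $o in orders, $c in customers
    #          where $o.cust = $c.id
    #          return ($c.name, $o.total)
    pipeline = start()
    pipeline = for_bind(pipeline, "o", lambda t: orders)
    pipeline = for_bind(pipeline, "c", lambda t: customers)
    pipeline = select(pipeline, lambda t: t["o"]["cust"] == t["c"]["id"])
    return list(emit(pipeline, lambda t: (t["c"]["name"], t["o"]["total"])))
```

In this form, the two `for_bind` steps followed by `select` are recognizable as a join, so an optimizer could replace them with a hash join or an index lookup without touching the return expression.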
Notes
New in XQuery 3.0: http://www.w3.org/TR/xquery-30/
Note that we explicitly distinguish between iterators, which iterate over the items of an XQuery sequence, and cursors, which iterate over streams of context tuples.
The aggregation specification for the non-grouping variables of a GroupBy operator is represented as a comma-separated list of XQuery-like expressions in the subscript. The specification $c:($c), for example, says that variable $c is aggregated using the sequence constructor (). The asterisk serves as a wildcard for specifying a default aggregation expression.
Source code available at http://brackit.org
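The GroupBy aggregation specification described in the notes above can be sketched in Python as follows. This is an illustrative assumption, not the Brackit API: the function name `group_by`, its parameters, and the dictionary-based context tuples are hypothetical. The `default` argument plays the role of the asterisk wildcard, and its fallback mimics aggregation with the sequence constructor by collecting the values of a variable into one sequence (the sketch assumes each bound value is a single item).

```python
# Hypothetical sketch of a GroupBy operator over a cursor of context tuples.

def group_by(cursor, keys, aggs=None, default=None):
    """Group context tuples by the variables in `keys`.

    aggs:    maps a non-grouping variable to a function that turns the list
             of its values within a group into one aggregated value.
    default: aggregation for variables not listed in `aggs` (the '*' case);
             falls back to collecting the values into a sequence.
    """
    aggs = aggs or {}
    default = default or (lambda values: list(values))

    # Partition the tuple stream by grouping key, keeping first-seen order.
    groups = {}
    for t in cursor:
        key = tuple(t[v] for v in keys)
        groups.setdefault(key, []).append(t)

    # Emit one output tuple per group.
    for key, members in groups.items():
        out = dict(zip(keys, key))
        for var in members[0]:
            if var not in keys:
                agg = aggs.get(var, default)
                out[var] = agg([m[var] for m in members])
        yield out
```

For example, `group_by(tuples, ["a"], aggs={"n": sum})` sums `n` per group, while every other non-grouping variable is gathered into a sequence by the default aggregation.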
References
Draper D, Dyck M, Fankhauser P, Fernández MF, Malhotra A, Rose K, Rys M, Siméon J, Wadler P (2010) XQuery 1.0 and XPath 2.0 Formal semantics (2nd Edition)–W3C recommendation 14 December 2010. http://www.w3.org/TR/xquery-semantics/. Accessed 12 May 2014
Bamford R, Borkar VR, Brantner M, Fischer PM, Florescu D, Graf DA, Kossmann D, Kraska T, Muresan D, Nasoi S, Zacharioudaki M (2009) XQuery reloaded. PVLDB 2(2):1342–1353
Kay M (2008) Ten reasons why Saxon XQuery is fast. IEEE Data Eng Bull 31(4):65–74
Meier W (2003) eXist: An open source native XML database. WWsDS 2593:169–183. http://link.springer.com/chapter/10.1007%2F3-540-36560-5_13
Grün C (2010) Storing and querying large XML instances. Dissertation, University of Konstanz
Mathis M (2009) Storing, indexing, and querying XML documents in native database management systems. Dissertation, TU Kaiserslautern
May N, Helmer S, Moerkotte G (2004) Nested queries and quantifiers in an ordered context. ICDE 239–250
Grust T, Rittinger J, Teubner J (2008) Pathfinder: XQuery off the relational shelf. IEEE Data Eng Bull 31(4):7–14
Robie J, Chamberlin D, Dyck M, Snelson J (2010) XQuery 3.0: An XML query language – W3C working draft 14 December 2010. http://www.w3.org/TR/xquery-30/. Accessed 12 May 2014
Graefe G (1993) Query evaluation techniques for large databases. ACM Comput Surv 25(2):73–170
Weiner AM (2011) Advanced cardinality estimation in the XML query graph model. BTW 180:207–226. http://www.bibsonomy.org/bibtex/2de252266ade0e6d8c56ddcd5bfdf5729/dblp
Bruno N, Koudas N, Srivastava D (2002) Holistic twig joins: optimal XML pattern matching. SIGMOD 310–321
Abelson H, Sussman GJ, Sussman J (1985) Structure and interpretation of computer programs. MIT Press Cambridge, MA, USA. http://dl.acm.org/citation.cfm?id=26777
Peyton Jones SL, Lester DR (1992) Implementing functional languages: a tutorial. Prentice Hall, New York
Re C, Siméon J, Fernández MF (2006) A complete and efficient algebraic compiler for XQuery. IEEE Computer Society ICDE 14. http://dl.acm.org/citation.cfm?id=1129874
Bächle S (2013) Separating key concerns in query processing – set-orientation, physical data independence, and parallelism. Dissertation, TU Kaiserslautern
Mathis M, Härder T, Schmidt K (2009) Storing and indexing XML documents upside down. CSRD 24(1-2):51–68
Sauer C, Bächle S, Härder T (2013) Versatile XQuery processing in MapReduce. ADBIS 8133:204–217
Boncz PA, Grust T, van Keulen M, Manegold S, Rittinger J, Teubner J (2006) MonetDB/XQuery: a fast XQuery processor powered by a relational engine. SIGMOD 479–490
Grust T, Mayr M, Rittinger J (2009) XQuery join graph isolation: celebrating 30+ years of XQuery processing technology. ICDE 1167–1170
About this article
Cite this article
Bächle, S., Sauer, C. Unleashing XQuery for Data-Independent Programming. Datenbank Spektrum 14, 135–150 (2014). https://doi.org/10.1007/s13222-014-0160-3
DOI: https://doi.org/10.1007/s13222-014-0160-3