Storing and indexing XML documents upside down

Mathis, Christian; Härder, Theo; Schmidt, Karsten

doi:10.1007/s00450-009-0056-x

Special Issue Paper
Published: 15 April 2009

Volume 24, pages 51–68 (2009)
Computer Science - Research and Development


Abstract

XML documents contain substantial redundancy in their structure part: each path from the root node to a leaf node is represented explicitly, and large sets of such path instances typically belong to the same path class, i.e., the nodes of the path instances are labeled by the same sequence of element (or attribute) names. To save storage space and I/O cost, we want to eliminate this structural redundancy as far as possible. While all known methods for the physical representation (storage) of XML documents proceed from the root via the element/attribute hierarchy (internal nodes) down to the leaves (values), we follow an upside-down approach that explicitly stores the values and reconstructs the internal nodes only when needed. The cornerstones of such a solution are suitable node labels and a path synopsis that efficiently represents all path classes of an XML document. We propose a compact internal storage format for native XML database systems in which the inner structure of the stored documents is virtualized. Because this elementless storage format allows a document to be reconstructed efficiently from its path synopsis, all processing properties are preserved and the semantics of navigational and declarative operations of XML languages remain unchanged. Adjusted indexes support the full spectrum of so-called content-and-structure single path queries. Apart from greatly reduced storage consumption, our approach proves superior to competing methods not only for a substantial fraction of those queries, but also for storing, reconstructing, and navigating XML documents.
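
The following minimal sketch illustrates the idea of elementless storage described in the abstract; it is not the authors' implementation. It assumes Dewey-style prefix labels as a simplified stand-in for the "suitable node labels", and a hypothetical path class reference (called pcr here) stored with each value. The names PATH_SYNOPSIS, LEAVES, path_nodes, reconstruct_inner_nodes, and cas_query, as well as the sample data, are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, ...]

# Hypothetical path synopsis: one entry per path class, keyed by an
# illustrative path class reference (pcr). Sample data, not from the paper.
PATH_SYNOPSIS: Dict[int, Tuple[str, ...]] = {
    1: ("bib", "book", "title"),
    2: ("bib", "book", "author"),
}

# Elementless storage: only values are kept, each with a Dewey-style label
# (assumed here to have one component per path step) and its pcr.
LEAVES: List[Tuple[Label, int, str]] = [
    ((1, 1, 1), 1, "XML Storage"),
    ((1, 1, 2), 2, "Mathis"),
    ((1, 2, 1), 1, "Indexing"),
]


def path_nodes(label: Label, pcr: int) -> List[Tuple[Label, str]]:
    """Rebuild (label, element name) for every node on the path to a value.

    Each prefix of the label identifies one node; the path synopsis
    supplies the element name at the same depth.
    """
    names = PATH_SYNOPSIS[pcr]
    return [(label[:depth], names[depth - 1]) for depth in range(1, len(label) + 1)]


def reconstruct_inner_nodes(leaves: List[Tuple[Label, int, str]]) -> List[Tuple[Label, str]]:
    """Reconstruct all virtualized nodes, sorted by label (document order)."""
    nodes: Dict[Label, str] = {}
    for label, pcr, _value in leaves:
        for node_label, name in path_nodes(label, pcr):
            nodes[node_label] = name
    return sorted(nodes.items())


def cas_query(path: Tuple[str, ...], value: str) -> List[Label]:
    """Content-and-structure single path query: exact path plus value match."""
    pcrs = {pcr for pcr, names in PATH_SYNOPSIS.items() if names == path}
    return [label for label, pcr, v in LEAVES if pcr in pcrs and v == value]


if __name__ == "__main__":
    for node_label, name in reconstruct_inner_nodes(LEAVES):
        print(".".join(map(str, node_label)), name)
    print(cas_query(("bib", "book", "author"), "Mathis"))
```

In this sketch the element names are stored once per path class rather than once per path instance, which is the redundancy the abstract aims to remove. A real system would keep the values in an index ordered by label instead of an in-memory list.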

Author information

Authors and Affiliations

Dept. of Computer Science, University of Kaiserslautern, 67663, Kaiserslautern, Germany
Christian Mathis, Theo Härder & Karsten Schmidt

Corresponding author

Correspondence to Christian Mathis.

Additional information

Financial support by the Research Center (CM)² of the University of Kaiserslautern is acknowledged (http://cmcm.uni-kl.de).

CR subject classification

E.2, H.2.2, H.2.4

About this article

Cite this article

Mathis, C., Härder, T. & Schmidt, K. Storing and indexing XML documents upside down. Comp. Sci. Res. Dev. 24, 51–68 (2009). https://doi.org/10.1007/s00450-009-0056-x

Received: 29 October 2008
Accepted: 19 January 2009
Published: 15 April 2009
Issue Date: September 2009
DOI: https://doi.org/10.1007/s00450-009-0056-x
