skip to main content
article

XQueC: A query-conscious compressed XML database

Published: 01 May 2007 Publication History

Abstract

XML compression has gained prominence recently because it counters the disadvantage of the verbose representation XML gives to data. In many applications, such as data exchange and data archiving, entirely compressing and decompressing a document is acceptable. In other applications, where queries must be run over compressed documents, compression may not be beneficial since the performance penalty in running the query processor over compressed data outweighs the data compression benefits. While balancing the interests of compression and query processing has received significant attention in the domain of relational databases, these results do not immediately translate to XML data.
In this article, we address the problem of embedding compression into XML databases without degrading query performance. Since the setting is rather different from relational databases, the choice of compression granularity and compression algorithms must be revisited. Query execution in the compressed domain must also be rethought in the framework of XML query processing due to the richer structure of XML data. Indeed, a proper storage design for the compressed data plays a crucial role here.
The XQueC system (XQuery Processor and Compressor) covers a wide set of XQuery queries in the compressed domain and relies on a workload-based cost model to perform the choices of the compression granules and of their corresponding compression algorithms. As a consequence, XQueC provides efficient query processing on compressed XML data. An extensive experimental assessment is presented, showing the effectiveness of the cost model, the compression ratios, and the query execution times.

References

[1]
Al-Khalifa, S., Jagadish, H., Patel, J., Wu, Y., Koudas, N., and Srivastava, D. 2002. Structural joins: A primitive for efficient XML query pattern matching. In Proceedings of the 18th International Conference on Data Engineering. IEEE, 141--152.
[2]
Amer-Yahia, S. and Johnson, T. 2000. Optimizing queries on compressed bitmaps. In Proceedings of 26th International Conference on Very Large Data Bases. ACM, 329--338.
[3]
Antoshenkov, G. 1997. Dictionary-based order-preserving string compression. VLDB J. 6, 1, 26--39.
[4]
Antoshenkov, G., Lomet, D., and Murray, J. 1996. Order preserving string compression. In Proceedings of the 12th International Conference on Data Engineering. IEEE, 655--663.
[5]
APA 2004. Apache custom log format. http://www.apache.org/docs/mod/mod_log_config.html.
[6]
Arion, A., Benzaken, V., Manolescu, I., Papakonstantinou, Y., and Vijay, R. 2006. Algebra-based identification of tree patterns in XQuery. In Proceedings of the International Conference on Flexible Query Answering Systems. 13--25.
[7]
Arion, A., Bonifati, A., Costa, G., D'Aguanno, S., Manolescu, I., and Pugliese, A. 2004. Efficient query evaluation over compressed XML data. In Proceedings of the International Conference on Extending Database Technologies. Heraklion, Grece, 200--218.
[8]
Arion, A., Bonifati, A., Manolescu, I., and Pugliese, A. 2006. Path summaries and path partitioning in modern XML databases. In Proceedings of the International World Wide Web Conference. 1077--1078.
[9]
BER 2003. Berkeley DB Data Store. http://www.sleepycat.com/products/data.shtml.
[10]
Bohannon, P., Freire, J., Roy, P., and Simeon, J. 2002. From XML schema to relations: A cost-based approach to XML storage. In Proceedings of the 18th International Conference on Data Engineering. IEEE, 64--76.
[11]
Buneman, P., Grohe, M., and Koch, C. 2003. Path queries on compressed XML. In Proceedings of 29th International Conference on Very Large Data Bases. Morgan Kaufmann, 141--152.
[12]
Busatto, G., Lohrey, M., and Maneth, S. 2005. Efficient memory representation of XML documents. In Proceedings of the 10th International Symposium on Database Programming Languages. Trondheim, Norway, 199--216.
[13]
BZIP2 2002. The bzip2 and libbzip2 Official Home Page. http://sources.redhat.com/bzip2/.
[14]
Chen, Z., Gehrke, J., and Korn, F. 2000. Query optimization in compressed database systems. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. ACM, 271--282.
[15]
Chen, Z., Jagadish, H., Lakshmanan, L., and Paparizos, S. 2003. From tree patterns to generalized tree patterns: On efficient evaluation of XQuery. In Proceedings of 29th International Conference on Very Large Data Bases. Morgan Kaufmann, 237--248.
[16]
Cheney, J. 2001. Compressing XML with multiplexed hierarchical PPM models. In Data Compression Conference. IEEE Computer Society, 163--172.
[17]
Cheney, J. 2005. An empirical evaluation of simple DTD-conscious compression techniques. In WebDB. 43--48.
[18]
Cheng, J. and Ng, W. 2004. XQzip: Querying compressed XML using structural indexing. In Proceedings of the International Conference on Extending Database Technologies. Heraklion, Greece, 219--236.
[19]
Fiebig, T., Helmer, S., Kanne, C., Moerkotte, G., Neumann, J., Schiele, R., and Westmann, T. 2002. Anatomy of a native XML base management system. VLDB J. 11, 4, 292--314.
[20]
Galax 2006. Galax: An implementation of XQuery. Available at www.galaxquery.org.
[21]
Goldman, R. and Widom, J. 1997. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proceedings of 23rd International Conference on Very Large Data Bases. Morgan Kaufman, 436--445.
[22]
Goldstein, J., Ramakrishnan, R., and Shaft, U. 1998. Compressing relations and indexes. In Proceedings of the 14th International Conference on Data Engineering. IEEE, 370--379.
[23]
Graefe, G. 1993. Query evaluation techniques for large databases. ACM Compu. Surv. 25, 2, 73--170.
[24]
Greer, R. 1999. Daytona and the fourth-generation language Cymbal. In Proceedings ACM SIGMOD International Conference on Management of Data. ACM, 525--526.
[25]
Grust, T. 2002. Accelerating XPath location steps. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. ACM, 109--120.
[26]
Halverson, A., Burger, J., Galanis, L., Kini, A., Krishnamurthy, R., Rao, A., Tian, F., Viglas, S., Wang, Y., Naughton, J., and DeWitt, D. 2003. Mixed mode XML query processing. In Proceedings of 29th International Conference on Very Large Data Bases. Morgan Kaufmann, 225--236.
[27]
Hu, T. C. and Tucker, A. C. 1971. Optimal computer search trees and variable-length alphabetical codes. SIAM J. of App. Mathe. 21, 4, 514--532.
[28]
Huffman, D. A. 1952. A method for construction of minimum-redundancy codes. In Proceedings of the IRE. 1098--1101.
[29]
IBIBLIO 2004. Ibiblio.org web site. Available at www.ibiblio.org/xml/books/biblegold/examples/baseball/.
[30]
INEX 2004. INitiative for the Evaluation of XML retrieval. inex.is.informatik.uni-duisburg.de: 2004.
[31]
Jagadish, H. V., Al-Khalifa, S., Chapman, A., Lakshmanan, L. V., Nierman, A., Paparizos, S., Patel, J., Srivastava, D., Wiwatwattana, N., Wu, Y., and Yu., C. 2002. Timber: A native XML database. VLDB J. 11, 4, 274--291.
[32]
Jagadish, H. V., Ng, R., Ooi, B. C., and Tung, A. K. H. 2004. ItCompress: An iterative semantic compression algorithm. In Proceedings of the International Conference on Data Engineering. IEEE Computer Society, 646--658.
[33]
Jain, A. K., Murty, M. N., and Flynn, P. J. 1999. Data clustering: A review. ACM Comput. Surv. 31, 3, 264--323.
[34]
Liefke, H. and Suciu, D. 2000. XMILL: An efficient compressor for XML Data. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. ACM, 153--164.
[35]
Miklau, G. and Suciu, D. 2002. Containment and equivalence for an XPath fragment. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Conference on the Principles of Database Systems. 65--76.
[36]
Milo, T. and Suciu, D. 1999. Index structures for path expressions. In Proceedings of the International Conference on Database Theory (ICDT). 277--295.
[37]
Min, J. K., Park, M., and Chung, C. 2003. XPRESS: A queriable compression for XML data. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. ACM, 122--133.
[38]
Min, J. K., Park, M., and Chung, C. 2006. A compressor for effective archiving, retrieval, and update of XML documents. ACM Trans. Intern. Techn. 6, 3.
[39]
Moffat, A. and Zobel, J. 1992. Coding for compression in full-text retrieval systems. In Proceedings of the Data Compression Conference (DCC). 72--81.
[40]
Moura, E. D., Navarro, G., Ziviani, N., and Baeza-Yates, R. 2000. Fast and flexible word searching on compressed text. ACM Trans. Inform. Syst. 18, 2 (April), 113--139.
[41]
Ng, W., Lam, Y. W., and Cheng, J. 2006. Comparative analysis of XML compression technologies. WWW J. 9, 1, 5--33.
[42]
Ng, W., Lam, Y. W., Wood, P., and Levene, M. 2006. XCQ: A queriable XML compression system. Inter. J. Knowl. Inform. Syst. To appear.
[43]
Paparizos, S., Al-Khalifa, S., Chapman, A., Jagadish, H. V., Lakshmanan, L. V. S., Nierman, A., Patel, J. M., Srivastava, D., Wiwatwattana, N., Wu, Y., and Yu, C. 2003. TIMBER: A native system for querying XML. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. ACM, 672.
[44]
Poess, M. and Potapov, D. 2003. Data compression in Oracle. In Proceedings of 29th International Conference on Very Large Data Bases. Morgan Kaufmann, 937--947.
[45]
Roy, P., Seshadri, S., Sudarshan, S., and Bhobe, S. 2000. Efficient and extensible algorithms for multi query optimization. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, TX. 249--260.
[46]
Schmidt, A., Waas, F., Kersten, M., Carey, M., Manolescu, I., and Busse, R. 2002. XMark: A benchmark for XML data management. In Proceedings of 28th International Conference on Very Large Data Bases. Morgan Kaufmann, 974--985.
[47]
Tolani, P. and Haritsa, J. 2002. XGRIND: A Query-friendly XML Compressor. In Proceedings of the 18th International Conference on Data Engineering. IEEE, 225--235.
[48]
Transaction Processing Performance Council. 1999. TPC-H benchmark database. http://www.tcp.org.
[49]
UWXML 2004. University of Washington's XML repository. Available at www.cs.washington.edu/research/xmldatasets.
[50]
Westmann, T., Kossmann, D., Helmer, S., and Moerkotte, G. 2000. The implementation and performance of compressed databases. ACM SIGMOD Rec. 29, 3, 55--67.
[51]
Witten, I. H. 1987. Arithmetic coding For data compression. Comm. ACM, 857--865.
[52]
XMLZIP 1999. XMLZip XML compressor. Available at http://www.xmls.com/products/xmlzip/xmlzip.html.
[53]
XQUE 2004. The XML query language. http://www.w3.org/XML/Query.

Cited By

View all
  • (2023)AN IMPROVED INDEXING METHOD FOR QUERYING BIG XML FILESJournal of Computer Science and Cybernetics10.15625/1813-9663/19018(323-342)Online publication date: 25-Dec-2023
  • (2023)Spatiotemporally Adaptive Compression for Scientific Dataset with Feature Preservation – A Case Study on Simulation Data with Extreme Climate Events Analysis2023 IEEE 19th International Conference on e-Science (e-Science)10.1109/e-Science58273.2023.10254796(1-10)Online publication date: 9-Oct-2023
  • (2022)Toward Quantity-of-Interest Preserving Lossy Compression for Scientific DataProceedings of the VLDB Endowment10.14778/3574245.357425516:4(697-710)Online publication date: 1-Dec-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Internet Technology
ACM Transactions on Internet Technology  Volume 7, Issue 2
May 2007
152 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/1239971
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2007
Published in TOIT Volume 7, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. XML compression
  2. XML data management
  3. XML databases
  4. XQuery

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)0
Reflects downloads up to 03 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2023)AN IMPROVED INDEXING METHOD FOR QUERYING BIG XML FILESJournal of Computer Science and Cybernetics10.15625/1813-9663/19018(323-342)Online publication date: 25-Dec-2023
  • (2023)Spatiotemporally Adaptive Compression for Scientific Dataset with Feature Preservation – A Case Study on Simulation Data with Extreme Climate Events Analysis2023 IEEE 19th International Conference on e-Science (e-Science)10.1109/e-Science58273.2023.10254796(1-10)Online publication date: 9-Oct-2023
  • (2022)Toward Quantity-of-Interest Preserving Lossy Compression for Scientific DataProceedings of the VLDB Endowment10.14778/3574245.357425516:4(697-710)Online publication date: 1-Dec-2022
  • (2018)XML CompressionEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_783(4742-4748)Online publication date: 7-Dec-2018
  • (2017)A data processing scheme based on CXML in mobile commerceJournal of Computational Methods in Sciences and Engineering10.3233/JCM-16070116:4(913-925)Online publication date: 17-Jan-2017
  • (2017)XML CompressionEncyclopedia of Database Systems10.1007/978-1-4899-7993-3_783-2(1-6)Online publication date: 25-Jan-2017
  • (2015)QRFXFreeze: Queryable Compressor for RFXThe Scientific World Journal10.1155/2015/8647502015(1-8)Online publication date: 2015
  • (2015)XML Compression via Directed Acyclic GraphsTheory of Computing Systems10.1007/s00224-014-9544-x57:4(1322-1371)Online publication date: 1-Nov-2015
  • (2015)Permutation Based XML CompressionWeb Information Systems and Technologies10.1007/978-3-319-27030-2_5(63-78)Online publication date: 16-Dec-2015
  • (2014)Difference computation for grammar-compressed XML dataProceedings of the Second Australasian Web Conference - Volume 15510.5555/2667702.2667703(3-12)Online publication date: 20-Jan-2014
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media