XML Compression

Definition

XML is an extremely verbose data format with a high degree of redundancy: the same tags are repeated for every data item, and both tags and data values are represented as strings. Viewed in relational database terms, XML stores the “schema” with each and every “record” in the repository. Publishing data in XML format is estimated to increase its size by as much as 400% [14], which makes XML a prime target for compression. Standard general-purpose compressors, such as zip, gzip or bzip, typically compress XML data reasonably well, but specialized XML compressors developed over the last decade exploit the specific structural aspects of XML data. These techniques fall into two classes: (i) compression-oriented, where the goal is to maximize the compression ratio of the data, typically up to a factor of two better than the general-purpose compressors; and (ii) query-oriented, where the goal is to...
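The redundancy described above can be illustrated with a minimal Python sketch. It is not taken from the entry: the record layout, the helper names `records_to_xml` and `records_to_csv`, and the use of CSV as a stand-in for a "schema stored once" representation are all assumptions made for illustration. The sketch serializes the same records both ways and then applies a general-purpose compressor (the standard-library `gzip` module) to the XML form.

```python
import gzip


def records_to_xml(records):
    """Serialize dicts as XML; every tag is repeated for every record."""
    parts = ["<people>"]
    for rec in records:
        parts.append("<person>")
        for field, value in rec.items():
            parts.append(f"<{field}>{value}</{field}>")
        parts.append("</person>")
    parts.append("</people>")
    return "".join(parts)


def records_to_csv(records):
    """Serialize dicts as CSV; field names appear once, in the header."""
    fields = list(records[0])
    lines = [",".join(fields)]
    lines += [",".join(str(rec[f]) for f in fields) for rec in records]
    return "\n".join(lines)


# Hypothetical sample data: 1000 small records with identical structure.
records = [{"id": i, "name": f"user{i}", "city": "Seattle"} for i in range(1000)]

xml_bytes = records_to_xml(records).encode("utf-8")
csv_bytes = records_to_csv(records).encode("utf-8")
xml_gz = gzip.compress(xml_bytes)  # general-purpose compression of the XML

print(f"CSV (schema stated once):   {len(csv_bytes)} bytes")
print(f"XML (tags on every record): {len(xml_bytes)} bytes")
print(f"XML after gzip:             {len(xml_gz)} bytes")
```

On data this regular, the XML form is several times larger than the delimited form, and gzip recovers much of that overhead. The sketch does not model the specialized compressors the entry refers to; it only shows where the redundancy they target comes from.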

Recommended Reading

  1. Arion A, Bonifati A, Manolescu I, Pugliese A. XQueC: a query-conscious compressed XML database. ACM Trans Internet Technol. 2007;7(2):1–35.

  2. Cheney J. Compressing XML with multiplexed hierarchical PPM models. In: Proceedings of the Data Compression Conference; 2001. p. 163–72.

  3. Ferragina P, Luccio F, Manzini G, Muthukrishnan M. Compressing and searching XML data via two zips. In: Proceedings of the 15th International World Wide Web Conference; 2006. p. 751–60.

  4. Girardot M, Sundaresan N. Millau: an encoding format for efficient representation and exchange of XML over the Web. In: Proceedings of the 9th International World Wide Web Conference; 2000.

  5. Liefke H, Suciu D. An extensible compressor for XML data. ACM SIGMOD Rec. 2000;29(1):57–62.

  6. Liefke H, Suciu D. XMill: an efficient compressor for XML data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2000. p. 153–64.

  7. Min JK, Park M, Chung C. XPRESS: a queriable compression for XML data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2003. p. 122–33.

  8. Min JK, Park M, Chung C. XPRESS: a compressor for effective archiving, retrieval, and update of XML documents. ACM Trans Internet Technol. 2006;6(3):223–58.

  9. Tolani P, Haritsa J. XGRIND: a query-friendly XML compressor. In: Proceedings of the 18th International Conference on Data Engineering; 2002. p. 225–35.

  10. Ziv J, Lempel A. A universal algorithm for sequential data compression. IEEE Trans Inf Theory. 1977;23(3):337–43.

  11. www.dbxml.com

  12. www.ebi.ac.uk

  13. www.ictcompress.com

  14. www.ictcompress.com/xml.html

  15. www.xmlzip.com

Corresponding author

Correspondence to Dan Suciu.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Suciu, D., Haritsa, J.R. (2018). XML Compression. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_783
