DOI: 10.1145/1135777.1135891
Article

Compressing and searching XML data via two zips

Published: 23 May 2006

ABSTRACT

XML is fast becoming the standard format for storing, exchanging and publishing data over the web, and is increasingly embedded in applications. Two challenges in handling XML are its size (the XML representation of a document is significantly larger than its native state) and the complexity of its search (XML search involves path and content searches on labeled tree structures). We address the basic problems of compression, navigation and searching of XML documents. In particular, we adopt recently proposed theoretical algorithms [11] for succinct tree representations to design and implement a compressed index for XML, called XBZIPiNDEX, in which the XML document is maintained in a highly compressed format, and both navigation and searching can be done by uncompressing only a tiny fraction of the data. This solution relies on compressing and indexing two arrays derived from the XML data. With detailed experiments we compare this with other compressed XML indexing and searching engines to show that XBZIPiNDEX has a compression ratio up to 35% better than that achievable by those other tools, and that its time performance on some path and content search operations is orders of magnitude faster: a few milliseconds over hundreds of MBs of XML files versus tens of seconds, on standard XML data sources.
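
The abstract does not say how the two arrays are built. As a rough illustration only, the sketch below follows the general idea behind the succinct labeled-tree representation of [11]: visit the nodes in preorder, sort them stably by the string of labels on the path from their parent up to the root, and emit one array of "last child" flags and one array of labels. The names `Node`, `two_arrays`, `last` and `alpha` are hypothetical, text content is treated as ordinary leaf nodes, and no compression or indexing step is shown; this is not the XBZIPiNDEX implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A labeled tree node (hypothetical helper type for this sketch)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def two_arrays(root: Node) -> Tuple[List[int], List[str]]:
    """Return (last, alpha) with nodes ordered by their upward label path.

    last[i]  is 1 if the i-th node is the last child of its parent
             (the root is flagged 1 by convention in this sketch).
    alpha[i] is the label of the i-th node.
    """
    rows = []  # (upward_path, is_last_flag, label), collected in preorder

    def visit(node: Node, upward: Tuple[str, ...], is_last: bool) -> None:
        rows.append((upward, 1 if is_last else 0, node.label))
        # The children's upward path starts with this node's label.
        child_path = (node.label,) + upward
        for i, child in enumerate(node.children):
            visit(child, child_path, i == len(node.children) - 1)

    visit(root, (), True)
    # Python's sort is stable, so nodes with equal paths keep preorder order.
    rows.sort(key=lambda row: row[0])
    return [row[1] for row in rows], [row[2] for row in rows]


# Usage: the tree for <a><b>c</b><b>d</b></a>, with text as leaf nodes.
tree = Node("a", [Node("b", [Node("c")]), Node("b", [Node("d")])])
last, alpha = two_arrays(tree)
print(last)   # [1, 0, 1, 1, 1]
print(alpha)  # ['a', 'b', 'b', 'c', 'd']
```

Sorting by upward path places nodes that share the same ancestors next to each other (here the two `b` elements, then their text `c` and `d`), which is the kind of locality a general-purpose compressor can exploit.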

References

  1. http://xml.coverpages.org/xml.html.
  2. J. Adiego, P. de la Fuente, and G. Navarro. Lempel-Ziv compression of structured text. In IEEE Data Compression Conference, 2004.
  3. J. Adiego, P. de la Fuente, and G. Navarro. Merging prediction by partial matching with structural contexts model. In IEEE Data Compression Conference, page 522, 2004.
  4. A. Arion, A. Bonifati, G. Costa, S. D'Aguanno, I. Manolescu, and A. Pugliese. XQueC: pushing queries to compressed XML data. In VLDB, 2003.
  5. D. Benoit, E. Demaine, I. Munro, R. Raman, V. Raman, and S. Rao. Representing trees of higher degree. Algorithmica, 2005.
  6. B. Catania, A. Maddalena, and A. Vakali. XML document indexes: a classification. In IEEE Internet Computing, pages 64--71, September-October 2005.
  7. T. Chen, J. Lu, and T. W. Lin. On boosting holism in XML twig pattern matching using structural indexing techniques. In ACM Sigmod, pages 455--466, 2005.
  8. J. Cheney. Compressing XML with multiplexed hierarchical PPM models. In IEEE Data Compression Conference, pages 163--172, 2001.
  9. J. Cheney. An empirical evaluation of simple DTD-conscious compression techniques. In WebDB, 2005.
  10. J. Cheng and W. Ng. XQzip: Querying compressed XML using structural indexing. In International Conference on Extending Database Technology, pages 219--236, 2004.
  11. P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrishnan. Structuring labeled trees for optimal succinctness, and beyond. In IEEE Focs, pages 184--193, 2005.
  12. P. Ferragina and G. Manzini. An experimental study of a compressed index. Information Sciences, 135:13--28, 2001.
  13. P. Ferragina and G. Manzini. Indexing compressed text. Journal of the ACM, 52(4):552--581, 2005.
  14. R. F. Geary, R. Raman, and V. Raman. Succinct ordinal trees with level-ancestor queries. In ACM-SIAM Soda, 2004.
  15. D. Geer. Will binary XML speed network traffic? IEEE Computer, pages 16--18, April 2005.
  16. R. Goldman and J. Widom. Dataguides: enabling query formulation and optimization in semistructured databases. In VLDB, pages 436--445, 1997.
  17. A. Golinsky, I. Munro, and S. Rao. Rank/Select operations on large alphabets: a tool for text indexing. In ACM-SIAM SODA, 2006.
  18. R. Kaushik, R. Krishnamurthy, J. Naughton, and R. Ramakrishnan. On the integration of structure indexes and inverted lists. In ACM Sigmod, pages 779--790, 2004.
  19. R. Kaushik, R. Krishnamurthy, J. F. Naughton, and R. Ramakrishnan. On the integration of structure indexes and inverted lists. In ACM Sigmod, 2004.
  20. W. Y. Lam, W. Ng, P. T. Wood, and M. Levene. XCQ: XML compression and querying system. In WWW, 2003.
  21. H. Liefke and D. Suciu. XMILL: An efficient compressor for XML data. In ACM Sigmod, pages 153--164, 2000.
  22. T. Milo and D. Suciu. Index structures for path expressions. In ICDT, pages 277--295, 1999.
  23. Jun-Ki Min, Myung-Jae Park, and Chin-Wan Chung. Xpress: A queriable compression for XML data. In ACM Sigmod, pages 122--133, 2003.
  24. E. Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems, 18(2):113--139, 2000.
  25. P. R. Raw and B. Moon. PRIX: Indexing and querying XML using Prüfer sequences. In ICDE, pages 288--300, 2004.
  26. D. Shkarin. PPM: One step to practicality. In IEEE Data Compression Conference, pages 202--211, 2002.
  27. P. M. Tolani and J. R. Haritsa. XGRIND: A query-friendly XML compressor. In ICDE, pages 225--234, 2002.
  28. H. Wang, S. Park, W. Fan, and P. S. Yu. ViST: a dynamic index method for querying XML data by tree structures. In ACM Sigmod, pages 110--121, 2003.
  29. W. Wang, H. Wang, H. Lu, H. Jang, X. Lin, and J. Li. Efficient processing of XML path queries using the disk-based F&B index. In VLDB, pages 145--156, 2005.

Published in

WWW '06: Proceedings of the 15th international conference on World Wide Web
May 2006
1102 pages
ISBN: 1595933239
DOI: 10.1145/1135777

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
