An Extended XML Compression Technique for XML Element Retrieval

  • Conference paper
Proceedings of the International Conference on IT Convergence and Security 2011

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 120))

Abstract

Web-based data transfer and exchange between organizations has become exceedingly popular, especially in electronic commerce. Data and protocol messages transferred over the Internet are commonly in the form of XML. Although XML tags are beneficial, they are costly in both space and time, which affects the efficiency and effectiveness of the system as a whole. In this paper, we report experimental results of our approach to retrieval over large-scale XML collections, aimed at improving the efficiency of XML retrieval. We propose a new XML compression algorithm that supports Absolute Document XPath Indexing (ADXPI) and the Score Sharing function using a top-down scheme, which we call the extended XML compression of ADXPI (ecADXPI). We found that these steps reduce the data size by 90.19% compared to GPX, and reduce the processing time of the Score Sharing function by 37.12% compared to before compression. In addition, our system supports CAS (content-and-structure) queries, which allows paths to be retrieved directly from the compressed data. Because the data volume is reduced, processing paths in the compressed data may be even faster than in the original, uncompressed system.
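To make the idea of querying paths directly in compressed form more concrete, the following is a minimal sketch of dictionary-based tag compression over absolute XPath expressions. It is not the ecADXPI algorithm, whose details are not given in the abstract; the function names (`build_tag_dictionary`, `compress_path`, `path_has_tag`), the integer code assignment, and the sample paths are all assumptions made for illustration.

```python
import re

def build_tag_dictionary(paths):
    """Assign a short numeric code to each distinct tag name (first-seen order).

    Hypothetical helper: the real ecADXPI dictionary layout is not described
    in the abstract.
    """
    codes = {}
    for path in paths:
        for step in path.strip("/").split("/"):
            tag = re.sub(r"\[\d+\]$", "", step)  # drop the positional predicate
            if tag not in codes:
                codes[tag] = str(len(codes))
    return codes

def compress_path(path, codes):
    """Replace every tag name in an absolute XPath with its dictionary code."""
    out = []
    for step in path.strip("/").split("/"):
        m = re.fullmatch(r"(.+?)(\[\d+\])?", step)
        tag, position = m.group(1), m.group(2) or ""
        out.append(codes[tag] + position)
    return "/" + "/".join(out)

def path_has_tag(compressed, tag, codes):
    """Check a structural constraint on the compressed path without decompressing it."""
    code = codes.get(tag)
    if code is None:
        return False
    steps = compressed.strip("/").split("/")
    return any(step.split("[")[0] == code for step in steps)

# Illustrative absolute document paths (assumed, not taken from the paper).
paths = [
    "/article[1]/body[1]/section[1]/p[1]",
    "/article[1]/body[1]/section[2]/title[1]",
]
codes = build_tag_dictionary(paths)
compressed = [compress_path(p, codes) for p in paths]
```

In this sketch the structural part of a query is translated into dictionary codes once, and the comparison then runs against the shorter compressed paths, which is one plausible reason a compressed index could be faster to scan than the original.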

Acknowledgments

This work was supported by the budget for overseas academic conferences from the Faculty of Science, Kasetsart University, and by the Graduate School, Kasetsart University.

Author information

Corresponding author

Correspondence to Tanakorn Wichaiwong.

Copyright information

© 2012 Springer Science+Business Media B.V.

About this paper

Cite this paper

Wichaiwong, T., Jaruskulchai, C. (2012). An Extended XML Compression Technique for XML Element Retrieval. In: Kim, K., Ahn, S. (eds) Proceedings of the International Conference on IT Convergence and Security 2011. Lecture Notes in Electrical Engineering, vol 120. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2911-7_51

  • DOI: https://doi.org/10.1007/978-94-007-2911-7_51

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-2910-0

  • Online ISBN: 978-94-007-2911-7

  • eBook Packages: Engineering, Engineering (R0)