Abstract
Data exchange between organizations over the Web has become extremely common, especially in electronic commerce, and much of the data and protocol traffic transferred over the Internet is encoded in XML. XML tags are useful, but they are verbose. They consume storage space and processing time, which reduces the efficiency and effectiveness of the system as a whole. In this paper, we report experimental results of our approach to retrieving elements from large-scale XML collections, with the aim of improving the efficiency of XML retrieval. We propose a new XML compression algorithm, which we call the extended XML compression of ADXPI (ecADXPI). It supports Absolute Document XPath Indexing (ADXPI) and the Score Sharing function using a top-down scheme. Our experiments show that these steps reduce the data size by 90.19% compared with GPX. They also reduce the processing time of the Score Sharing function by 37.12% compared with processing the uncompressed data. In addition, our system supports content-and-structure (CAS) queries, which retrieve paths directly from the compressed data. Because the data volume is reduced, querying the compressed paths may even be faster than querying the original, uncompressed data.
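The following Python sketch illustrates the general idea of querying compressed paths. It indexes each non-empty leaf element by its absolute XPath, in the spirit of ADXPI, and shortens the paths by replacing tag names with codes from a tag dictionary. This is an illustrative assumption, not the ecADXPI algorithm itself. The function names, the hexadecimal code scheme, and the simple "path ends with these tags" check that stands in for a CAS structural constraint are all hypothetical. The Score Sharing function is not modeled.

```python
import xml.etree.ElementTree as ET


def adxpi_paths(xml_text):
    """Index every non-empty leaf element by its absolute XPath.

    Each step carries a 1-based position among same-named siblings,
    e.g. /article[1]/body[1]/sec[1]/p[2].
    """
    root = ET.fromstring(xml_text)
    results = []

    def walk(elem, prefix):
        children = list(elem)
        if not children:
            text = (elem.text or "").strip()
            if text:
                results.append((prefix, text))
            return
        seen = {}  # running count of each child tag, for positional predicates
        for child in children:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{prefix}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    return results


def build_tag_dictionary(paths):
    """Map each distinct tag name to a short hexadecimal code.

    Hypothetical scheme for illustration; not the ecADXPI encoding.
    """
    tags = sorted({step.split("[")[0]
                   for path, _ in paths
                   for step in path.strip("/").split("/")})
    return {tag: format(i, "x") for i, tag in enumerate(tags)}


def compress_path(path, codes):
    """Replace each tag name in an absolute path with its code, keeping positions."""
    steps = []
    for step in path.strip("/").split("/"):
        tag, _, position = step.partition("[")
        steps.append(f"{codes[tag]}[{position}")
    return "/" + "/".join(steps)


def match_tail(compressed_path, query_tags, codes):
    """Check whether the path ends with query_tags (like //sec/p) without decompressing.

    A simplified, hypothetical stand-in for a CAS structural constraint.
    """
    wanted = [codes.get(tag) for tag in query_tags]
    if not wanted or None in wanted:
        return False  # empty query, or a tag that never occurs in the collection
    got = [step.partition("[")[0] for step in compressed_path.strip("/").split("/")]
    return got[-len(wanted):] == wanted


doc = ("<article><title>XML retrieval</title>"
       "<body><sec><p>compression</p><p>indexing</p></sec></body></article>")
paths = adxpi_paths(doc)
codes = build_tag_dictionary(paths)
index = [(compress_path(p, codes), text) for p, text in paths]
hits = [text for cp, text in index if match_tail(cp, ["sec", "p"], codes)]
print(hits)  # ['compression', 'indexing']
```

The query tags are encoded with the same dictionary as the index. As a result, the structural match runs directly on the shortened path strings and never reconstructs the original tag names.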
References
Extensible Markup Language (XML) 1.1 (2nd Edn). http://www.w3.org/TR/xml11/
INitiative for the Evaluation of XML Retrieval (INEX). https://inex.mmci.uni-saarland.de/
Geva S et al. (2009) Overview of INEX 2009 ad hoc track. In: The INEX 2009 workshop pre-proceedings. Schloss Dagstuhl, Germany, pp 16–50
Kamps J (2009) Indexing units. In: Liu L, Özsu MT (eds) Encyclopedia of database systems (EDS). Springer, Heidelberg, pp 1467–1471
Ogilvie P, Callan J (2005) Hierarchical language models for XML component retrieval. In: INEX 2004, Lecture notes in computer science, vol 3493
Geva S (2005) GPX: Gardens Point XML information retrieval at INEX 2004. In: Fuhr N, Lalmas M, Malik S, Szlavik Z (eds) Advances in XML information retrieval: 3rd international workshop of the initiative for the evaluation of XML retrieval. Lecture notes in computer science, Springer, pp 211–223
Tanioka H (2008) A fast retrieval algorithm for large-scale XML data. In: Focused access to XML documents, Lecture notes in computer science, vol 4862. Springer, Heidelberg, pp 129–137
Mass Y, Mandelbrod M (2005) Component ranking and automatic query refinement for XML retrieval. In: INEX 2004, Lecture notes in computer science, Springer-Verlag GmbH, vol 3493
Liefke H, Suciu D (2000) XMill: an efficient compressor for XML data. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, pp 153–164, May 2000
Gailly JL, Adler M gzip: the compressor data. Available at http://www.gzip.org/
Tolani PM, Haritsa JR (2002) XGRIND: a query-friendly XML compressor. In: Proceedings of the 18th international conference on data engineering, Feb 2002
Min J-K, Park M-J, Chung C-W (2003) XPRESS: a queriable compression for XML data. In: Proceedings of the 2003 ACM SIGMOD international conference on management of data, pp 122–133, 9–12 June 2003
Maireang K, Pleurmpitiwiriyavach C (2003) XPACK: a grammar-based XML document compression. In: Proceedings of NCSEC2003, the 7th national computer science and engineering conference, 28–30 Oct 2003
Wichaiwong T, Jaruskulchai C (2007) Improve XML web services’ performance by compressing XML schema tag. In: The 4th international technical conference on electrical engineering/electronics, computer, telecommunications and information technology, Thailand, 9–12 May 2007
Wichaiwong T, Jaruskulchai C (2011) XML retrieval more efficient using ADXPI indexing scheme. In: The 4th international symposium on mining and web, Biopolis, Singapore, 22–25 March 2011
Wichaiwong T, Jaruskulchai C (2011) MEXIR: An implementation of high performance and high precision XML information retrieval. Computer technology and application, vol 2(4), David Publishing Company, April 2011
Hinz S et al. (2009) MySQL full-text search functions. http://dev.mysql.com
Aksyonoff A et al. (2009) Sphinx open source search server. Available at http://www.sphinxsearch.com/
Wichaiwong T, Jaruskulchai C (2010) A simple approach to optimize XML retrieval. In: The 6th international conference on next generation web services practices, Goa, India, 23–25 Nov 2010
Denoyer L, Gallinari P (2006) The wikipedia XML corpus. SIGIR forum, pp 64–69
Schenkel R, Suchanek FM, Kasneci G (2007) YAWN: a semantically annotated wikipedia XML corpus. In: 12. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW 2007), pp 277–291
Géry M, Largeron C, Thollard F (2008) UJM at INEX 2008: pre-impacting of tags weights. In: INEX-2008, pp 46–53
Acknowledgments
This work was supported by the overseas academic conference budget of the Faculty of Science, Kasetsart University, and by the Graduate School, Kasetsart University.
Copyright information
© 2012 Springer Science+Business Media B.V.
About this paper
Cite this paper
Wichaiwong, T., Jaruskulchai, C. (2012). An Extended XML Compression Technique for XML Element Retrieval. In: Kim, K., Ahn, S. (eds) Proceedings of the International Conference on IT Convergence and Security 2011. Lecture Notes in Electrical Engineering, vol 120. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2911-7_51
DOI: https://doi.org/10.1007/978-94-007-2911-7_51
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-2910-0
Online ISBN: 978-94-007-2911-7
eBook Packages: Engineering (R0)