Sibling‐First Data Organization for Parse‐Free XML Data Processing

Hooman Homayounfar (Department of Computing and Information Science, University of Guelph, Guelph, Ontario, Canada email: hhomayou@uoguelph.ca)
Fangju Wang (Department of Computing and Information Science, University of Guelph, Guelph, Ontario, Canada email: hhomayou@uoguelph.ca)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 31 December 2006

Abstract

XML is becoming one of the most important structures for data exchange on the web. Despite its many advantages, the structure of XML imposes several major obstacles to processing large documents. The mismatch between the linear nature of the algorithms currently used in operating systems and databases (e.g. for caching and prefetching) and the non‐linear structure of XML data makes XML processing more costly. In addition to verbosity (e.g. tag redundancy), interpreting (i.e. parsing) the depth‐first (DF) structure of XML documents is a significant overhead for processing applications (e.g. query engines). Recent research on XML query processing has shown that sibling clustering can improve performance significantly. However, the existing clustering methods cannot avoid the parsing overhead, as they are limited by larger document sizes. In this research, we have developed a better data organization for native XML databases, named the sibling‐first (SF) format, that improves query performance significantly. SF uses an embedded index for fast access to child nodes. It also compresses documents by eliminating extra information from the original DF format. The converted SF documents can be processed for XPath queries without being parsed. We have implemented SF storage in virtual memory as well as a format on disk. Experimental results with real data have shown that significantly higher performance can be achieved when XPath queries are run on very large SF documents.
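
The abstract does not specify the SF layout itself, so the following is only a minimal sketch of the general idea it describes: siblings stored next to each other, with an embedded index from each node to its children. The record layout `[tag, text, first_child, child_count]` and the helper names `to_sibling_first` and `select` are assumptions made for illustration, not the authors' format or API. The sketch parses the document once during conversion and then answers a simple child‐axis path by index arithmetic alone.

```python
import xml.etree.ElementTree as ET
from collections import deque


def to_sibling_first(root):
    """Flatten an element tree into breadth-first records (hypothetical layout).

    Each record is [tag, text, first_child, child_count]. The children of a
    node occupy one contiguous slice of the list, so siblings are clustered.
    """
    records = [[root.tag, (root.text or "").strip(), -1, 0]]
    queue = deque([(root, 0)])
    while queue:
        elem, idx = queue.popleft()
        children = list(elem)
        if not children:
            continue
        # Embedded index: where this node's children start, and how many.
        records[idx][2] = len(records)
        records[idx][3] = len(children)
        for child in children:
            queue.append((child, len(records)))
            records.append([child.tag, (child.text or "").strip(), -1, 0])
    return records


def select(records, path):
    """Evaluate a simple absolute child-axis path such as '/lib/book/title'."""
    steps = [s for s in path.split("/") if s]
    current = [0] if steps and records[0][0] == steps[0] else []
    for step in steps[1:]:
        matched = []
        for idx in current:
            _, _, first, count = records[idx]
            # Scan only the contiguous sibling slice; no re-parsing needed.
            for child_idx in range(first, first + count):
                if records[child_idx][0] == step:
                    matched.append(child_idx)
        current = matched
    return [records[i][1] for i in current]


doc = ("<lib><book><title>A</title></book>"
       "<book><title>B</title><year>2006</year></book></lib>")
records = to_sibling_first(ET.fromstring(doc))
titles = select(records, "/lib/book/title")
```

Here `titles` holds the text of every `title` element reached through `/lib/book/title`. Only the one‐time conversion touches the XML parser; each query step reads a contiguous slice of `records`, which is the kind of linear access pattern that caching and prefetching favor.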

Citation

Homayounfar, H. and Wang, F. (2006), "Sibling‐First Data Organization for Parse‐Free XML Data Processing", International Journal of Web Information Systems, Vol. 2 No. 3/4, pp. 176-186. https://doi.org/10.1108/17440080780000298

Publisher

Emerald Group Publishing Limited

Copyright © 2006, Emerald Group Publishing Limited
