Abstract
XML has become the standard format for data representation and exchange in domains ranging from Web to desktop applications. However, wide adoption of XML is hindered by inefficient document-parsing methods. Recent work on lazy parsing is a major step towards alleviating this problem. Even so, lazy parsers must still read the entire XML document in order to extract the overall document structure, because XML documents contain no internal navigation pointers. Further, these parsers must load and parse the entire virtual document tree into memory during XML query processing. These overheads significantly degrade the performance of navigation operations. We have developed a framework for efficient XML parsing based on the idea of placing internal physical pointers within the document, which allows the parser to skip large portions of the document. The internal pointers are generated in a way that optimizes parsing for common navigation patterns. A double-Lazy Parser (2LP) then exploits these pointers to parse the document. To create the internal pointers, we use constructs supported by the current W3C XML standard. We study our pointer generation and parsing algorithms both theoretically and experimentally, and show that they perform considerably better than existing approaches.
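The idea of skipping subtrees through internal pointers can be illustrated with a minimal sketch. It is not the 2LP algorithm itself. It assumes XInclude-style elements as the pointer construct (the abstract says only that a W3C-supported construct is used), and the partition names, the sample data, and the LazyDocument class are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tag of an XInclude element in ElementTree's {namespace}local notation.
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical partitioned document: a small skeleton plus two subtrees
# stored separately. A dict stands in for files on disk (assumption).
PARTS = {
    "skeleton.xml": (
        '<site xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<regions><xi:include href="regions.xml"/></regions>'
        '<people><xi:include href="people.xml"/></people>'
        "</site>"
    ),
    "regions.xml": "<africa><item id='i1'/><item id='i2'/></africa>",
    "people.xml": "<person id='p1'><name>Ann</name></person>",
}


class LazyDocument:
    """Parses the skeleton eagerly and each partition only when visited."""

    def __init__(self, parts, root_name):
        self.parts = parts
        self.parsed = []  # names of the partitions parsed so far
        self.root = self._parse(root_name)

    def _parse(self, name):
        self.parsed.append(name)
        return ET.fromstring(self.parts[name])

    def children(self, node):
        """Return the child elements of node, following pointers on demand."""
        result = []
        for index, child in enumerate(list(node)):
            if child.tag == XI_INCLUDE:
                # Follow the pointer and cache the parsed subtree in place.
                resolved = self._parse(child.get("href"))
                node[index] = resolved
                child = resolved
            result.append(child)
        return result

    def find_path(self, steps):
        """Follow a simple child-axis path such as people/person/name."""
        node = self.root
        for step in steps:
            node = next(c for c in self.children(node) if c.tag == step)
        return node


doc = LazyDocument(PARTS, "skeleton.xml")
name = doc.find_path(["people", "person", "name"])
print(name.text)   # Ann
print(doc.parsed)  # ['skeleton.xml', 'people.xml']
```

In this sketch the query never touches regions.xml, so that partition is neither read nor parsed. How the document is partitioned decides which navigation patterns benefit, which is the role the abstract assigns to pointer generation.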
This project was supported in part by the National Science Foundation Grant IIS-0534530 and by the Department of Energy Grant ER25739.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Farfán, F., Hristidis, V., Rangaswami, R. (2007). Beyond Lazy XML Parsing. In: Wagner, R., Revell, N., Pernul, G. (eds) Database and Expert Systems Applications. DEXA 2007. Lecture Notes in Computer Science, vol 4653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74469-6_9
DOI: https://doi.org/10.1007/978-3-540-74469-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74467-2
Online ISBN: 978-3-540-74469-6
eBook Packages: Computer Science, Computer Science (R0)