Abstract
Path expressions are an important component of XML queries. The extended preorder numbering scheme makes it possible to quickly determine the ancestor-descendant relationship between elements in the hierarchy of XML data. With this numbering scheme, a path expression can be evaluated by join operations, avoiding the potentially high cost of tree traversals. In this paper, we first formulate XML path queries as range-point join queries. We then discuss partition-based algorithms that exploit the range containment property to process range-point join queries efficiently. Within the partition-based framework, we propose three algorithms, namely Descendant partition join, Segment-tree partition join, and Ancestor Link partition join, from which a query optimizer can choose according to the characteristics of the input data. The experimental results show that the partition-based algorithms make better use of buffer memory than sort-merge algorithms, and that the proposed Ancestor Link partition join algorithm yields the best performance by using small in-memory data structures and by taking advantage of unevenly sized inputs.
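The range-point formulation can be illustrated with a minimal sketch. It assumes the common (order, size) form of extended preorder numbering, in which an element x is an ancestor of y exactly when order(x) < order(y) <= order(x) + size(x); the abstract itself does not spell out this definition. The names `Node`, `is_ancestor`, `nested_loop_join`, and `partition_join`, and the fixed-width bucketing, are hypothetical and chosen for illustration only. This is not the Descendant, Segment-tree, or Ancestor Link partition join proposed in the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    order: int  # extended preorder number (assumed formulation)
    size: int   # width of the number range reserved for descendants


def is_ancestor(a: Node, d: Node) -> bool:
    # Range containment: d's order (a point) lies in a's range (a.order, a.order + a.size].
    return a.order < d.order <= a.order + a.size


def nested_loop_join(ancestors, descendants):
    # Baseline range-point join: test every (range, point) pair.
    return [(a, d) for a in ancestors for d in descendants if is_ancestor(a, d)]


def partition_join(ancestors, descendants, width):
    # Hypothetical partitioning: cut the order axis into buckets of `width`
    # and place each descendant point in exactly one bucket.
    buckets = defaultdict(list)
    for d in descendants:
        buckets[d.order // width].append(d)

    result = []
    for a in ancestors:
        # An ancestor range only needs to probe the buckets it overlaps.
        first = (a.order + 1) // width
        last = (a.order + a.size) // width
        for b in range(first, last + 1):
            for d in buckets.get(b, ()):
                if is_ancestor(a, d):
                    result.append((a, d))
    return result


ancestors = [Node(1, 100), Node(10, 30), Node(50, 20)]
descendants = [Node(11, 0), Node(25, 5), Node(45, 0),
               Node(60, 0), Node(101, 0), Node(102, 0)]
pairs = partition_join(ancestors, descendants, width=16)
```

Because each point falls in exactly one bucket while a range may span several, only the ancestor side is probed across partitions. This illustrates the general idea of using range containment to limit comparisons, under the assumptions stated above.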
This work was sponsored in part by the National Science Foundation CAREER Award (IIS-9876037), NSF Grant No. IIS-0100436, and NSF Research Infrastructure program EIA-0080123. The authors assume all responsibility for the contents of the paper.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, Q., Moon, B. (2003). Partition Based Path Join Algorithms for XML Data. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds) Database and Expert Systems Applications. DEXA 2003. Lecture Notes in Computer Science, vol 2736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45227-0_17
DOI: https://doi.org/10.1007/978-3-540-45227-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40806-2
Online ISBN: 978-3-540-45227-0
eBook Packages: Springer Book Archive