Abstract
In this paper, we present the design and performance of XCube, a tag-based system for managing XML data in a hypercube overlay network. In XCube, each node in a d-dimensional hypercube is identified by a d-bit vector. A peer manages a smaller hypercube with dimension d′ < d. An XML document is compactly represented as a structure summary and a content summary. The structure summary comprises a d-bit vector derived from the distinct tag names in the document and a synopsis capturing the structure of the document. The content summary consists of a bit map that summarizes the document content. The metadata of a document, i.e., owner IP, document identifier, structure summary and content summary, is indexed at its anchor peer (the peer that manages the node whose bit vector matches that of the document). In addition, the structure summary is indexed at all peers that manage nodes whose bit vectors are covered by the document’s bit vector. An XPath query is processed in four phases. In phase 1, the query is routed to its anchor peer according to the bit vector of the query. In phase 2, the query is evaluated against all the synopses stored at its anchor peer and forwarded to the anchor peers of the matching synopses. In phase 3, the anchor peer of each matching synopsis evaluates the query against the associated bit maps and forwards the query to the corresponding owner peers. Finally, in phase 4, the owner peers evaluate the query on the XML documents and return answers to the querying peer. We also present a scheme that dynamically partitions the hypercube to balance the load across peers. We further exploit the partition history to remove redundant messages. We conduct a comprehensive experimental study, and the results show the efficiency of XCube.
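The bit-vector indexing described in the abstract can be illustrated with a minimal sketch. The abstract does not say how tag names are mapped to bit positions, so the hash-based mapping below, the dimension `D = 8`, and all function names are assumptions made for illustration only. The sketch treats a node vector as covered by a document vector when every 1-bit of the node vector is also set in the document vector.

```python
import hashlib

D = 8  # hypothetical hypercube dimension d (not taken from the paper)


def tag_bit(tag, d=D):
    """Map a tag name to one of d bit positions (assumed hashing scheme)."""
    digest = hashlib.sha1(tag.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % d


def bit_vector(tags, d=D):
    """Build a d-bit vector (as an int) from the distinct tag names."""
    vec = 0
    for tag in set(tags):
        vec |= 1 << tag_bit(tag, d)
    return vec


def covers(doc_vec, node_vec):
    """True if every 1-bit of node_vec is also set in doc_vec."""
    return (node_vec & doc_vec) == node_vec


def covered_nodes(doc_vec):
    """Yield every node vector covered by doc_vec (all of its sub-masks)."""
    sub = doc_vec
    while True:
        yield sub
        if sub == 0:
            break
        sub = (sub - 1) & doc_vec


# A document and a query over a subset of its tags.
doc_vec = bit_vector(["book", "title", "author", "price"])
query_vec = bit_vector(["book", "title"])

# The query's node is covered by the document's vector, so the document's
# structure summary would also be indexed at the query's anchor peer.
assert covers(doc_vec, query_vec)
```

Under these assumptions, a query whose tags all appear in a document always maps to a node covered by that document's vector, which is why phase 1 can route the query to a single anchor peer and still find every candidate synopsis there. The number of covered nodes grows as 2 to the power of the number of 1-bits in the document vector.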
Notes
A structure/path query can be mapped into a tag-based query by ignoring the structure.
We will use the terms document structure, synopsis, and XML tree interchangeably in the paper.
If a document can be summarized with multiple bit maps, the bit maps can be built with more bits, and thus they are more accurate.
The number of edges in a tree is (T − 1), where T is the number of nodes in the tree. Therefore, the number of 1-bits in the bit vector is bounded by 2N.
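The first note above, that a structure/path query can be mapped to a tag-based query by ignoring the structure, can be sketched as follows. This is only an illustration for a simple XPath subset (abbreviated steps, predicates, string literals); the function name `xpath_tags` and the regular expressions are assumptions, not the parser used by XCube. Attribute names, axis names, function calls, and operator keywords are skipped so that only element tag names remain.

```python
import re

# Operator keywords of XPath 1.0 that are not element names.
_KEYWORDS = {"and", "or", "div", "mod"}

# A name, optionally preceded by '@' (attribute) and optionally followed by
# '(' (function or node test such as text()) or '::' (axis name).
_TOKEN = re.compile(r"(@?)([A-Za-z_][\w.-]*)(\s*\(|::)?")


def xpath_tags(xpath):
    """Return the set of element tag names in a simple XPath expression."""
    # Drop string literals so that their contents are not read as tags.
    stripped = re.sub(r"'[^']*'|\"[^\"]*\"", "", xpath)
    tags = set()
    for is_attr, name, suffix in _TOKEN.findall(stripped):
        if is_attr or suffix or name in _KEYWORDS:
            continue
        tags.add(name)
    return tags


query_tags = xpath_tags("//book[author='Smith']/title")
assert query_tags == {"book", "author", "title"}
```

The resulting tag set discards the parent/child and ancestor/descendant relationships of the path, which is why the later phases still need the synopses and the documents themselves to filter out false positives.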
Acknowledgements
Both Yingguang Li and Kian-Lee Tan are partially supported by a university research grant R-252-000-237-112.
Cite this article
Li, Y., Özsu, M.T. & Tan, KL. XCube: Processing XPath queries in a hypercube overlay network. Peer-to-Peer Netw. Appl. 2, 128–145 (2009). https://doi.org/10.1007/s12083-008-0025-3