Abstract
Unstructured peer-to-peer infrastructure has been widely employed to support large-scale distributed applications. Many of these applications, such as location-based services and multimedia content distribution, require support for range selection queries. Under the widely adopted query shipping protocols, the cost of query processing is affected by the number of result copies, or replicas, in the system. Since range queries can return results that include poorly replicated data items, the cost of these queries is usually dominated by the retrieval cost of those items. In this work, we propose a popularity-aware, prefetch-based approach that effectively facilitates the caching of poorly replicated data items that are likely to be requested in subsequent range queries, resulting in substantial cost savings. We prove that the performance of retrieving poorly replicated data items is guaranteed to improve under an increasing query load. Extensive experiments show that the overall range query processing cost decreases significantly under various query load settings.
Notes
For conciseness, we use range query and range selection query interchangeably in the remainder of the paper.
We define “correlation” in Section 3.1.
In this work, we focus on the query shipping cost to locate query results, ignoring local processing cost.
The actual number of replicas equals the square root of the corresponding query load size multiplied by a constant factor [4].
This does not conflict with the focus on range query processing since range queries may include multiple point values.
In this experiment, each epoch lasts 5×10^6 ms.
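The square-root relationship stated in the note above (replicas equal a constant factor times the square root of the query load [4]) can be illustrated with a minimal sketch. The function name `sqrt_replicas`, the constant `c = 2.0`, and the sample query loads are assumptions chosen for illustration; none of them come from the paper.

```python
import math

def sqrt_replicas(query_load, c=1.0):
    """Replica count per item under square-root replication: c * sqrt(load).

    `c` is a hypothetical constant factor; the paper does not give its value here.
    """
    return {item: c * math.sqrt(load) for item, load in query_load.items()}

# Hypothetical query loads (number of queries per item), for illustration only.
loads = {"popular": 10000, "moderate": 100, "rare": 1}
replicas = sqrt_replicas(loads, c=2.0)

for item, count in replicas.items():
    print(item, count)
```

Under these assumed numbers, a 10,000-fold gap in query load shrinks to a 100-fold gap in replica count, yet the rarely queried item still has very few replicas, which is the situation that makes poorly replicated items expensive to retrieve.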
References
Balke W, Nejdl W, Siberski W, Thaden U (2005) Progressive distributed top-k retrieval in peer-to-peer networks. In: Proc int conf on data engineering, pp 174–185
Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512
Cheng B, Liu X, Zhang Z, Jin H (2007) A measurement study of a peer-to-peer video-on-demand system. In: Peer-to-peer systems, first international workshop
Cohen E, Shenker S (2002) Replication strategies in unstructured peer-to-peer networks. In: Proc ACM SIGCOMM, pp 177–190
Crainiceanu A, Linga P, Gehrke J, Shanmugasundaram J (2004) Querying peer-to-peer networks using P-Trees. In: Proc 7th int workshop on the world wide web and databases (WebDB), pp 25–30
Edwards HM (1974) Riemann’s zeta function. Academic, London
Gkantsidis C, Mihail M, Saberi A (2004) Random walks in peer-to-peer networks. In: Proc 23rd annual joint conference of the IEEE computer and communications societies
Gopalakrishnan V, Silaghi B, Bhattacharjee B, Keleher P (2004) Adaptive replication in peer-to-peer systems. In: Proc 24th int conf on distributed computing systems, pp 360–369
Huebsch R, Hellerstein JM, Lanham N, Loo BT, Shenker S, Stoica I (2003) Querying the internet with PIER. In: Proc 29th int conf on very large data bases, pp 321–332
Iyer S, Rowstron AIT, Druschel P (2002) Squirrel: a decentralized peer-to-peer web cache. In: Proc ACM SIGACT-SIGOPS symp on principles of dist comp, pp 213–222
Jagadish HV, Ooi BC, Vu QH (2005) BATON: a balanced tree structure for peer-to-peer networks. In: Proc 31st int conf on very large data bases
Jelasity M, Voulgaris S, Guerraoui R, Kermarrec A-M, van Steen M (2007) Gossip-based peer sampling. ACM Trans Comput Syst 25(3)
Kothari A, Agrawal D, Gupta A, Suri S (2003) Range addressable network: a P2P cache architecture for data ranges. In: Peer-to-peer computing, pp 14–22
Ramabhadran S, Ratnasamy S, Hellerstein JM, Shenker S (2004) Brief announcement: prefix hash tree. In: Proc ACM SIGACT-SIGOPS symp on principles of dist comp
Ramakrishnan R, Gehrke J (2002) Database management systems. McGraw-Hill, New York
Ramasubramanian V, Sirer EG (2004) The design and implementation of a next generation name service for the internet. In: Proc ACM SIGCOMM, pp 331–342
Sahin OD, Gupta A, Agrawal D, Abbadi AE (2004) A peer-to-peer framework for caching range queries. In: Proc 20th int conf on data engineering, pp 165–176
Scott D (1992) Multivariate density estimation: theory, practice and visualization. Wiley, New York
Stallings W (2004) Operating systems: internals and design principles. Prentice Hall, Englewood Cliffs
Terpstra WW, Kangasharju J, Leng C, Buchmann AP (2007) Bubblestorm: resilient, probabilistic, and exhaustive peer-to-peer search. In: Proc ACM SIGCOMM, pp 49–60
Valduriez P, Pacitti E (2004) Data management in large-scale P2P systems. In: High performance computing for computational science—VECPAR 2004, 6th international conference, pp 104–118
Wang C, Xiao L, Liu Y, Zheng P (2006) DiCAS: an efficient distributed caching mechanism for P2P systems. IEEE Trans Parallel Distrib Syst 17(10):1097–1109
Yang B, Garcia-Molina H (2002) Improving search in peer-to-peer networks. In: Proc 22nd int conf on distributed computing systems, pp 5–12
Zhang R, Hu YC (2005) Assisted peer-to-peer search with partial indexing. In: The 24th annual joint conference of the IEEE computer and communications societies, pp 1514–1525
Cite this article
Wang, Q., Daudjee, K. & Özsu, M.T. Popularity-aware prefetch in P2P range caching. Peer-to-Peer Netw. Appl. 3, 145–160 (2010). https://doi.org/10.1007/s12083-009-0054-6
DOI: https://doi.org/10.1007/s12083-009-0054-6