Abstract
Recently, a number of query processors have been proposed for evaluating relational queries in structured P2P systems. However, because these approaches do not consider peer or link failures, they cannot be deployed in real-world applications without extensions. We show that typical failures in structured P2P systems can have an unpredictable impact on the correctness of the result. In particular, stateful operators that store intermediate results on peers, e.g., the distributed hash join, must protect such results against failures. Although many replication schemes for P2P systems exist, they cannot replicate operator states while the query is being processed. In this paper, we propose an in-query replication scheme that replicates the state of an operator among the neighbors of the processing peer. Our analytical evaluation shows that the network overhead of in-query replication is O(1) with respect to network size, i.e., our scheme is scalable. We have carried out an extensive experimental evaluation using simulations as well as a PlanetLab deployment. It confirms the effectiveness and efficiency of the in-query replication scheme and shows the effectiveness of the routing extension in networks of varying reliability.
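The idea of replicating operator state among the neighbors of the processing peer can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not the scheme evaluated in the article: the ring overlay, the `Peer` class, and the names `insert`, `take_over`, `build_ring`, and `replication_cost` are all hypothetical. It shows why the replication traffic per state update depends on the number of neighbors rather than on the size of the network.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    peer_id: int
    neighbors: list = field(default_factory=list)  # directly linked peers
    state: dict = field(default_factory=dict)      # join key -> buffered tuples
    replicas: dict = field(default_factory=dict)   # owner id -> copy of owner's state
    sent: int = 0                                   # replication messages sent

    def insert(self, key, row):
        """Buffer a tuple for the join and push the update to every neighbor."""
        self.state.setdefault(key, []).append(row)
        for n in self.neighbors:
            n.replicas.setdefault(self.peer_id, {}).setdefault(key, []).append(row)
            self.sent += 1

    def take_over(self, failed_id):
        """Adopt a failed neighbor's operator state from the local replica."""
        for key, rows in self.replicas.pop(failed_id, {}).items():
            self.state.setdefault(key, []).extend(rows)


def build_ring(n):
    """Hypothetical overlay (assumes n >= 3): each peer links to two ring neighbors."""
    peers = [Peer(i) for i in range(n)]
    for i, p in enumerate(peers):
        p.neighbors = [peers[(i - 1) % n], peers[(i + 1) % n]]
    return peers


def replication_cost(n, inserts=10):
    """Replication messages sent by peer 0 for a fixed number of state updates."""
    peers = build_ring(n)
    for i in range(inserts):
        peers[0].insert(i % 3, ("row", i))
    return peers[0].sent
```

In this model, `replication_cost(10)` and `replication_cost(1000)` return the same value, because each update is sent only to the fixed set of neighbors. A real system would also have to detect the failure and reroute the query to the peer that takes over, which this sketch leaves out.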
Cite this article
Bestehorn, M., von der Weth, C., Buchmann, E. et al. Fault-tolerant query processing in structured P2P-systems. Distrib Parallel Databases 28, 33–66 (2010). https://doi.org/10.1007/s10619-010-7064-5
DOI: https://doi.org/10.1007/s10619-010-7064-5