Abstract
In this paper we study the problem of enabling uninterrupted delivery of messages between endpoints in the presence of spatially correlated failures in addition to independent failures. We developed a failure-model-independent algorithm that computes routing paths based on failure correlations, using a priori failure statistics together with available real-time monitoring information. The algorithm provides the most cost-efficient message routes, which may comprise multiple simultaneous paths. We also designed and implemented an Internet-based overlay routing service that allows applications to construct and maintain highly resilient end-to-end paths. We have deployed our system over a set of geographically distributed PlanetLab nodes. Our experimental results illustrate the feasibility and performance of our approach.
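To make the role of spatial correlation concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a toy failure model in which every relay node fails independently with probability p_node and, in addition, an entire geographic region can fail at once and take down all of its nodes. The function name, the node and region names, and the probabilities are all hypothetical. The sketch computes by brute-force enumeration the probability that at least one of several simultaneous overlay paths stays fully up.

```python
from itertools import product

def delivery_probability(paths, node_region, p_node, p_region):
    """Exact probability that at least one path has all relay nodes up.

    paths:       list of paths, each a list of relay node names
    node_region: dict mapping node name -> region name
    p_node:      independent failure probability of each node (assumed value)
    p_region:    dict mapping region name -> probability the whole region fails
    """
    nodes = sorted({n for path in paths for n in path})
    regions = sorted(p_region)
    total = 0.0
    # Enumerate every combination of region-wide (correlated) failures.
    for region_down in product([False, True], repeat=len(regions)):
        pr_regions = 1.0
        for r, down in zip(regions, region_down):
            pr_regions *= p_region[r] if down else 1.0 - p_region[r]
        dead = {r for r, down in zip(regions, region_down) if down}
        # Enumerate every combination of independent node failures.
        for node_down in product([False, True], repeat=len(nodes)):
            pr = pr_regions
            for down in node_down:
                pr *= p_node if down else 1.0 - p_node
            # A node is usable only if it did not fail and its region is alive.
            up = {n for n, down in zip(nodes, node_down)
                  if not down and node_region[n] not in dead}
            if any(all(n in up for n in path) for path in paths):
                total += pr
    return total

# Hypothetical topology: relays a and b share a region, relay c does not.
node_region = {"a": "east", "b": "east", "c": "west"}
p_region = {"east": 0.1, "west": 0.1}

same_region = delivery_probability([["a"], ["b"]], node_region, 0.05, p_region)
diverse = delivery_probability([["a"], ["c"]], node_region, 0.05, p_region)
print(f"two paths, same region:      {same_region:.6f}")
print(f"two paths, different regions: {diverse:.6f}")
```

Under these assumed numbers, the two paths that share a region are less reliable than the two geographically diverse paths, even though each individual path has the same failure probability. Exhaustive enumeration is exponential in the number of nodes and regions, so it is suitable only for small illustrations like this one.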
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karenos, K., Pendarakis, D., Kalogeraki, V., Yang, H., Liu, Z. (2010). Overlay Routing under Geographically Correlated Failures in Distributed Event-Based Systems. In: Meersman, R., Dillon, T., Herrero, P. (eds) On the Move to Meaningful Internet Systems, OTM 2010. OTM 2010. Lecture Notes in Computer Science, vol 6427. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16949-6_7
DOI: https://doi.org/10.1007/978-3-642-16949-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16948-9
Online ISBN: 978-3-642-16949-6
eBook Packages: Computer Science (R0)