Abstract
In this paper, we explore new failure models for multi-site systems: systems composed of a collection of sites spread across a wide-area network, where each site consists of a set of computing nodes running processes. In particular, we introduce two failure models that allow sites to fail, and we use them to derive coteries. We argue that these coteries have better availability than quorums formed by a majority of processes, which are known to have the best availability when process failures are independent and identically distributed. To motivate introducing site failures explicitly into a failure model, we present availability data from a production multi-site system showing that sites are frequently unavailable. We then discuss how our abstract models can be implemented, showing possible ways of obtaining them in practice. Finally, we present evaluation results from running an implementation of the Paxos algorithm on PlanetLab with different quorum constructions. The results show that our constructions have substantially better availability and response time than majority coteries.
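The availability comparison in the abstract can be made concrete with a small calculation. The Python sketch below is illustrative only and rests on an assumed, hypothetical failure model that is not either of the paper's two models: each of `n_sites` sites is independently unavailable with probability `ps`, taking all of its processes down with it; otherwise each of its `k` processes is independently up with probability `1 - pp`. It compares a majority coterie over all processes with a two-level construction (a majority of sites, each contributing a majority of its own processes), which is a generic site-aware coterie and not the authors' construction. All function and parameter names are hypothetical.

```python
from math import comb


def binom_pmf(n: int, j: int, p: float) -> float:
    """Probability that exactly j of n independent trials succeed."""
    return comb(n, j) * p**j * (1 - p) ** (n - j)


def site_up_count_dist(k: int, ps: float, pp: float) -> list[float]:
    """Distribution of the number of live processes in one site.

    Hypothetical model: the whole site is down with probability ps;
    otherwise each of its k processes is up independently with prob. 1 - pp.
    """
    dist = [(1 - ps) * binom_pmf(k, j, 1 - pp) for j in range(k + 1)]
    dist[0] += ps  # a failed site contributes no live processes
    return dist


def majority_availability(n_sites: int, k: int, ps: float, pp: float) -> float:
    """P(more than half of all n_sites * k processes are up)."""
    site = site_up_count_dist(k, ps, pp)
    total = [1.0]  # distribution of live processes over the sites seen so far
    for _ in range(n_sites):
        nxt = [0.0] * (len(total) + k)
        for a, pa in enumerate(total):  # convolve with one more site
            for b, pb in enumerate(site):
                nxt[a + b] += pa * pb
        total = nxt
    n = n_sites * k
    return sum(p for up, p in enumerate(total) if 2 * up > n)


def two_level_availability(n_sites: int, k: int, ps: float, pp: float) -> float:
    """P(a majority of sites each have a majority of their processes up)."""
    site = site_up_count_dist(k, ps, pp)
    q = sum(p for up, p in enumerate(site) if 2 * up > k)  # one site usable
    return sum(binom_pmf(n_sites, j, q)
               for j in range(n_sites + 1) if 2 * j > n_sites)


if __name__ == "__main__":
    for ps in (0.0, 0.05, 0.2):
        print(ps,
              majority_availability(3, 3, ps, 0.05),
              two_level_availability(3, 3, ps, 0.05))
```

With `ps = 0` process failures are independent and identically distributed, and the majority coterie comes out ahead, consistent with the known result the abstract cites. Raising `ps` correlates failures within a site, which is the situation in which a site-aware construction can overtake a plain majority.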
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Junqueira, F., Marzullo, K. (2005). Coterie Availability in Sites. In: Fraigniaud, P. (ed.) Distributed Computing. DISC 2005. Lecture Notes in Computer Science, vol 3724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11561927_3
DOI: https://doi.org/10.1007/11561927_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29163-3
Online ISBN: 978-3-540-32075-3
eBook Packages: Computer Science, Computer Science (R0)