Coterie Availability in Sites

  • Conference paper
Distributed Computing (DISC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3724)

Abstract

In this paper, we explore new failure models for multi-site systems: collections of sites spread across a wide-area network, each site formed by a set of computing nodes running processes. In particular, we introduce two failure models that allow sites to fail, and we use them to derive coteries. We argue that these coteries have better availability than quorums formed by a majority of processes, which are known to have the best availability when process failures are independent and identically distributed. To motivate introducing site failures explicitly into a failure model, we present availability data from a production multi-site system, showing that sites are frequently unavailable. We then discuss the implementability of our abstract models and show how they can be obtained in practice. Finally, we present evaluation results from running an implementation of the Paxos algorithm on PlanetLab using different quorum constructions. The results show that our constructions have substantially better availability and response time than majority coteries.
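To make the availability comparison concrete, the following is a minimal sketch of how the availability of a majority coterie can be computed, first under independent and identically distributed process failures and then under a simple model in which whole sites can also fail. The site model, the function names, and the parameters `p_site` and `p_proc` are illustrative assumptions; they are not the failure models or coterie constructions defined in the paper.

```python
from itertools import product
from math import comb


def majority_availability_iid(n, p):
    """Probability that a strict majority of n processes is up when each
    process fails independently with probability p."""
    need = n // 2 + 1
    return sum(comb(n, k) * (1 - p) ** k * p ** (n - k)
               for k in range(need, n + 1))


def majority_availability_with_sites(sites, p_site, p_proc):
    """Availability of a majority quorum under a hypothetical site model.

    sites  -- list with the number of processes hosted at each site
    p_site -- probability that a site fails, taking down all its processes
    p_proc -- probability that a process at a working site fails on its own
    """
    n = sum(sites)
    need = n // 2 + 1
    total = 0.0
    # Enumerate every combination of up/down sites.
    for up in product([False, True], repeat=len(sites)):
        prob_sites = 1.0
        for is_up in up:
            prob_sites *= (1 - p_site) if is_up else p_site
        # Processes that can still be up sit at the working sites.
        m = sum(size for size, is_up in zip(sites, up) if is_up)
        # Probability that at least `need` of those m processes are up
        # (the range is empty, so the sum is 0, when m < need).
        prob_quorum = sum(comb(m, k) * (1 - p_proc) ** k * p_proc ** (m - k)
                          for k in range(need, m + 1))
        total += prob_sites * prob_quorum
    return total


if __name__ == "__main__":
    # Three processes, one per site, versus two processes sharing a site.
    print(majority_availability_with_sites([1, 1, 1], 0.1, 0.0))
    print(majority_availability_with_sites([2, 1], 0.1, 0.0))
```

In this toy setting, placing two of three processes at the same site makes the majority quorum depend entirely on that site, which is the kind of effect that motivates modelling site failures explicitly.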

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Junqueira, F., Marzullo, K. (2005). Coterie Availability in Sites. In: Fraigniaud, P. (ed.) Distributed Computing. DISC 2005. Lecture Notes in Computer Science, vol 3724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11561927_3

  • DOI: https://doi.org/10.1007/11561927_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29163-3

  • Online ISBN: 978-3-540-32075-3

  • eBook Packages: Computer Science, Computer Science (R0)
