Abstract
Failure detectors were first proposed as an abstraction that makes it possible to solve consensus in asynchronous systems. A failure detector is a distributed oracle that provides information about the state of processes of a distributed system. This work presents a failure detection service based on a gossip strategy. The service was implemented on the JXTA platform. A simulator was also implemented so the detector could be evaluated for a larger number of processes. Experimental results show that increasing the frequency in which gossip messages are sent gives better results than increasing the fanout. Results are included for fault and recovery detection time and mistake rate of the detector.
This work was partially supported by grant 304013/2009-9 from the Brazilian Research Agency (CNPq).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225–267 (1996)
Chen, W., Toueg, S., Aguilera, M.K.: On the quality of service of failure detectors. IEEE Trans. Comput. 51(1), 13–32 (2002)
Das, A., Gupta, I., Motivala, A.: Swim: scalable weakly-consistent infection-style process group membership protocol. In: Proc. International Conference on Dependable Systems and Networks DSN 2002, pp. 303–312 (June 23-26, 2002)
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)
Gupta, I., Birman, K.P., van Renesse, R.: Fighting fire with fire: using randomized gossip to combat stochastic scalability limits. Quality and Reliability Engineering International 18(3), 165–184 (2002)
Gupta, I., Chandra, T.D., Goldszmidt, G.S.: On scalable and efficient distributed failure detectors. In: PODC 2001: Proceedings of the Twentieth Annual ACM Symposium on Principles of Distributed Computing, pp. 170–179. ACM, New York (2001)
Jxta website, http://java.net/projects/jxta/ (last access in April 2011)
Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998)
MacDougall, M.H.: Simulating Computer Systems, Techniques and Tools. The MIT Press, Cambridge (1997)
Raynal, M.: A short introduction to failure detectors for asynchronous distributed systems. SIGACT News 36(1), 53–70 (2005)
Turek, J., Shasha, D.: The many faces of consensus in distributed systems. Computer 25(6), 8–17 (1992)
van Renesse, R., Minsky, Y., Hayden, M.: A gossip-style failure detection service. Tech. rep., Cornell University, Ithaca, NY, USA (1998)
Wan, Y., Luo, Y., Liu, L., Feng, D.: A dynamic failure detector for p2p storage system. In: NISS (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de Sousa, L.P., Duarte, E.P. (2011). Experimental Evaluation of a Failure Detection Service Based on a Gossip Strategy. In: Xiang, Y., Cuzzocrea, A., Hobbs, M., Zhou, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2011. Lecture Notes in Computer Science, vol 7017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24669-2_21
Download citation
DOI: https://doi.org/10.1007/978-3-642-24669-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24668-5
Online ISBN: 978-3-642-24669-2
eBook Packages: Computer ScienceComputer Science (R0)