DOI: 10.1145/1244002.1244129
Article

A new adaptive accrual failure detector for dependable distributed systems

Published: 11 March 2007

Abstract

The detection of failures in distributed environments is a crucial part of developing dependable, robust, and self-healing systems. The contribution of this paper is a new failure detection algorithm that can be described as an adaptive accrual algorithm coupled with features to increase flexibility and decrease computation costs. Furthermore, our evaluation results show very good detection quality in the case of message losses.
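The abstract does not spell out the algorithm, but the general adaptive accrual idea can be illustrated. An accrual failure detector outputs a continuous suspicion level instead of a binary trusted/suspected verdict, and an adaptive one derives that level from recently observed heartbeat timing. The Python sketch below is a minimal, hypothetical illustration and not the paper's method. It assumes a sliding window of heartbeat inter-arrival times and uses their empirical distribution as the suspicion level: the fraction of recent intervals no longer than the time elapsed since the last heartbeat. The class `AdaptiveAccrualDetector` and its parameters `window_size` and `threshold` are invented for illustration.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right, insort
from collections import deque


class AdaptiveAccrualDetector:
    """Hypothetical adaptive accrual failure detector (illustrative sketch only).

    Keeps the last `window_size` heartbeat inter-arrival times and reports a
    suspicion level in [0, 1]: the fraction of recorded intervals that are no
    longer than the time elapsed since the last heartbeat.
    """

    def __init__(self, window_size: int = 100) -> None:
        self.window_size = window_size
        self._fifo: deque[float] = deque()  # intervals in arrival order (for eviction)
        self._sorted: list[float] = []      # the same intervals, kept sorted
        self._last_arrival: float | None = None

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self._last_arrival is not None:
            interval = now - self._last_arrival
            self._fifo.append(interval)
            insort(self._sorted, interval)
            if len(self._fifo) > self.window_size:
                # Slide the window: drop the oldest interval from both views.
                oldest = self._fifo.popleft()
                del self._sorted[bisect_left(self._sorted, oldest)]
        self._last_arrival = now

    def suspicion(self, now: float) -> float:
        """Empirical probability that the next heartbeat should already have arrived."""
        if self._last_arrival is None or not self._sorted:
            return 0.0  # no timing history yet, so no basis for suspicion
        elapsed = now - self._last_arrival
        return bisect_right(self._sorted, elapsed) / len(self._sorted)

    def suspects(self, now: float, threshold: float = 0.99) -> bool:
        """Turn the accrued suspicion level into a binary decision."""
        return self.suspicion(now) >= threshold


# Usage: heartbeats at t = 0, 1, 3, 6 give intervals 1, 2 and 3.
detector = AdaptiveAccrualDetector(window_size=100)
for t in (0.0, 1.0, 3.0, 6.0):
    detector.heartbeat(t)
print(detector.suspicion(8.0))   # 2 of the 3 intervals are <= 2.0 elapsed
print(detector.suspects(10.0))   # 4.0 elapsed exceeds every recorded interval
```

A caller invokes `heartbeat(now)` whenever a heartbeat arrives and polls `suspicion(now)` or `suspects(now, threshold)` as needed. Because the output is a level rather than a verdict, different applications can apply different thresholds to the same detector. Keeping the window sorted makes each query a binary search, at the cost of a linear-time insertion per heartbeat.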

Information

Published In

SAC '07: Proceedings of the 2007 ACM symposium on Applied computing
March 2007
1688 pages
ISBN:1595934804
DOI:10.1145/1244002
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. accrual
  2. algorithm
  3. asynchronous systems
  4. dependable systems
  5. distributed systems
  6. failure detection
  7. fault-tolerance
  8. heartbeat
  9. histogram
  10. probability distribution
  11. self-healing

Conference

SAC '07

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 0
Reflects downloads up to 03 Mar 2025

Cited By

  • (2024) Formal Verification of Consistency for Systems with Redundant Controllers. Electronic Proceedings in Theoretical Computer Science 399, pp. 169-191. DOI: 10.4204/EPTCS.399.8. Online publication date: 27-Mar-2024.
  • (2023) Preliminary Exploration on Node-To-Node Fault Tolerance Coordination in Distributed System. 2023 IEEE International Conference on Computing (ICOCO), pp. 219-224. DOI: 10.1109/ICOCO59262.2023.10397752. Online publication date: 9-Oct-2023.
  • (2023) Consistency Before Availability: Network Reference Point based Failure Detection for Controller Redundancy. 2023 IEEE 28th International Conference on Emerging Technologies and Factory Automation (ETFA), pp. 1-8. DOI: 10.1109/ETFA54631.2023.10275664. Online publication date: 12-Sep-2023.
  • (2023) Stab-FD: a cooperative and adaptive failure detector for wide area networks. Journal of Parallel and Distributed Computing, article 104803. DOI: 10.1016/j.jpdc.2023.104803. Online publication date: Nov-2023.
  • (2020) Optimally Self-Healing IoT Choreographies. ACM Transactions on Internet Technology 20(3), pp. 1-20. DOI: 10.1145/3386361. Online publication date: 24-Jul-2020.
  • (2020) Heartbeat Bully: Failure Detection and Redundancy Role Selection for Network-Centric Controller. IECON 2020, The 46th Annual Conference of the IEEE Industrial Electronics Society, pp. 2126-2133. DOI: 10.1109/IECON43393.2020.9254494. Online publication date: 18-Oct-2020.
  • (2019) Dynamic IoT Choreographies. IEEE Pervasive Computing 18(1), pp. 19-27. DOI: 10.1109/MPRV.2019.2907003. Online publication date: Jan-2019.
  • (2017) Capacitated Next Controller Placement in Software Defined Networks. IEEE Transactions on Network and Service Management 14(3), pp. 514-527. DOI: 10.1109/TNSM.2017.2720699. Online publication date: Sep-2017.
  • (2017) Self-adaptive Failure Detector for Peer-to-Peer Distributed System Considering the Link Faults. Advanced Parallel Processing Technologies, pp. 64-75. DOI: 10.1007/978-3-319-67952-5_6. Online publication date: 14-Sep-2017.
  • (2016) Implementing a Flexible Failure Detector That Expresses the Confidence in the System. 2016 Seventh Latin-American Symposium on Dependable Computing (LADC), pp. 61-70. DOI: 10.1109/LADC.2016.19. Online publication date: Oct-2016.