Self-adaptive Failure Detector for Peer-to-Peer Distributed System Considering the Link Faults

  • Conference paper
Advanced Parallel Processing Technologies (APPT 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10561)

Abstract

Distributed computing is now prevalent in artificial intelligence applications because of the limited computation capacity of a single computing node. A distributed computing system generally contains a large number of computing nodes, so failures are a routine occurrence. Failure detection therefore plays an important role in recovering the system and in improving its availability and performance. Traditional failure detectors simply equate link faults with node faults, which greatly harms resource utilization, fault location, and fast repair. We present a self-adaptive Link-based Failure Detection Agreement, DLFDA, with an improved node fault detection algorithm that can accurately distinguish node faults from link faults. DLFDA dynamically adjusts its detection structure to increase the coverage of link fault detection, and uses a Gossip protocol to distribute fault diagnosis results to other system members, which greatly reduces the impact on system performance. Finally, experimental results show that our method meets the requirements of the theoretical design.
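To make the node-fault versus link-fault distinction concrete, the following is a minimal sketch of one common way to separate the two: when a direct probe fails, ask other members to probe the target on the monitor's behalf. This is an illustrative assumption in the spirit of indirect probing, not the DLFDA algorithm itself, whose details are not given in the abstract. The names `classify_fault`, `make_network`, and `can_reach` are hypothetical.

```python
from typing import Callable, Iterable, Set, FrozenSet


def make_network(dead_nodes: Set[str],
                 broken_links: Set[FrozenSet[str]]) -> Callable[[str, str], bool]:
    """Build a toy reachability oracle (hypothetical helper, for illustration only)."""
    def can_reach(src: str, dst: str) -> bool:
        # A probe fails if either endpoint is down or the link between them is broken.
        if src in dead_nodes or dst in dead_nodes:
            return False
        return frozenset((src, dst)) not in broken_links
    return can_reach


def classify_fault(monitor: str, target: str, helpers: Iterable[str],
                   can_reach: Callable[[str, str], bool]) -> str:
    """Classify the target as seen from the monitor (assumed logic, not the paper's)."""
    if can_reach(monitor, target):
        return "healthy"
    # Only helpers the monitor can still talk to are able to report back.
    reachable_helpers = [h for h in helpers if can_reach(monitor, h)]
    if any(can_reach(h, target) for h in reachable_helpers):
        # Someone else can reach the target, so the monitor-target link is suspect.
        return "link_fault"
    if reachable_helpers:
        # No reachable helper can reach the target either: suspect the node itself.
        return "node_fault"
    # Without any helper reports the two cases cannot be told apart.
    return "unknown"
```

For example, with a broken link between `A` and `B` and helpers `C` and `D` still able to reach `B`, the monitor `A` would report a link fault rather than declaring `B` dead. In a full system, such a verdict would then be spread to other members, for instance by gossip as the abstract describes.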

Acknowledgments

This work is supported by the National High Technology Research 863 Major Program of China (No. 2011AA01A207).

Author information

Corresponding author

Correspondence to Xiaohong Jiang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

He, Y., Jiang, X., Dai, C., Fan, Z. (2017). Self-adaptive Failure Detector for Peer-to-Peer Distributed System Considering the Link Faults. In: Dou, Y., Lin, H., Sun, G., Wu, J., Heras, D., Bougé, L. (eds) Advanced Parallel Processing Technologies. APPT 2017. Lecture Notes in Computer Science, vol 10561. Springer, Cham. https://doi.org/10.1007/978-3-319-67952-5_6

  • DOI: https://doi.org/10.1007/978-3-319-67952-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67951-8

  • Online ISBN: 978-3-319-67952-5

  • eBook Packages: Computer Science, Computer Science (R0)
