A security management scheme for failure detector distributed systems based on self-tuning control theory

Xiong, Naixue; Park, Jong Hyuk; Yang, Laurence T.; Koh, Byoung-Soo; Li, Yingshu

doi:10.1007/s10845-009-0315-5

Published: 18 September 2009

Journal of Intelligent Manufacturing, volume 22, pages 333–342 (2011)

Naixue Xiong¹, Jong Hyuk Park², Laurence T. Yang³, Byoung-Soo Koh⁴ & Yingshu Li¹

Abstract

Information security management has become an important research issue in distributed systems, and failure detection is a fundamental building block for fault tolerance in large distributed systems. Recently, many have come to realize that failure detection ought to be provided as a generic service, similar to IP address lookup. This has not been successful so far, in part because classical failure detectors were not designed to satisfy several application requirements simultaneously. More specifically, traditional failure detector implementations are often tuned for local networks and fail to address important problems found in wide-area distributed systems with a large number of monitored components, such as Grid systems. In this paper, we study a security management scheme for failure detectors in distributed systems. We first identify some of the most important QoS problems raised in large wide-area distributed systems. We then present a novel failure detector scheme, combined with self-tuning control theory, that can help solve or mitigate some of these problems. We also discuss the design and analysis of a scalable failure detection service for such systems that dynamically adjusts the heartbeat streams so that they satisfy the requirements of the bottleneck router. A basic z-transform stability test is used to derive the stability criterion, which ensures bounded rate allocation without steady-state oscillation. We further show how the online failure detector control algorithm can be used to design a controller, analyze the theoretical aspects of the proposed algorithm, and verify its agreement with simulations in the LAN and WAN cases. Simulation results show the efficiency of our scheme in terms of high utilization of the bottleneck link, fast response, and good stability of both the bottleneck router buffer occupancy and the controlled sending rates. In conclusion, the new security management failure detector algorithm provides better QoS than the algorithms proposed by Stelling et al. (Proceedings of 7th IEEE symposium on high performance distributed computing, pp. 268–278, 1998) and Foster et al. (Int J Supercomput Appl, 2001).
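The abstract does not give the controller equations, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a hypothetical single bottleneck queue fed by aggregate heartbeat traffic, a simple proportional-difference rate update with made-up gains `alpha` and `beta`, and the standard Jury conditions as the z-domain stability test for the resulting second-order closed loop. All names and parameter values are assumptions chosen for illustration.

```python
# Illustrative sketch only: a toy heartbeat-rate controller for one bottleneck
# queue. The model, gains, and names are assumptions, not taken from the paper.

def jury_stable(a1, a0):
    """Jury test for z^2 + a1*z + a0: True if both roots lie strictly
    inside the unit circle (discrete-time asymptotic stability)."""
    return abs(a0) < 1 and 1 + a1 + a0 > 0 and 1 - a1 + a0 > 0


def is_stable(alpha, beta):
    """Stability of the assumed closed loop.

    Queue model:  q[k+1] = q[k] + u[k] - c
    Controller:   u[k+1] = u[k] - alpha*(q[k+1] - target)
                                - beta*(q[k+1] - q[k])
    Eliminating u gives the characteristic polynomial
        z^2 + (alpha + beta - 2)*z + (1 - beta).
    """
    return jury_stable(alpha + beta - 2.0, 1.0 - beta)


def simulate(alpha, beta, capacity=50.0, target=100.0, steps=200):
    """Simulate queue occupancy and aggregate heartbeat load per interval.

    capacity: packets the router drains per interval (assumed constant)
    target:   desired buffer occupancy in packets (assumed)
    Returns a list of (queue, load) pairs, one per interval.
    """
    queue, load = 0.0, 0.0
    history = []
    for _ in range(steps):
        prev_queue = queue
        # The buffer cannot go negative; this clipping is outside the
        # linear model that the Jury test analyses.
        queue = max(0.0, queue + load - capacity)
        # Adjust the offered heartbeat load from the queue error and its trend.
        load = max(0.0, load
                   - alpha * (queue - target)
                   - beta * (queue - prev_queue))
        history.append((queue, load))
    return history


if __name__ == "__main__":
    alpha, beta = 0.2, 0.8
    print("stable gains:", is_stable(alpha, beta))
    final_queue, final_load = simulate(alpha, beta)[-1]
    print("final queue occupancy:", round(final_queue, 3))
    print("final aggregate load:", round(final_load, 3))
```

In this toy model the Jury conditions reduce to `alpha > 0`, `0 < beta < 2`, and `alpha + 2*beta < 4`. With gains inside that region the queue settles at the target occupancy and the offered load settles at the drain rate, which mirrors the "bounded rate allocation without steady-state oscillation" property the abstract describes, under the stated assumptions.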

References

Amir, Y., Dolev, D., Kramer, S., & Malkhi, D. (1992). Transis: A communication sub-system for high availability. In Proceedings of the 22nd annual international symposium on fault-tolerant computing (pp. 76–84).
Babaoglu, O., Davoli, R., Giachini, L.-A., & Baker, M. G. (1994). Relacs: A communications infrastructure for constructing reliable applications in large-scale distributed systems. BROADCAST project deliverable report. Department of Computing Science, University of Newcastle upon Tyne, UK.
Barmish, B. R. (1994). New tools for robustness of linear systems. New York: Macmillan.
Birman, K. P., & Van Renesse, R. (1993). Reliable distributed computing with the Isis toolkit. Los Alamitos: IEEE CS Press.
Braden, R. (Ed.). (1989). Requirements for internet hosts - communication layers. RFC 1122.
Felber, P., Defago, X., Guerraoui, R., & Oser, P. (1999). Failure detectors as first class objects. In Proceedings of the 1st international symposium on distributed-objects and applications (DOA’99) (pp. 132–141). Edinburgh, Scotland.
Fetzer, C., Raynal, M., & Tronel, F. (2001). An adaptive failure detection protocol. In Proceedings of the 8th IEEE pacific rim international symposium on dependable computing (PRDC-8).
Foster, I., Kesselman, C., & Tuecke, S. (2001). The anatomy of the grid: Enabling scalable virtual organizations. International Journal of Supercomputer Applications, 15, 1–24.
Gupta, I., Chandra, T. D., & Goldszmidt, G. S. (2001). On scalable and efficient distributed failure detectors. In Proceedings of the 20th annual ACM symposium on principles of distributed computing (pp. 170–179). ACM Press.
Hayashibara, N., Cherif, A., & Katayama, T. (2002). Failure detectors for large-scale distributed systems. In Proceedings of the 21st IEEE international symposium on reliable distributed systems (SRDS’02) (pp. 404–409). Osaka, Japan.
Hayden, M. G. (1998). The ensemble system. PhD thesis, Department of Computer Science, Cornell University.
He, Y., Xiong, N., & Yang, Y. (2004). Data transmission rate control in computer networks using neural predictive networks. In The 2004 international symposium on parallel processing and applications (ISPA 2004) (pp. 875–887). LNCS 3358.
Moser, L. E., Melliar-Smith, P. M., Agarwal, D. A., Budhia, R. K., & Lingley-Papadopoulos, C. A. (1996). Totem: A fault-tolerant multicast group communication system. Communications of the ACM, 39(4), 54–63.
Pfister, G. F. (1998). In search of clusters (2nd ed.). Prentice Hall.
Sergent, N., Defago, X., & Schiper, A. (2001). Impact of a failure detection mechanism on the performance of consensus. In Proceedings of the 8th IEEE pacific rim symposium on dependable computing (PRDC-8) (pp. 137–145).
Sotoma, I., & Madeira, E. R. M. (2001). ADAPTATION-algorithms to adaptive fault monitoring and their implementation on CORBA. In Proceedings of 3rd international symposium on distributed-objects and applications (DOA’01) (pp. 219–228). Rome, Italy.
Stelling, P., Foster, I., Kesselman, C., Lee, C., & von Laszewski, G. (1998). A fault detection service for wide area distributed computations. In Proceedings of 7th IEEE symposium on high performance distributed computing (pp. 268–278).
Tan, L., Pugh, A. C., & Yin, M. (2003). Rate-based congestion control in ATM switching networks using a recursive digital filter. Control Engineering Practice (Special issue on control methods for telecommunication networks), 11(10), 1171–1181.
Tan, L., Xiong, N., & Yang, Y. (2004). A PGM-based single-rate multicast congestion control scheme. Journal of Software, 15(10), 1538–1546.
Tan, L., Yang, Y., Lin, C., Xiong, N., & Zukerman, M. (2005). Scalable parameter tuning for AVQ. IEEE Communications Letters, 9(1).
van Renesse, R., Birman, K. P., & Maffeis, S. (1996). Horus: A flexible group communication system. Communications of the ACM, 39(4), 76–83.
van Renesse, R., Minsky, Y., & Hayden, M. (1998). A gossip-style failure detection service. In N. Davies, K. Raymond, & J. Seitz (Eds.), Middleware’98 (pp. 55–70). The Lake District, UK.
Xiong, N., He, Y., Yang, Y., Cao, J., & Lin, C. (2004). An efficient flow control algorithm for multi-rate multicast networks. In 2004 IEEE international workshop on IP operations and management (pp. 74–81), China.
Xiong, N., Tan, L., & Yang, Y. (2004). An approach for regulating the transmission rate in multicast congestion control. Journal of China Institute of Communications, 11, 142–150.
Xiong, N., Yang, Y., & Defago, X. (2007). Comparative analysis of QoS and memory usage of adaptive failure detectors. In The 13th IEEE pacific rim international symposium on dependable computing (PRDC’07) (pp. 27–34). Melbourne, Australia, 17–19 December.
Zhang, X., Shin, K. G., Saha, D., & Kandlur, D. D. (2002). Scalable flow control for multicast ABR services in ATM networks. IEEE/ACM Transactions on Networking, 10(1), 67–85.

Author information

Authors and Affiliations

Department of Computer Science, Georgia State University, Atlanta, GA, USA
Naixue Xiong & Yingshu Li
Department of Computer Science and Engineering, Seoul National University of Technology, Seoul, Korea
Jong Hyuk Park
Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada
Laurence T. Yang
DigiCAPS Co., Ltd., 938-26 Bangbae-Dong, Seocho-Gu, Korea
Byoung-Soo Koh

Corresponding author

Correspondence to Naixue Xiong.

Additional information

This research has been supported by the US National Science Foundation CAREER Award under Grant No. CCF-0545667.

About this article

Cite this article

Xiong, N., Park, J.H., Yang, L.T. et al. A security management scheme for failure detector distributed systems based on self-tuning control theory. J Intell Manuf 22, 333–342 (2011). https://doi.org/10.1007/s10845-009-0315-5

Received: 31 May 2008
Accepted: 31 August 2009
Published: 18 September 2009
Issue Date: April 2011
DOI: https://doi.org/10.1007/s10845-009-0315-5
