ABSTRACT
When designing a system with strong reliability, security, or survivability requirements, one moves in a trade-off space that demands a delicate balance between causes and effects, with implications for objective functions such as cost, performance, availability, analyzability, predictability, and feasibility. The key issues are: 1) given an existing system or application, what is the impact of changing the fault assumptions; 2) given an existing system or application, what is the impact of adding or removing security features; and 3) given performance, availability, security, or survivability requirements, how can one determine feasibility in light of limitations induced by the infrastructure or the application?
This research promotes design for survivability and analyzability, enabling effective assessment of the trade-off space in terms of dynamically changing fault models and the analyzability of a system. It points to new research directions and presents solutions that support operational decisions and the assessment of the impact of design decisions.
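Issue 1) above can be made concrete with a small redundancy calculation. The sketch below is not the paper's method; it assumes the hybrid fault model bound associated with Thambidurai and Park's work on interactive consistency with multiple failure modes, in its commonly quoted form N > 3a + 2s + b, where a, s, and b count asymmetric (Byzantine), symmetric, and benign faults, and enough rounds of message exchange are assumed. With only asymmetric faults it reduces to the Byzantine Generals bound N > 3m of Lamport et al. The function names `tolerates` and `min_nodes` are hypothetical.

```python
# Hypothetical helpers illustrating how fault assumptions drive redundancy.
# ASSUMPTION: hybrid fault model bound N > 3a + 2s + b
#   a = asymmetric (Byzantine) faults, s = symmetric faults, b = benign faults.
# With s = b = 0 this is the Byzantine Generals bound N > 3m.


def tolerates(n: int, a: int = 0, s: int = 0, b: int = 0) -> bool:
    """Return True if n nodes satisfy the assumed bound for this fault mix."""
    if min(n, a, s, b) < 0:
        raise ValueError("counts must be non-negative")
    return n > 3 * a + 2 * s + b


def min_nodes(a: int = 0, s: int = 0, b: int = 0) -> int:
    """Smallest node count satisfying the assumed bound for this fault mix."""
    if min(a, s, b) < 0:
        raise ValueError("counts must be non-negative")
    return 3 * a + 2 * s + b + 1


if __name__ == "__main__":
    # Same system (7 nodes) evaluated under different fault assumptions.
    n = 7
    scenarios = [
        ("2 asymmetric", {"a": 2}),
        ("1 asym + 1 sym + 1 benign", {"a": 1, "s": 1, "b": 1}),
        ("1 asym + 2 sym", {"a": 1, "s": 2}),
        ("6 benign", {"b": 6}),
    ]
    for label, mix in scenarios:
        print(f"{label:28s} tolerated={tolerates(n, **mix)} "
              f"min_nodes={min_nodes(**mix)}")
```

Under this assumed bound, a 7-node system covers the first, second, and fourth fault mixes, but "1 asym + 2 sym" needs 8 nodes. Reclassifying a single fault from asymmetric to symmetric lowers the requirement by one node, which is the kind of fault-assumption adjustment the abstract's first question refers to.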
Supplemental Material
Slide presentation for "Design for survivability: a tradeoff space"