Abstract
As increasingly complex computer systems have come to play a controlling role in all aspects of modern life, the availability and associated downtime of technical systems have acquired critical importance. Losses due to system downtime have risen manifold and become wide-ranging. Even though the component-level availability of hardware and software has increased considerably, system-wide availability still needs improvement, because the heterogeneity of components and the complexity of their interconnections have also increased considerably. As systems become more interconnected and diverse, architects are less able to anticipate and design for every interaction among components, leaving such issues to be dealt with at runtime. Therefore, in this paper, we propose an approach for autonomic management of system availability, which provides real-time evaluation, monitoring and management of the availability of systems in critical applications. A hybrid approach is used: analytic models provide the behavioral abstraction of components and subsystems, their interconnections and dependencies, while statistical inference is applied to data from real-time monitoring of those components and subsystems to parameterize the system availability model. The model is solved online (that is, in real time), so that at any instant both point and interval estimates of the overall system availability are obtained by propagating the point and interval estimates of each input parameter through the system model. The online monitoring and estimation of system availability can then lead to adaptive online control of system availability.
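The propagation of point and interval estimates through a system model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a series system of independent components, the steady-state formula A = MTTF / (MTTF + MTTR), and a percentile bootstrap as the interval method, none of which the abstract specifies. The function names and the monitoring samples are hypothetical.

```python
import random
import statistics


def point_availability(ttf, ttr):
    """Steady-state availability A = MTTF / (MTTF + MTTR) from observed samples."""
    mttf = statistics.mean(ttf)
    mttr = statistics.mean(ttr)
    return mttf / (mttf + mttr)


def series_availability(component_avails):
    """Assumed system model: independent components in series (product rule)."""
    result = 1.0
    for a in component_avails:
        result *= a
    return result


def bootstrap_interval(components, n_resamples=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for the system availability.

    Each resample redraws every component's failure and repair times with
    replacement, re-estimates the component availabilities, and pushes them
    through the system model.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        avails = []
        for ttf, ttr in components:
            ttf_star = rng.choices(ttf, k=len(ttf))
            ttr_star = rng.choices(ttr, k=len(ttr))
            avails.append(point_availability(ttf_star, ttr_star))
        estimates.append(series_availability(avails))
    estimates.sort()
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical monitoring data: (times to failure, times to repair) in hours.
components = [
    ([900.0, 1100.0, 1000.0, 950.0, 1050.0], [2.0, 3.0, 2.5, 1.5, 3.5]),
    ([2000.0, 1800.0, 2200.0], [10.0, 12.0, 8.0]),
]

point = series_availability(
    [point_availability(ttf, ttr) for ttf, ttr in components]
)
low, high = bootstrap_interval(components)
print(f"point estimate: {point:.6f}")
print(f"90% interval:   [{low:.6f}, {high:.6f}]")
```

In an online setting, the same computation would be repeated as new failure and repair observations arrive, so that the estimates track the monitored system. A richer model, such as a Markov chain or a hierarchical composition, would replace `series_availability` without changing the propagation step.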
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mishra, K., Trivedi, K.S. (2006). Model Based Approach for Autonomic Availability Management. In: Penkler, D., Reitenspiess, M., Tam, F. (eds) Service Availability. ISAS 2006. Lecture Notes in Computer Science, vol 4328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11955498_1
DOI: https://doi.org/10.1007/11955498_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68724-5
Online ISBN: 978-3-540-68725-2
eBook Packages: Computer Science (R0)