Abstract
The dependability of a system can be experimentally evaluated at different phases of its life cycle. In the design phase, computer-aided design (CAD) environments are used to evaluate the design via simulation, including simulated fault injection. Such fault injection tests the effectiveness of fault-tolerant mechanisms and evaluates system dependability, providing timely feedback to system designers. Simulation, however, requires accurate input parameters and validation of output results. Although the parameter estimates can be obtained from past measurements, this is often complicated by design and technology changes. In the prototype phase, the system runs under controlled workload conditions. In this stage, controlled physical fault injection is used to evaluate the system behavior under faults, including the detection coverage and the recovery capability of various fault tolerance mechanisms. Fault injection on the real system can provide information about the failure process, from fault occurrence to system recovery, including error latency, propagation, detection, and recovery (which may involve reconfiguration). But this type of fault injection can only study artificial faults; it cannot provide certain important dependability measures, such as mean time between failures (MTBF) and availability. In the operational phase, a direct measurement-based approach can be used to measure systems in the field under real workloads. The collected data contain a large amount of information about naturally occurring errors/failures.
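As a rough illustration of the operational-phase measures named above, the sketch below estimates MTBF, mean time to repair (MTTR), and availability from a list of outage intervals. The log format, the names `summarize_outages` and `DependabilitySummary`, and the sample values are hypothetical and do not come from the chapter. MTBF is taken here as the observation period divided by the number of failures, which is one common convention among several.

```python
from dataclasses import dataclass


@dataclass
class DependabilitySummary:
    mtbf_hours: float    # mean time between failures (failure to next failure)
    mttr_hours: float    # mean time to repair
    availability: float  # fraction of the observation period spent up


def summarize_outages(outages, observation_hours):
    """Estimate basic dependability measures from field data.

    outages: list of (failure_hour, recovery_hour) pairs, assumed to be
             non-overlapping and to lie inside [0, observation_hours].
    """
    if not outages:
        raise ValueError("at least one outage is needed to estimate MTBF")

    downtime = sum(end - start for start, end in outages)
    uptime = observation_hours - downtime
    n = len(outages)

    mttf = uptime / n    # mean up time per failure
    mttr = downtime / n  # mean down time per failure
    return DependabilitySummary(
        mtbf_hours=mttf + mttr,  # equals observation_hours / n
        mttr_hours=mttr,
        availability=uptime / observation_hours,
    )


# Hypothetical sample: three outages observed over 1000 hours.
outages = [(100.0, 102.0), (400.0, 401.0), (900.0, 905.0)]
summary = summarize_outages(outages, observation_hours=1000.0)
```

Estimates of this kind depend on naturally occurring failures over a long observation window, which is why the abstract assigns them to field measurement and not to controlled fault injection.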
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Iyer, R.K., Kalbarczyk, Z., Kalyanakrishnan, M. (2000). Measurement-Based Analysis of Networked System Availability. In: Haring, G., Lindemann, C., Reiser, M. (eds) Performance Evaluation: Origins and Directions. Lecture Notes in Computer Science, vol 1769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46506-5_7
DOI: https://doi.org/10.1007/3-540-46506-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67193-0
Online ISBN: 978-3-540-46506-5
eBook Packages: Springer Book Archive