
Measurement-Based Analysis of Networked System Availability

Performance Evaluation: Origins and Directions

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1769)

Abstract

The dependability of a system can be experimentally evaluated at different phases of its life cycle. In the design phase, computer-aided design (CAD) environments are used to evaluate the design via simulation, including simulated fault injection. Such fault injection tests the effectiveness of fault-tolerant mechanisms and evaluates system dependability, providing timely feedback to system designers. Simulation, however, requires accurate input parameters and validation of output results. Although parameter estimates can be obtained from past measurements, design and technology changes often make this difficult. In the prototype phase, the system runs under controlled workload conditions. At this stage, controlled physical fault injection is used to evaluate the system behavior under faults, including the detection coverage and the recovery capability of various fault tolerance mechanisms. Fault injection on the real system can provide information about the failure process, from fault occurrence to system recovery, including error latency, propagation, detection, and recovery (which may involve reconfiguration). However, this type of fault injection can only study artificial faults; it cannot provide certain important dependability measures, such as mean time between failures (MTBF) and availability. In the operational phase, a direct measurement-based approach can be used to measure systems in the field under real workloads. The collected data contain a large amount of information about naturally occurring errors and failures.
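Unlike fault injection, field measurement can yield measures such as MTBF and availability. The following is a minimal sketch, not the chapter's method, showing how those point estimates might be computed from a log of failure and recovery timestamps. The `Outage` record, the `dependability_summary` function, and the log format are hypothetical. It also assumes one common convention, MTBF = MTTF + MTTR; some sources use "MTBF" to mean the time to failure alone.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Outage:
    """One failure episode from a hypothetical field log (times in hours)."""
    failed_at: float     # when the failure was observed
    recovered_at: float  # when the system was back in service


def dependability_summary(outages: List[Outage], start: float, end: float) -> Dict[str, float]:
    """Point estimates of MTTF, MTTR, MTBF and availability over [start, end].

    Assumes the system is up at `start` and that outages do not overlap.
    """
    if not outages:
        raise ValueError("no failures observed; MTTF/MTBF cannot be estimated")
    outages = sorted(outages, key=lambda o: o.failed_at)
    prev_end = start
    for o in outages:
        # Each outage must begin after the previous recovery and end inside the window.
        if not (prev_end <= o.failed_at <= o.recovered_at <= end):
            raise ValueError(f"outage {o} overlaps another or lies outside the window")
        prev_end = o.recovered_at

    n = len(outages)
    downtime = sum(o.recovered_at - o.failed_at for o in outages)
    uptime = (end - start) - downtime
    mttf = uptime / n    # total operating time divided by number of failures
    mttr = downtime / n  # mean time to recovery
    return {
        "mttf": mttf,
        "mttr": mttr,
        "mtbf": mttf + mttr,                     # one full failure/repair cycle (assumed convention)
        "availability": uptime / (end - start),  # equals MTTF / (MTTF + MTTR)
    }


if __name__ == "__main__":
    log = [Outage(100.0, 102.0), Outage(500.0, 506.0)]
    print(dependability_summary(log, start=0.0, end=1000.0))
```

For a 1000-hour window with two outages lasting 2 h and 6 h, the sketch gives MTTF = 496 h, MTTR = 4 h, MTBF = 500 h, and availability 0.992. Real field logs usually need preprocessing first, for example coalescing repeated entries that stem from a single failure, before these estimates are meaningful.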




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Iyer, R.K., Kalbarczyk, Z., Kalyanakrishnan, M. (2000). Measurement-Based Analysis of Networked System Availability. In: Haring, G., Lindemann, C., Reiser, M. (eds) Performance Evaluation: Origins and Directions. Lecture Notes in Computer Science, vol 1769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46506-5_7


  • DOI: https://doi.org/10.1007/3-540-46506-5_7


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67193-0

  • Online ISBN: 978-3-540-46506-5

  • eBook Packages: Springer Book Archive
