Abstract
Reliability and performability modeling techniques and tools have been the subject of much research activity over the last ten years. We present a survey of techniques and tools that can be used for reliability and performability analysis. A unified mathematical framework for reliability and performability models is presented in terms of Markov reward models. Among modeling techniques, we describe reward-based hybrid hierarchical modeling, combinatorial multistate models, queues with server breakdowns, the completion-time approach, and iterative modeling. The software packages METFAC, NUMAS, SHARPE, SPNP, and UltraSAN are considered in detail, while DyQNtool, PENELOPE, PENPET, and SAVE are briefly discussed.
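To make the Markov reward model idea concrete, the following is a minimal sketch that is not taken from the paper. It assumes a hypothetical two-processor system with a single repair facility, a per-processor failure rate `lam`, and a repair rate `mu`; the state ordering, the rate values, and the helper names `steady_state` and `expected_reward` are all illustrative assumptions. Attaching a reward rate to each state of the underlying Markov chain lets the same model yield a reliability-style measure (availability) or a performability-style measure (expected processing capacity), depending only on the reward vector chosen.

```python
from fractions import Fraction


def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination.

    Q is the generator matrix of an irreducible continuous-time Markov chain.
    """
    n = len(Q)
    # Work with the transposed system Q^T * pi^T = 0.
    A = [[Fraction(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    # One balance equation is redundant; replace it by the normalisation.
    A[n - 1] = [Fraction(1)] * n
    b[n - 1] = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def expected_reward(pi, rewards):
    """Expected steady-state reward rate: sum over states of pi_i * r_i."""
    return sum(p * r for p, r in zip(pi, rewards))


# Illustrative (assumed) rates: failures per hour and repairs per hour.
lam = Fraction(1, 1000)
mu = Fraction(1, 10)

# State order (assumed): index 0 = two processors up, 1 = one up, 2 = none up.
Q = [
    [-2 * lam, 2 * lam, 0],
    [mu, -(lam + mu), lam],
    [0, mu, -mu],
]

pi = steady_state(Q)

# Reward 1 in every operational state gives steady-state availability.
availability = expected_reward(pi, [1, 1, 0])

# Reward equal to the number of working processors gives expected capacity.
capacity = expected_reward(pi, [2, 1, 0])

print("steady-state probabilities:", pi)
print("availability:", availability)
print("expected capacity:", capacity)
```

With these assumed rates the chain is a simple birth-death process, so the result can be checked by hand: the availability is 5100/5101 and the expected capacity is 10100/5101 processors. Exact rational arithmetic is used only to keep the sketch free of rounding questions; a practical tool would use a numerical solver.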
This work was supported in part by the National Science Foundation under Grant CCR-9108114 and by the Naval Surface Warfare Center under Grant N60921-92-C-0161.
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Trivedi, K.S., Malhotra, M. (1993). Reliability and Performability Techniques and Tools: A Survey. In: Walke, B., Spaniol, O. (eds) Messung, Modellierung und Bewertung von Rechen- und Kommunikationssystemen. Informatik aktuell. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-78495-8_3
DOI: https://doi.org/10.1007/978-3-642-78495-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57201-5
Online ISBN: 978-3-642-78495-8
eBook Packages: Springer Book Archive