Learning to improve reliability during system development

https://doi.org/10.1016/S0377-2217(99)00149-6

Abstract

Research, conducted in collaboration with a leading aerospace manufacturer, aimed to facilitate learning in order to improve the reliability of engineering systems during their development phase. In particular, the processes and mathematical models used during reliability growth testing were investigated to assess how they might be better used to support this improvement. This required both soft and hard OR approaches to be adopted. For example, information flows were mapped and reengineered in order to provide a basis for more effective data collection and feedback to decision-makers. A new mathematical model that combines failure data with engineering judgement was developed to estimate reliability growth. The paper presents a case study describing the problem, the modelling conducted, the recommendations made and the actions implemented. The ways in which the researchers and the manufacturer learnt to improve both the modelling and the reliability growth testing process are reflected upon.

Introduction

Reliability growth testing is a planned test, analyse and fix process, in which engineering systems are tested under simulated and accelerated environments, not outside design limits, for the purpose of disclosing design deficiencies and providing an indication of in-service reliability (Leitch, 1995). It is a standard element of the reliability programme (BS5760) for the safety-critical electronic systems designed and produced by a major aerospace manufacturer. Prior to this project the popular Duane model (Duane, 1964) was used to estimate the growth in reliability of the systems. However, it was perceived to be too simplistic to capture the complexities of the reliability growth testing process. In addition, it provided poor estimates of both the level and the rate of growth in reliability of the systems. For example, the manufacturer's records revealed that reliability performance of the systems in the field was always significantly better than that estimated and, in some instances, the estimates of growth derived during testing were counterintuitive to the engineers' understanding of the failure mechanisms that redesigns of the systems were intended to overcome. These concerns are not restricted to this manufacturer. Indeed, the shortcomings of Duane modelling are well documented (O'Connor, 1991). Consequently, there is a need to improve reliability growth modelling so that more accurate estimates are generated from models of the reliability growth testing process actually experienced in practice. Further, there is a need to return to the original premise of reliability growth modelling, which is to support learning during system development. This is particularly important if manufacturers are to develop engineering systems both effectively and efficiently.
In the aerospace sector, with which we are concerned, accurate estimates of reliability growth and a sound understanding of the failure behaviour demonstrated during testing will always be very important from a safety perspective. However, the competitive nature of the industry implies that learning how to achieve high reliability faster and more cheaply is increasingly important.
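For readers unfamiliar with the Duane model referred to above, the following is a minimal sketch of how such a fit is commonly carried out. It assumes the usual statement of Duane's postulate, namely that cumulative MTBF plotted against cumulative test time is linear on a log-log scale, and estimates the slope by ordinary least squares. The function names and the failure times in the usage note are hypothetical illustrations and are not taken from the manufacturer's data or procedures.

```python
import math


def fit_duane(failure_times):
    """Least-squares fit of the Duane model on the log-log scale.

    Assumes cumulative MTBF  t / N(t) = b * t**alpha, so that
    log(t / N(t)) is linear in log(t) with slope alpha.
    Returns (alpha, b).
    """
    times = sorted(failure_times)
    if len(times) < 2 or times[0] <= 0:
        raise ValueError("need at least two positive failure times")

    # x = log cumulative test time, y = log cumulative MTBF at each failure
    xs = [math.log(t) for t in times]
    ys = [math.log(t / n) for n, t in enumerate(times, start=1)]

    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    alpha = sxy / sxx                      # growth slope
    b = math.exp(y_bar - alpha * x_bar)    # cumulative MTBF at t = 1
    return alpha, b


def instantaneous_mtbf(t, alpha, b):
    """Instantaneous MTBF implied by the fitted Duane line (alpha < 1)."""
    if not alpha < 1:
        raise ValueError("alpha must be less than 1")
    return b * t ** alpha / (1.0 - alpha)
```

As a usage illustration, `fit_duane([35, 110, 260, 490, 900])` (hypothetical cumulative test hours at each failure) returns the estimated growth slope and scale, which can then be passed to `instantaneous_mtbf` to project the MTBF at a given test time. The sketch makes plain how little of the test process the model represents: only the ordered failure times enter the fit.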

Other authors have made attempts to address these issues. However, the existing literature either tends to consider high-level management issues surrounding reliability improvement (O'Connor, 1994) or to introduce technical improvements relating to aspects of theoretical reliability growth models (Ebrahimi, 1996). Our aim is to bridge the gap by introducing a sound theoretical model that embraces the practical elements of reliability growth testing and addresses the issues surrounding its implementation from data collection to communication of the results back to the decision-makers. Thus our focus is upon providing a modelling framework that will encourage everyone involved to think about the implications of the process and the modelling of reliability growth tests. Consequently, we believe that this will put us in a stronger position to learn how to improve the system design more effectively and efficiently.

The paper begins by describing the nature of reliability growth testing for the family of electronic systems designed by our client manufacturer. The sequence of decisions to be made is identified and the idiosyncrasies of the data usually available to support them are discussed. Based on this review we suggest new ways of managing hard failure data from the test so that its information content is fully exploited. By the very nature of their functionality, the type of systems considered tends to be intrinsically reliable so that there is in fact limited hard failure data. However, since their design follows an evolutionary process there is a wealth of expertise and understanding about the failure mechanisms of the systems, or at least aspects of them. Therefore, we describe procedures used to elicit soft data from engineers who are experts about some aspect of the system. This should instigate further thinking about reliability issues and provide more information that can be combined with the hard test data through modelling to provide measures of reliability performance that are in turn fed back to the experts as well as the decision-makers. Both the elicitation process and the mathematical model are given a brief overview, but here we concentrate upon illustrating their application through a case analysis. To conclude, we reflect upon both the theoretical and practical issues addressed during the project.

The systems and their reliability growth testing process

The electronic aerospace systems of interest are safety-critical and purpose-built to stringent customer requirements using robust and known technology. Modular design is favoured and there is a relatively long design and development phase during which the intrinsic reliability is built in. This is followed by a low-volume manufacturing operation. The total time-to-market is about three years with the final product costing between £10K and £100K per unit.

Reliability growth tests are conducted

Construction of a customised database for storing hard failure data from the test

Much has been written about the challenges of collecting data and constructing appropriate databases to support reliability modelling in general. See, for example, Ansell and Phillips (1994), Cannon and Bendell (1991) and Cooke (1996). All report that most practical databases fail to provide the coverage and the structure of data required for modelling. This was also the case for the historical data available from reliability growth testing of the electronic systems under consideration. To

Elicitation of soft data from engineering experts during the test

There is a vast, largely untapped source of data concerning the systems under consideration in the form of engineering judgement, and exploiting this immediately provides more information about potential weaknesses in the system design. In addition to the system designers, engineering experts in components, manufacturing, test and other specialities can contribute their knowledge. We recommend that the entire process of eliciting the engineering judgement is managed by the reliability analyst in

Modelling the reliability growth process using hard and soft data

A mathematical model has been developed that captures the process underpinning reliability growth testing and uses the failure data collected during test as well as the judgements elicited from the engineering experts. The model essentially describes the accumulated test time to failure for different classes of failure. The first corresponds to those failures that were due to faults that were inherent in the design at the start of the test, including those raised as engineering concerns. The
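The general idea of combining elicited judgement with test failures can be illustrated with a deliberately simple sketch. It is not the model developed in this paper, whose details are not reproduced here. Instead it substitutes the standard gamma-Poisson conjugate update for a constant failure rate: expert judgement is encoded as a gamma prior, and the observed number of failures in a given number of test hours updates it. All names and numbers are hypothetical assumptions chosen for illustration.

```python
def update_failure_rate(prior_shape, prior_rate, n_failures, test_hours):
    """Conjugate gamma-Poisson update for a constant failure rate.

    Prior: failure rate per hour ~ Gamma(prior_shape, prior_rate),
    which can be read as 'prior_shape failures in prior_rate hours'
    of equivalent prior experience (hypothetical elicited values).
    Data: n_failures observed in test_hours of testing.
    Returns the posterior (shape, rate).
    """
    if prior_shape <= 0 or prior_rate <= 0:
        raise ValueError("prior parameters must be positive")
    if n_failures < 0 or test_hours < 0:
        raise ValueError("data must be non-negative")
    return prior_shape + n_failures, prior_rate + test_hours


def mean_failure_rate(shape, rate):
    """Mean of a Gamma(shape, rate) distribution, in failures per hour."""
    return shape / rate


# Hypothetical example: experts judge roughly 2 failures per 1000 h;
# the test then records 1 failure in 1000 h.
posterior = update_failure_rate(2.0, 1000.0, 1, 1000.0)
estimate = mean_failure_rate(*posterior)
```

The design point the sketch shares with the approach described here is that the estimate is informative even when hard failure data are sparse, because the prior carries the engineers' knowledge. Unlike a reliability growth model, however, it assumes the failure rate does not change as faults are removed.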

Using the model to support evaluation and make decisions about the system reliability and the effectiveness of testing

The application of the above procedures within the framework shown in Fig. 1 can be illustrated for a case concerning reliability growth testing for a relatively complex electronic system. The plan was to test the system under stressed conditions for 1000 h, followed by a 500 h test under normal stresses. Therefore, interviews were scheduled with relevant engineering experts prior to the test and after 1000 h. Only a partial analysis of this problem is reported, to show the use of the model.

Using the

Reflections on implementation of the model and the revised process for reliability growth testing

The data collection systems and modelling procedures described have all been implemented in the manufacturer's operations in accordance with the framework presented in Fig. 1. In-house personnel have been trained to develop expertise in the more novel approaches advocated by the new reliability growth model. But what have we all learnt from this experience?

From the manufacturer's perspective, evidently quite a lot, since the engineers have changed their way of working. The interest generated by

Acknowledgements

We would like to thank all the engineers for their time and for sharing their views with us.

References (16)

  • Cooke, R.M., 1996. Design of reliability databases, Part 1: Review of standard design concepts. Reliability Engineering and System Safety.
  • Ansell, J.I., Phillips, M.J., 1994. Practical Methods for Reliability Data. Oxford Science,...
  • Ansell, J.I., Walls, L.A., Quigley, J.L., 1999. Achieving growth in reliability. Annals of OR, to...
  • BS5760, Part 1, British Standards Institute, Milton...
  • Campodonico, S. et al., 1995. Inference and predictions from Poisson point processes incorporating expert judgement. Journal of the American Statistical Association.
  • Cannon, A.G., Bendell, A., 1991. Reliability Data Banks. Elsevier Applied Science,...
  • Crowder, M.J., Kimber, A.C., Smith, R.L., Sweeting, T.J., 1991. Statistical Analysis of Reliability Data. Chapman &...
  • Duane, J.T., 1964. Learning curve approach to reliability monitoring. IEEE Transactions on Aerospace 2,...
There are more references available in the full text version of this article.
