On Diversity, and the Elusiveness of Independence

Littlewood, Bev

doi:10.1007/3-540-45732-1_24

On Diversity, and the Elusiveness of Independence

Bev Littlewood⁶

Conference paper
First Online: 01 January 2002

582 Accesses
1 Citations

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2434))

Abstract

Diversity, as a means of avoiding mistakes, is ubiquitous in human affairs. Whenever we invite someone else to check our work, we are taking advantage of the fact that they are different from us. In particular, we expect that their different outlook may allow them to see problems that we have missed. In this talk I shall look at the uses of diversity in systems dependability engineering.

In contrast to diversity, redundancy has been used in engineering from time immemorial to obtain dependability. Mathematical theories of reliability involving redundancy of components go back over half a century. But redundancy and diversity are not the same thing.

Redundancy, involving the use of multiple copies of similar (‘identical’) components (e.g. in parallel) can be effective in protecting against random failures of hardware. In some cases, it is reasonable to believe that failures of such components will be statistically independent: in that case very elementary mathematics can show that systems of arbitrarily high reliability can be built from components of arbitrarily low reliability. In practice, assumptions of independence need to be treated with some scepticism, but redundancy can nevertheless still bring benefits in reliability.

What redundancy cannot protect against, of course, is the possibility of different components containing common failure modes — for example, design defects which will show themselves on every component of a particular type whenever certain conditions arise. Whilst this problem has been familiar to reliability and safety engineers for decades, it became particularly acute when systems dependability began to depend heavily on the correct functioning of software.

Clearly, there are no software reliability benefits to be gained by the use of simple redundancy, i.e. merely exact replication of a single program. Since software, unlike hardware, does not suffer from ‘random failures’ — in the jargon its failures are ‘systematic’ — failures of identical copies will always be coincident.

Design diversity, on the other hand — creating different versions using different teams and perhaps different methods — may be a good way of making software reliable, by providing some protection against the possibility of common design faults in different versions. Certainly, there is some industrial experience of design-diverse fault tolerant systems exhibiting high operational reliability (although the jury is out on the issue of whether this is the most cost-effective way of obtaining high reliability).

What is clear, however — from both theoretical and experimental evidence — is that claims for statistical independence of diverse software versions¹ are not tenable. Instead, it is likely that two (or more) versions will show positive association in their failures. This means that the simple mathematics based on independence assumptions will be incorrect — indeed it will give dangerously optimistic answers. To assess the system reliability we need to estimate not just the version reliabilities, but the level of dependence between versions as well.

In recent years there has been considerable research into understanding the nature of this dependence. It has centred upon probabilistic models of variation of ‘difficulty’ across the demand (or input) space of software. The central idea is that different demands vary in their difficulty — to the human designer in providing a correct ‘solution’ that will become a part of a program, and thus eventually to the program when it executes. The earliest models assume that what is difficult for one programming team will be difficult for another. Thus we might expect that if program version A has failed on a particular demand, then this suggests the demand is a ‘difficult’ one, and so program version B becomes more likely to fail on that demand. Dependence between failures of design-diverse versions therefore arises as a result of variation of ‘difficulty’: the more variation there is, the greater the dependence.

This model, and subsequent refinements of it, go some way to providing a formal understanding of the relationship between the process of building diverse program versions and their subsequent failure behaviour. In particular, they shine new light on the different meanings of that over-used word ‘independence’. They show that even though two programs have been developed ‘independently’, they will not fail independently. The apparent goal of some early work on software fault tolerance — to appeal to ‘independence’ in order to claim high system reliability using software versions of modest reliability, as had been done for hardware — turns out to be illusory. On the other hand, the models tell us that diversity is nevertheless ‘a good thing’ in certain precise and formally-expressed ways.

In this talk I shall briefly describe these models, and show that they can be used to model diversity in other dependability-related contexts. For example, the formalism can be used to model diverse methods of finding faults in a single program: it provides an understanding of the trade-off between ‘effectiveness’ and ‘diversity’ when different fault-finding methods are available (as is usually the case). I shall also speculate about their applicability to the use of diversity in reliability and safety cases: e.g. ‘independent’ argument legs; e.g. ‘independent’ V&V.

As will be clear from the above, much of the talk will concern the vexed question of ‘independence’. These models, for the most part, are bad news for seekers after independence. Are there novel ways in which we might seek, and make justifiable claims about, independence?

Software is interesting here because of the possibility that it can be made ‘perfect’, i.e. fault-free and perfectly reliable, in certain circumstances. Such a claim for perfection is rather different from a claim for high reliability. Indeed, I might believe a claim that a program has a zero failure rate, whilst not believing a claim that another program has a failure rate of less than 10^-9 per hour. The reason for my apparently paradoxical view is that the arguments here are very different. The claim for perfection might be based upon utter simplicity, a formal specification of the engineering requirements, and a formal verification of the program. The claim for better than 10^-9 per hour, on the other hand, seems to accept the presence of faults (presumably because the program’s complexity precludes claims of perfection), but nevertheless asserts that the faults will have an incredibly small effect. I shall discuss a rather speculative approach to design diversity in which independence may be believable between claims for fault-freeness and reliability.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Author information

Authors and Affiliations

Centre for Software Reliability, City University, Northampton Square, EC1V0HB, London
Bev Littlewood

Authors

Bev Littlewood
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

LFCS, Division of Informatics, The University of Edinburgh, Mayfield Road, EH9 3JZ, Edinburgh, UK
Stuart Anderson & Massimo Felici &
ENEA CR Casaccia, Via Anguillarese, 301, 00060, Rome, Italy
Sandro Bologna

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Littlewood, B. (2002). On Diversity, and the Elusiveness of Independence. In: Anderson, S., Felici, M., Bologna, S. (eds) Computer Safety, Reliability and Security. SAFECOMP 2002. Lecture Notes in Computer Science, vol 2434. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45732-1_24

Download citation

DOI: https://doi.org/10.1007/3-540-45732-1_24
Published: 18 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44157-1
Online ISBN: 978-3-540-45732-9
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics