Information integration for complex systems

https://doi.org/10.1016/j.ress.2006.07.003

Abstract

This paper develops a framework to determine the performance or reliability of a complex system. We consider a case study in missile reliability that focuses on the assessment of a high fidelity launch vehicle intended to emulate a ballistic missile threat. In particular, we address the case of how to make a system assessment when there are limited full-system tests. We address the development of a system model and the integration of a variety of data using a Bayesian network.

Introduction

At 8:40 p.m. on February 25, 1991, parts of an Iraqi Scud missile destroyed the barracks housing members of the US Army's 14th Quartermaster Detachment. This was the single most devastating attack on US forces during the First Gulf War: 29 soldiers died and 99 were wounded. In the aftermath of this attack, there has been great focus on developing air defense systems capable of defending against ballistic missile attacks. The Critical Measurements and Counter Measures Program (CMCM), run by the US Army Space and Missile Defense Command, conducts exercises to replicate projected ballistic missile threats. These exercises help the US military collect realistic data to evaluate potential defensive measures. The high-fidelity hardware and realistic scenarios created for the exercises provide extensive optical, radar, and telemetry data [1].

CMCM is organized into campaigns. Each campaign chooses a new ballistic missile threat and develops two to four high-fidelity launch vehicles that emulate the threat as closely as possible given intelligence information. That is, if country “A” has a ballistic missile that might be used against the US, a CMCM campaign may involve building a small number of replicas that can be launched to test and train US tracking and intercept capabilities. Assessing the reliability of these missile targets is difficult for a variety of reasons. While there is some reuse across campaigns, each set of launch vehicles is essentially a complex, one-of-a-kind, one-time-use system built for a specific data collection purpose. Typically, due to cost and schedule constraints, there are no “risk reduction” flights performed, so there is no full-system checkout before the actual flights. The systems are designed and built in a distributed fashion, with scientists and engineers from different companies designing, building, and integrating various parts of the vehicle. These campaigns are expensive (millions of dollars) and politically high profile.

The issue that we address in this paper is how to determine a preflight probability of mission success and how to assess areas of risk to the flight. Since there are no preflight full-system tests, this involves careful system modeling and the integration of as much component, historical, and engineering data as possible. The applied problem described here is large and complex. The system itself is well-understood in some dimensions by the groups working on the project, but not in terms of its overall reliability and performance. Knowledge of the total system is distributed across two primary research and development contractors and several subcontractors, all of which are located in different parts of the country. Each research group understands its area of responsibility at a local/granular level, and there is working knowledge of how to build a missile that will fly, but the project teams do not have methodology or tools to assess or predict full-system performance or reliability. The government agency that sets technical and scheduling requirements and oversees budgets is in yet another location, limiting its opportunity to assess the problems and progress of the project to weekly conference calls and periodic technical meetings where all of the contractors gather at a single location to brief the status of their efforts.

Modeling a system of this type presents challenges. There is heterogeneous data that explains different aspects of component and subcomponent performance, but very little sense of how that data is interrelated or how to sensibly combine the data and propagate reliability estimates and their uncertainties to understand overall system reliability. There are hundreds of components and subcomponents that all perform differently. Our approach to grappling with this problem was to first build a qualitative model of the problem space (its parts and relationships) and then to migrate that qualitative model to a graphical statistical model. The project involved collaboration between a social scientist who studies technical communities and a statistician who studies reliability and information integration for complex systems. We used ethnographic interviewing and observation techniques to elicit the problem structure, which was then represented in conceptual graphs. The framework that we use to quantitatively model the system and integrate the data is Bayesian networks (BN).

Ethnographic methods were originally developed by Western anthropologists to study foreign cultures. In the 20th century, these methods were deployed to study a variety of subcultures of Western society as well [2] (e.g., street gangs [3], long distance truck drivers [4], single parents [5], cigar smokers [6], endocrinologists [7], nuclear weapons designers [8]). In all instances, the goal of the anthropologist is largely the same: to better understand the internal logic of particular cultures, including beliefs, rituals, rules, problem solving strategies, and ways of producing and preserving knowledge. Modern ethnographic methods include interviewing, observation, and textual analysis with an eye towards understanding the culture in its own terms [9], [10], [11]. Considerable effort goes into not imposing outside preconceptions that would color interpretation of the information and data collected during fieldwork.

For several reasons, we employ ethnographic methods to create initial qualitative models of complex systems like the one discussed in this article. First, we want to capture how the technical community understands both the problem and their technical system and let that drive the statistical analysis that is ultimately performed. American industry is littered with effective statistical models that had a short useful life because they were unintelligible to the client or never gained cultural buy-in from the organization. Second, much technical knowledge is tacit and not explicit. To appropriately model the reliability of a complex system, we need to capture the sort of things that are left off the wiring schematics and engineering block diagrams. Third, we want to understand what the technical experts think is important to system performance, where they collect data, what that data means to them, and what their engineering judgment tells them about the system. When an engineer says “When it's cold this part doesn’t work well, and when it's really cold it never works,” we want to make sure we capture the logic of that knowledge in the system model.

Finally, many of the systems we work with are untestable or very expensive to test. There is not much data that can be used to indicate overall system performance. In the case of the CMCM campaigns, no integrated system test data is available before the missile is actually flown. What might be available for a system like this is subcomponent data, computer models, historical data on related parts/systems, and expert judgment. Working from these data sets necessitates a model that looks at reliability at the part and component level, and then rolls up that granular information into a reliability number for the overall system.
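The roll-up described above can be sketched in a few lines of code. The component test counts, the simple series structure, and the uniform priors below are all illustrative assumptions for exposition, not CMP-4 data or the paper's actual model:

```python
# Hedged sketch: each component's pass/fail tests give a Beta posterior on
# its reliability; assuming a simple series structure, system reliability
# is the product of component reliabilities. Monte Carlo draws from the
# posteriors propagate the component-level uncertainty up to the system.
import random

# Hypothetical subcomponent test data: (successes, trials)
component_tests = [(48, 50), (19, 20), (98, 100)]

def sample_system_reliability(rng):
    r = 1.0
    for s, n in component_tests:
        # Beta(s + 1, n - s + 1) posterior under a uniform Beta(1, 1) prior
        r *= rng.betavariate(s + 1, n - s + 1)
    return r

rng = random.Random(0)
draws = [sample_system_reliability(rng) for _ in range(10_000)]
mean = sum(draws) / len(draws)
```

The spread of `draws` (not just `mean`) is the point: with so few trials per component, the system-level uncertainty remains substantial, which is exactly the situation the BN framework is designed to handle.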

Because ethnographic methods explicitly require the researcher to avoid imposing external conceptual frames on a subject of study, ethnographic methods are well-suited to studying parts of a larger whole to document the rules and logic by which the parts relate to each other. Recognizing the value of ethnographic field methods in technology development, an increasing number of high-tech companies are using ethnographers in their design process [12], [13], [14].

What is unique about this project is the use of ethnographic methods to understand an engineering design process as part of a larger effort to design a statistical model. In this case, ethnographic methods were particularly useful because the CMCM missile was being developed by several stakeholder communities, each of which had very specific areas of expertise. Ethnographic field methods ensure that perceptions and views of one group of experts are not privileged over those of another group; in other words, no single community is allowed to define the technology in its local terms.

Instead, an integrated view of the relationships among parts, functions, and outcomes emerges iteratively as the researcher engages with, questions, and documents the responses of individual communities. The end result is a graphical system model that not only makes sense to all members of the project, but that is owned by all members of the expert community developing the technology. This graphical system model provides a big picture view of the system's functionality that can form the basis for a statistical analysis.

When the researcher understands the logical relationships of the components and the data that is available for each component, there are classes of graphical statistical models that can be used to understand overall system performance. In our case, the information was well-modeled using a BN. There is limited literature on the use of BNs in failure modes and effects analysis [15] and reliability [16], [17], although there is quite a broad literature on using BNs for probabilistic modeling (e.g., [18], [19], [20], [21]).

This paper will describe our approach to the assessment of Campaign 4 of the CMCM program (CMP-4). Section 2 details the development of the system representation using both qualitative and quantitative methods. Section 3 discusses the statistical model for the system and the information and data available to populate the model. Section 4 shows how the information was combined to make estimates. Section 5 contains conclusions and discussion.

Section snippets

Representing the system

The week before a previous campaign mission was supposed to fly, the project manager looked around the table at the subcontractors who had built various parts of the missile. She asked the question, “What's the probability this thing's going to fly?” The people at the table could only answer “My part is going to work!” There was no perspective, however, on how the integrated collection of these parts was going to perform. The authors of this paper had been working at Los Alamos National

System model

The joint distribution of V, the set of nodes in a BN, is given by

P(V) = ∏_{v∈V} P(v | parents[v]),     (1)

where the parents of a node are the set of nodes with an edge pointing to the node. For example, in the serial structure in Fig. 2a, the parent of node C is node B, and node A has no parents.

Eq. (1) shows that the joint distribution of the nodes in the BN is determined by a set of conditional distributions. For example, in Fig. 1, one of the probabilities that needs to be assessed to determine the joint
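Eq. (1) can be made concrete for the serial structure A → B → C of Fig. 2a. The two-state (works/fails) encoding and all probability values below are hypothetical, chosen only to show the factorization:

```python
# Eq. (1) for the serial network A -> B -> C: the joint distribution is a
# product of one conditional distribution per node, each conditioned on
# that node's parents (A has none; B's parent is A; C's parent is B).
P_A = {"works": 0.95, "fails": 0.05}
P_B_given_A = {"works": {"works": 0.90, "fails": 0.10},
               "fails": {"works": 0.20, "fails": 0.80}}
P_C_given_B = {"works": {"works": 0.85, "fails": 0.15},
               "fails": {"works": 0.10, "fails": 0.90}}

def joint(a, b, c):
    # P(A=a, B=b, C=c) = P(a) * P(b | a) * P(c | b)
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

# Sanity check: the joint must sum to 1 over all eight state combinations.
states = ("works", "fails")
total = sum(joint(a, b, c) for a in states for b in states for c in states)
print(round(total, 10))  # 1.0
```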

Statistical inference

Suppose that we have developed a system model and identified data sources using the methodology described in Section 3. We now need to calculate the marginal probabilities for various nodes of interest in the BN. We illustrate the methodology using two simple examples.

Suppose that we have the BN in Fig. 7. Each node has three possible states RED (R), YELLOW (Y), and GREEN (G). We have the following data and information:

  • A is RED with Φ=1, A is YELLOW with Φ=2.

  • B is RED with Φ=1, B is YELLOW with Φ
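Marginalization in a BN of this kind can be sketched as follows. The two-node structure A → B, the R/Y/G probability tables, and the numbers are illustrative assumptions; they are not the Fig. 7 network or its Φ parameters:

```python
# Illustrative marginal computation for a hypothetical two-node BN A -> B
# whose nodes take states RED (R), YELLOW (Y), and GREEN (G).
states = ("R", "Y", "G")
P_A = {"R": 0.10, "Y": 0.30, "G": 0.60}
P_B_given_A = {
    "R": {"R": 0.70, "Y": 0.20, "G": 0.10},
    "Y": {"R": 0.20, "Y": 0.50, "G": 0.30},
    "G": {"R": 0.05, "Y": 0.15, "G": 0.80},
}

# Marginal of B by summing out A: P(B=b) = sum_a P(A=a) * P(B=b | A=a)
P_B = {b: sum(P_A[a] * P_B_given_A[a][b] for a in states) for b in states}
print(P_B)
```

In the real CMP-4 network the same summing-out operation is carried out over hundreds of nodes, so exact enumeration gives way to standard BN inference algorithms, but the computation per node is of this form.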

Conclusions

The final BN for CMP-4 contained approximately 600 nodes. Neil et al. [19] summarize many of the issues that surround working with a model of this size:

Large knowledge-based systems, including BNs, are subject to the same forces as any other substantial engineering undertaking. The customer might not know what they want; the knowledge engineer may have difficulty understanding the domain; the tools and methods applied may be imperfect; dealing with multiple ever-changing design abstraction is

References (33)

  • M.H. Agar

    The professional stranger: an informal introduction to ethnography

    (1996)
  • R.S. Weiss

    Learning from strangers: the art and method of qualitative interview studies

    (1994)
  • B.S. Sunstein et al.

    Field working: reading and writing research

    (2002)
  • L. Suchman

    Plans and situated actions: the problem of human–machine communication

    (1987)
  • L. Suchman

    Building bridges: practice-based ethnographies of contemporary technology

  • J. Orr

    Talking about machines: an ethnography of a modern job

    (1996)