Techniques for generic probabilistic inversion

https://doi.org/10.1016/j.csda.2005.01.002

Abstract

Probabilistic inversion problems are defined, existing algorithms for solving such problems are discussed, and new algorithms based on iterative re-weighting of a sample are introduced. One of these is the well-known iterative proportional fitting (IPF) algorithm, whose properties have already been studied (Csiszár, Ann. Probab. 3 (1975) 146). A variant on this, an iterative version of the PARFUM algorithm, is shown to have fixed points minimizing an information functional even if the problem is infeasible, and to have only feasible fixed points if the problem is feasible. The algorithm is not shown to converge, but the relative information of successive iterates is shown to converge to zero. Applications to atmospheric dispersion and environmental transport are discussed.

Introduction

Probabilistic inversion problems can be formulated either from a measure-theoretic or from a random variable viewpoint. The former may be more suggestive for theoreticians, while the latter is more recognizable to practitioners.

We follow the approach of Kraan (2002) and Kraan and Bedford (2004). Let $(M,\mathcal{B}(M),\lambda)$ and $(N,\mathcal{B}(N),\nu)$ be two Borel probability spaces, where $M$ and $N$ are compact non-empty subsets of $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively. Let $T:\mathbb{R}^m\to\mathbb{R}^n$ be a continuous mapping. The measure $\gamma=\lambda\circ T^{-1}$ is called the push-forward of $\lambda$ under $T$; similarly, $\lambda$ is called the pull-back of $\gamma$ under $T$. In the following, $\lambda$ plays the role of a background measure. A measure $\mu$ on $(M,\mathcal{B}(M))$ is called an inverse of $T$ at $\nu$ if $\mu$ is the pull-back of $\nu$ under $T$; that is,
$$\forall B\in\mathcal{B}(N),\quad \mu(T^{-1}(B))=\nu(B). \tag{1}$$
The problem can then be formulated as
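For a concrete picture, the push-forward is easy to compute on finite spaces. The following minimal Python sketch (the points, weights, and mapping are invented purely for illustration) computes $\gamma=\lambda\circ T^{-1}$ for a discrete background measure:

```python
# Minimal illustration of the push-forward gamma = lambda ∘ T^{-1}
# on a finite space; all values below are invented for illustration.
from collections import defaultdict

# Background measure lambda on M = {0, 1, 2, 3}.
lam = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}

# A (here many-to-one) mapping T from M into N = {"a", "b"}.
T = {0: "a", 1: "a", 2: "b", 3: "b"}

# Push-forward: gamma(B) = lambda(T^{-1}(B)), accumulated pointwise.
gamma = defaultdict(float)
for x, w in lam.items():
    gamma[T[x]] += w

print(dict(gamma))  # {'a': 0.5, 'b': 0.5}
```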

Definition 1 (Probabilistic inversion problem)

Given $(M,\mathcal{B}(M),\lambda)$ and $(N,\mathcal{B}(N),\nu)$, with $T:\mathbb{R}^m\to\mathbb{R}^n$ continuous, find a measure $\mu$ on $(M,\mathcal{B}(M))$, absolutely continuous with respect to $\lambda$ ($\mu\ll\lambda$), such that $\mu$ is an inverse of $T$ at $\nu$.

Such problems may be infeasible, and if feasible may have many solutions (Kraan, 2002). Under certain conditions a measure $\mu$ solving (1) can be found. If $\nu\ll\gamma$, the Radon–Nikodym derivative $g:N\to\mathbb{R}$ exists, is non-negative, unique up to sets of $\gamma$-measure zero, and satisfies
$$\forall B\in\mathcal{B}(N),\quad \nu(B)=\int_B g(y)\,\mathrm{d}\gamma(y).$$
If in addition $g$ is continuous, $g(y)>0$ for all $y\in N$, and $\int_N g(y)\log g(y)\,\mathrm{d}\gamma(y)<\infty$, then define $\tilde{f}\equiv g\circ T$ and define $\tilde{\mu}$ as a new probability measure:
$$\forall A\in\mathcal{B}(M),\quad \tilde{\mu}(A)\equiv\int_A \tilde{f}(x)\,\mathrm{d}\lambda(x).$$
It is easy to check that $\tilde{\mu}$ is an inverse of $T$ at $\nu$:
$$\tilde{\mu}(T^{-1}(B))=\int_{T^{-1}(B)}\tilde{f}(x)\,\mathrm{d}\lambda(x)=\int_{T^{-1}(B)}g(T(x))\,\mathrm{d}\lambda(x)=\int_B g(y)\,\mathrm{d}\gamma(y)=\nu(B).$$
It can be shown that $\tilde{\mu}$ is the unique measure satisfying Eq. (1) and minimizing the relative information with respect to $\lambda$ in the class of measures satisfying Eq. (1) (Kraan, 2002).
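The construction of $\tilde{\mu}$ can likewise be checked numerically on a finite space. The sketch below (all numbers are illustrative assumptions) computes $\gamma$, the Radon–Nikodym derivative $g=\mathrm{d}\nu/\mathrm{d}\gamma$, the pull-back density $\tilde{f}=g\circ T$, and verifies that $\tilde{\mu}\circ T^{-1}=\nu$:

```python
# Sketch of the inverse-measure construction on a finite space:
# g = dnu/dgamma, f~ = g ∘ T, mu~(A) = sum_{x in A} f~(x) * lambda({x}).
# All values are illustrative assumptions.
from collections import defaultdict

lam = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}   # background measure on M
T = {0: "a", 1: "a", 2: "b", 3: "b"}     # the mapping T
nu = {"a": 0.8, "b": 0.2}                # target measure on N

# Push-forward gamma = lambda ∘ T^{-1}.
gamma = defaultdict(float)
for x, w in lam.items():
    gamma[T[x]] += w

# Radon-Nikodym derivative g = dnu/dgamma (requires nu << gamma).
g = {y: nu[y] / gamma[y] for y in nu}

# Pull-back density f~ = g ∘ T and the measure mu~.
mu = {x: g[T[x]] * w for x, w in lam.items()}

# Check: mu~ ∘ T^{-1} reproduces nu.
pushed = defaultdict(float)
for x, w in mu.items():
    pushed[T[x]] += w
print(dict(pushed))  # {'a': 0.8, 'b': 0.2}
```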

In practical problems a reformulation in terms of random variables may be more convenient. Given a random vector $Y$ taking values in $\mathbb{R}^M$ and a measurable function $G:\mathbb{R}^N\to\mathbb{R}^M$, find a random vector $X$ such that $G(X)\sim Y$, where $\sim$ means that $G(X)$ and $Y$ share the same distribution. If $G(X)\in\{Y\mid Y\in\mathcal{C}\}$, where $\mathcal{C}$ is a subset of random vectors on $\mathbb{R}^M$, then $X$ is called a probabilistic inverse of $G$ at $\mathcal{C}$. $X$ is sometimes termed the input to model $G$, and $Y$ the output. $G$ corresponds to the mapping $T$ in the above paragraph; note that $T$ was assumed continuous, whereas $G$ is only assumed to be measurable. If the problem is feasible it may have many solutions, and we require a preferred solution; if it is infeasible we seek a random vector $X$ for which $G(X)$ is 'as close as possible' to $Y$. Usually the relative information is used as the measure of closeness (Kullback, 1959).
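For discrete distributions the relative information takes the familiar form $I(q\,|\,p)=\sum_i q_i\log(q_i/p_i)$. A small Python helper (an illustrative sketch, not code from the study) is:

```python
import numpy as np

def relative_information(q, p) -> float:
    """I(q|p) = sum_i q_i log(q_i / p_i), with 0 log 0 = 0; requires q << p."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

print(relative_information([0.5, 0.5], [0.9, 0.1]))  # ≈ 0.51
```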

Probabilistic inversion problems arise in quantifying uncertainty in physical models with expert judgement (Kraan and Cooke, 2000a). We wish to quantify the uncertainty on the parameters $X$ of some model using expert judgement, but the parameters possess no clear physical meaning and are not associated with physical measurements familiar to experts. Often the models are derived under assumptions to which the experts do not subscribe. We must then find observable quantities $Y$, functionally related to $X$, that experts can assess. Extracting the uncertainty of $X$ from the expert-specified uncertainty of $Y$ is clearly an inverse problem (see the examples in Section 5).

In practical applications the random vector $Y$ is characterized in terms of percentiles or quantiles of the marginal distributions $Y_1,\dots,Y_M$. In this case, we seek a random vector $X$ such that $G(X)$ satisfies the quantile constraints imposed on $Y_1,\dots,Y_M$. There may be other constraints as well. Some reflect mathematical desiderata, as when we require independence between variables in Section 5. Physical considerations may also impose constraints on $X$. In some cases probabilistic inversion problems have trivial solutions, e.g. if $X$ makes the coordinates of $Y$ completely rank-correlated. Such solutions may be rejected on physical grounds; hence physical constraints may stipulate the support of $X$. Other physical constraints are discussed in the example in Section 6. A few algorithms for solving such problems are available in the literature, namely conditional sampling, parameter fitting for uncertain models (PARFUM) (Cooke, 1994), the Hora–Young algorithm (Harper et al., 1994) and PREJUDICE (Kraan and Cooke, 2000a, Kraan and Cooke, 2000b). We summarize existing approaches to probabilistic inversion in Section 2.

In this paper we study iterative algorithms for numerically solving probabilistic inversion problems. These methods do not require model inversion: they are based on sample re-weighting techniques and are promising because they require neither special knowledge about the problem at hand nor complicated heuristic steering by the user. Moreover, operations on the sample are performed one at a time, so the entire sample need not be held in memory. This means that there is virtually no limitation on problem size.

After reviewing existing approaches, we discuss two iterative algorithms. The first is known in the literature as iterative proportional fitting (IPF) (Kruithof, 1937). IPF finds a joint distribution satisfying marginal constraints by successively I-projecting an initial distribution on the set of distributions satisfying each marginal constraint. The I-projection of a probability measure $p$ on a set of measures $\mathcal{Q}$ is $\operatorname{argmin}_{q\in\mathcal{Q}} I(q\,|\,p)$, where $I(q\,|\,p)$ is the relative information of $q$ with respect to $p$. IPF need not converge, but if it converges, it converges to a solution which is minimally informative with respect to the starting distribution (Csiszár, 1975). A variation on this is an iterative version of the PARFUM algorithm. We show that this algorithm has fixed points minimizing an information functional even if the problem is infeasible, and that it has only feasible fixed points if the problem is feasible. The algorithm is not shown to converge, but the relative information of successive iterates is shown to converge to zero.
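For two margins and a discrete joint table, IPF amounts to a few lines of code. The sketch below (a minimal illustration; the margins are invented) alternately rescales rows and columns, each rescaling being the I-projection onto the corresponding marginal constraint set:

```python
import numpy as np

def ipf(p: np.ndarray, row: np.ndarray, col: np.ndarray,
        iters: int = 100, tol: float = 1e-12) -> np.ndarray:
    """Iterative proportional fitting: alternately rescale the rows and
    columns of the joint table p so its margins match `row` and `col`."""
    p = p.copy()
    for _ in range(iters):
        p *= (row / p.sum(axis=1))[:, None]   # I-projection on row-margin set
        p *= (col / p.sum(axis=0))[None, :]   # I-projection on column-margin set
        if (np.abs(p.sum(axis=1) - row).max() < tol and
                np.abs(p.sum(axis=0) - col).max() < tol):
            break
    return p

start = np.full((3, 3), 1 / 9)                # uniform starting distribution
fit = ipf(start, np.array([0.2, 0.3, 0.5]), np.array([0.1, 0.6, 0.3]))
print(fit.sum(axis=1), fit.sum(axis=0))       # ≈ target margins
```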

In Section 4 we discuss a sample re-weighting iterative approach to probabilistic inversion. In Section 5 we illustrate the IPF and PARFUM algorithms with an example from atmospheric dispersion modelling (Kraan and Cooke, 2000a) and one concerning transfer coefficients in a chicken processing line. Iterative algorithms can easily be adapted to satisfy joint as well as marginal constraints on $Y_1,\dots,Y_M$; we illustrate this by imposing (approximate) convolution constraints in Section 6. The final section gathers conclusions.


Conditional sampling

Let the vector $Y$ consist of a single variable $Y$. Then a simple conditional sampling technique can be used, based on the following result (Kraan and Cooke, 2000a).

Proposition 2

Let $X$ and $Y$ be independent random variables with range $\{1,\dots,n\}$, $P(X=i)>0$, $P(Y=i)>0$, $i=1,\dots,n$. Let $X_{X=Y}$ denote a random variable with distribution
$$P(X_{X=Y}=i)=P(X=i\mid X=Y).$$
Then $X_{X=Y}\sim Y$ if and only if $X$ is uniformly distributed.

Proof

Put $p_i=P(X=i)$, $q_i=P(Y=i)$, $i=1,\dots,n$. Then
$$P(X_{X=Y}=i)=\frac{p_i q_i}{\sum_j p_j q_j}.$$
For all $i=1,\dots,n$, $p_i q_i/\sum_j p_j q_j = q_i$ if and only if $p_i=\sum_j p_j q_j$, that is, if and only if $p_i$ does not depend on $i$; hence $X_{X=Y}\sim Y$ if and only if $X$ is uniformly distributed. $\square$
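Proposition 2 is easy to confirm by simulation. In the hypothetical check below, $X$ is uniform on $\{1,\dots,4\}$, and the empirical distribution of $X$ conditional on $X=Y$ reproduces that of $Y$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 200_000
q = np.array([0.1, 0.2, 0.3, 0.4])       # distribution of Y

x = rng.integers(1, n + 1, size=N)       # X uniform on {1,...,n}
y = rng.choice(np.arange(1, n + 1), size=N, p=q)

cond = x[x == y]                         # sample from X | X = Y
freq = np.array([(cond == i).mean() for i in range(1, n + 1)])
print(freq)                              # ≈ [0.1, 0.2, 0.3, 0.4]
```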

Iterative algorithms

In this section we introduce the two iterative algorithms that are applied to solve probabilistic inversion problems in the next section. First we introduce the necessary notation, definitions and simple facts, and present the IPF and PARFUM algorithms for discrete distributions. We formulate the problem and show results only for the two-dimensional case ($M=2$), indicating which results can be generalized.

Let $p_{\cdot,j}=\sum_{i=1}^K p_{ij}$, $j=1,\dots,K$, and $p_{i,\cdot}=\sum_{j=1}^K p_{ij}$, $i=1,\dots,K$, and
$$S^{K\times K}=\Big\{p\in\mathbb{R}^{K\times K}\ \Big|\ p_{ij}\geqslant 0,\ \sum_{i,j=1}^K p_{ij}=1\Big\},\qquad S_*^{K\times K}=\big\{p\in S^{K\times K}\ \big|\ p_{i,\cdot}>0,\ p_{\cdot,j}>0,\ i,j=1,\dots,K\big\}.$$
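To convey the structural contrast with IPF, the following sketch implements one plausible reading of the iterative PARFUM step for two margins: instead of composing the two I-projections as IPF does, each iterate averages them. The precise form of the algorithm is given in the full text; this sketch is only an assumption-laden illustration.

```python
import numpy as np

def parfum_step(p: np.ndarray, row: np.ndarray, col: np.ndarray) -> np.ndarray:
    """One PARFUM iterate (sketch): average the two I-projections of p onto
    the row-margin and column-margin constraint sets, rather than composing
    them as IPF does."""
    proj_row = p * (row / p.sum(axis=1))[:, None]
    proj_col = p * (col / p.sum(axis=0))[None, :]
    return 0.5 * (proj_row + proj_col)   # average of two probability tables

p = np.full((3, 3), 1 / 9)
row, col = np.array([0.2, 0.3, 0.5]), np.array([0.1, 0.6, 0.3])
for _ in range(500):
    p = parfum_step(p, row, col)
print(p.sum(axis=1), p.sum(axis=0))      # ≈ row, col when feasible
```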

Sample re-weighting

In this section we show how sample re-weighting, combined with the iterative algorithms presented in Section 3, can solve probabilistic inversion problems. This yields generic methods for probabilistic inversion which do not require model inversion. The idea of re-weighting a sample to perform probabilistic inversion can be sketched roughly as follows. Starting with a random vector $X=(X_1,\dots,X_n)$, we generate a large sample of the vector $(X_1,\dots,X_n,Y_1,\dots,Y_M)$, where $Y_1=G_1(X),\dots,Y_M=G_M(X)$. Let the $i$th sample be denoted $s_i\in\mathbb{R}^{n+M}$.
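A rough sketch of such a re-weighting scheme, under the assumption that each constraint fixes the probability mass of the interquantile cells of one output variable, might look as follows (the function and its arguments are hypothetical, not the paper's code):

```python
import numpy as np

def reweight(samples: np.ndarray, targets: list, iters: int = 50) -> np.ndarray:
    """Sketch of IPF-style sample re-weighting. `samples` is N x M (one
    column per output variable Y_m); `targets` gives, per variable, the
    quantile edges and the mass each interquantile cell should carry."""
    N = samples.shape[0]
    w = np.full(N, 1.0 / N)
    for _ in range(iters):
        for m, (edges, mass) in enumerate(targets):
            cells = np.digitize(samples[:, m], edges)  # cell index per sample
            for c, target_mass in enumerate(mass):
                idx = cells == c
                total = w[idx].sum()
                if total > 0:
                    w[idx] *= target_mass / total      # match the cell mass
            w /= w.sum()
    return w

# Hypothetical example: force the median of Y_1 under the new weights to 0.
rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, size=(10_000, 1))
w = reweight(y, targets=[(np.array([0.0]), np.array([0.5, 0.5]))])
print(np.sum(w[y[:, 0] <= 0.0]))                       # ≈ 0.5
```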

Examples

We first illustrate PARFUM and IPF with a simple example involving dispersion coefficients from Harper et al. (1994), which is also used extensively in Kraan (2002) to explain the steps of a probabilistic inversion technique involving model inversion. We then present results from a recent study involving a chicken processing line.

Convolution constraints with prescribed margins

The iterative re-weighting algorithms can be used to impose constraints on the joint distribution, when these can be expressed as quantile constraints on functions of the margins. To illustrate, assume that samples $(x_i,y_i)$, $i=1,\dots,N$, from variables $(X,Y)$ are given. We want to re-weight the samples in such a way that the quantiles of the distribution of $X+Y$ agree with those of the convolution $X*Y$, where $X*Y$ is the random variable whose characteristic function is the product of the characteristic functions of $X$ and $Y$; that is, the distribution of the sum of independent versions of $X$ and $Y$.
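One convenient way to obtain the target quantiles of $X*Y$ from the sample itself is to sum independently permuted copies of the two coordinates, which breaks any dependence between them. The hypothetical sketch below sets up the interquantile cells for $X+Y$ that a re-weighting step, as in Section 4, would then be asked to match:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(size=5_000)
y = rng.exponential(size=5_000)

# Quantiles of the convolution X*Y, estimated by summing *independently*
# shuffled copies of the two samples (this is what independence buys us).
conv = rng.permutation(x) + rng.permutation(y)
edges = np.quantile(conv, [0.25, 0.5, 0.75])

# The edges define interquantile cells for the function X+Y of the original
# (possibly dependent) sample; each cell should carry mass 0.25 after
# re-weighting.
s = x + y
cells = np.digitize(s, edges)
print(edges, np.bincount(cells) / s.size)  # raw cell masses before fitting
```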

Conclusions

We have seen that iterative algorithms possess attractive features for solving probabilistic inversion problems. These algorithms require no intelligent steering other than the choice of the initial sample. Further, they impose no size restrictions on the sample, so the sample may be large; of course, large samples will increase run time.

We note that in practice probabilistic inversion problems are typically infeasible. In such cases IPF is problematic, as little is known about its behavior when no feasible solution exists. The iterative PARFUM algorithm is then attractive, since its fixed points minimize an information functional even when the problem is infeasible.

Acknowledgements

The authors are grateful to V. Girardin for helpful discussions of iterative proportional fitting and convolution constraints and F.M. Dekking for improving the presentation of the proof of Theorem 7.
