Unified generalized iterative scaling and its applications

doi:10.1016/j.csda.2009.10.017

Computational Statistics & Data Analysis

Volume 54, Issue 4, 1 April 2010, Pages 1066-1078

https://doi.org/10.1016/j.csda.2009.10.017

Abstract

Generalized iterative scaling (GIS) has become a popular method for obtaining maximum likelihood estimates for log-linear models. It is essentially a sequence of successive $I$-projections onto sets of probability vectors for which some linear combinations of the components are given. However, when a sequence of successive $I$-projections is applied onto some closed and convex sets (e.g., those defined by marginal stochastic order), it may not lead to the actual solution. In this manuscript, we present a unified generalized iterative scaling (UGIS) algorithm and show that it converges to the optimal solution. The relationship between the UGIS and constrained maximum likelihood estimation for log-linear models is established. Applications to constrained Poisson regression modeling and marginal stochastic order are used to demonstrate the proposed UGIS.

Introduction

Let $p$ and $\pi$ be two probability vectors in $R^{k}$. The $I$-divergence of $p$ with respect to $\pi$ (also known as the Kullback–Leibler information number, cross entropy, and information for discrimination) is defined by $I(p \mid \pi) = \sum_{i=1}^{k} p_{i} \log(p_{i} / \pi_{i})$. For any given $\pi$, it is often of interest to find $p^{*}$ in some set $E$ such that $I(p^{*} \mid \pi) = \sum_{i=1}^{k} p_{i}^{*} \log(p_{i}^{*} / \pi_{i}) = \min_{p \in E} \sum_{i=1}^{k} p_{i} \log(p_{i} / \pi_{i})$. This minimization problem is usually called the $I$-projection problem for $\pi$ onto $E$, and its solution $p^{*}$ is called the $I$-projection of $\pi$ onto $E$. It has long been known that the $I$-projection plays a key role in the information-theoretic approach to statistics (Kullback, 1959; Good, 1963; Bishop et al., 1975).
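The definitions above can be made concrete with a small numerical sketch. It is illustrative only: the function names, the example vectors, and the choice of the set $E = \{p : p_{1} = c\}$ are assumptions introduced here, not taken from the paper. For this particular $E$ the $I$-projection has a closed form (fix the first cell at $c$ and rescale the remaining cells of $\pi$ proportionally), which the code compares against a crude grid search.

```python
import math

def i_divergence(p, pi):
    """I(p | pi) = sum_i p_i * log(p_i / pi_i), with the convention 0 * log 0 = 0."""
    total = 0.0
    for p_i, pi_i in zip(p, pi):
        if p_i == 0.0:
            continue  # a zero cell of p contributes nothing
        if pi_i == 0.0:
            return math.inf  # p puts mass where pi has none
        total += p_i * math.log(p_i / pi_i)
    return total

def project_fixed_first(pi, c):
    """I-projection of pi onto the hypothetical set E = {p : p_1 = c}.

    The first cell is set to c and the remaining cells keep the
    proportions they have in pi, rescaled to total mass 1 - c.
    """
    rest = sum(pi[1:])
    return [c] + [(1.0 - c) * v / rest for v in pi[1:]]

pi = [0.5, 0.25, 0.25]
p_star = project_fixed_first(pi, 0.2)  # approximately [0.2, 0.4, 0.4]

# Sanity check: search a grid of points in E and keep the one closest to pi.
grid = [[0.2, 0.8 * t / 1000, 0.8 - 0.8 * t / 1000] for t in range(1, 1000)]
p_grid = min(grid, key=lambda p: i_divergence(p, pi))

print(i_divergence(p_star, pi), p_star, p_grid)
```

Sets $E$ of practical interest rarely admit such a closed form, which is why iterative procedures are needed.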

In applications, $E$ is usually assumed to be of the form $\{p : h - A' p \in K\}$ for some given vector $h$, matrix $A$ and convex cone $K$. This form often occurs in statistics. For $K = \{(0, \dots, 0)'\}$, Deming and Stephan (1940) formally introduced the iterative proportional fitting procedure (IPFP) to adjust cell frequencies of contingency tables when all elements of $A$ are equal to 0 or 1. Ireland and Kullback (1968) showed the convergence of IPFP to the $I$-projection when the marginals of contingency tables are given. Darroch and Ratcliff (1972) proposed generalized iterative scaling (GIS) to obtain the $I$-projection for general $A$ and established the relations between maximum likelihood estimation for log-linear models and $I$-projections. Dykstra and Lemke (1988) demonstrated that maximum likelihood estimation for discrete distributions is closely related to $I$-projections onto $E = \{p : h - A' p \in K\}$ for some $h$ and $A$. Dykstra (1985) proposed an iterative procedure for obtaining $I$-projections onto the intersection of convex sets, and Dykstra and Wollan (1987) devised a computer program based on this procedure. Winkler (1990), Bhattacharya and Dykstra (1995, 1997) and Kuroda and Geng (1999) considered similar problems.
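As an illustration of the equality-constrained case $K = \{(0, \dots, 0)'\}$ with 0/1 entries in $A$, the following is a minimal sketch of IPFP for a two-way table with prescribed row and column totals. The function name `ipfp`, the starting table, and the target margins are hypothetical choices made for this example, and a fixed iteration count stands in for a proper convergence test.

```python
def ipfp(table, row_targets, col_targets, n_iter=100):
    """Alternately rescale rows and columns of a positive table
    until its margins match the targets (fixed number of sweeps)."""
    t = [list(row) for row in table]
    for _ in range(n_iter):
        # Row step: scale each row to its target total.
        for i, row in enumerate(t):
            scale = row_targets[i] / sum(row)
            t[i] = [v * scale for v in row]
        # Column step: scale each column to its target total.
        for j in range(len(col_targets)):
            scale = col_targets[j] / sum(row[j] for row in t)
            for row in t:
                row[j] *= scale
    return t

start = [[0.4, 0.1], [0.2, 0.3]]
fitted = ipfp(start, row_targets=[0.6, 0.4], col_targets=[0.5, 0.5])

row_sums = [sum(row) for row in fitted]
col_sums = [sum(row[j] for row in fitted) for j in range(2)]
# Row and column scaling leaves the cross-product ratio unchanged.
odds_ratio = fitted[0][0] * fitted[1][1] / (fitted[0][1] * fitted[1][0])
print(row_sums, col_sums, odds_ratio)
```

Because each step only multiplies a row or a column by a constant, the fitted table keeps the odds ratio of the starting table while taking on the new margins.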

Kullback (1968), Csiszar (1975, 1989), Haberman (1984) and Ruschendorf and Thomsen (1993) considered $I$-projection problems for probability measures. Ruschendorf (1995) demonstrated that the IPFP for probability measures also converges to the $I$-projection with given marginals. Bhattacharya (2006) considered an iterative procedure for probability measures to obtain $I$-projections onto the intersection of convex sets.

Gao and Shi (2003) considered the $I$-projection problem when $K$ consists of some inequality constraints. They proposed an iterative algorithm for finding the solutions and proved that the proposed algorithm converges to an $I$-projection. The algorithm partly generalizes GIS. They also established the relationship between $I$-projections and maximum likelihood estimation for log-linear models with ordered parameters.

Analysis of ordinal data is a challenging problem. One of the popular models for ordinal data is the log-linear model, whose parameters are often restricted by some ordering, such as odds ratios increasing with ordinal categories (Agresti and Coull, 2002). The aim of this paper is to provide new algorithms for analyzing ordinal data. An iterative algorithm is proposed for computing $I$-projections when $K = C^{*}$ and $C^{*}$ is the Fenchel dual cone of an isotonic cone $C$. The algorithm reduces to the well-known GIS when $C^{*} = \{(0, \dots, 0)'\}$ is chosen. The new method is called unified generalized iterative scaling (UGIS). Relationships between $I$-projections and maximum likelihood estimation of restricted parameters for log-linear models are demonstrated. This paper is organized as follows. In Section 2, an $I$-projection problem is considered and relations between $I$-projections and log-linear models are given. The UGIS is introduced and the related algorithms are proposed in Section 3, where the relationships between UGIS and maximum likelihood estimation of constrained parameters for log-linear models are also established. Poisson regression modeling and marginal stochastic order are used to demonstrate the proposed algorithms in Section 4.

Section snippets

$I$-projections and log-linear models

In this section, we describe the relation between the $I$-projection problem on the Fenchel dual cone of an isotonic cone and log-linear models with restricted ordered parameters. For this purpose, we begin with some necessary definitions. A binary relation $\preceq$ on a finite set $\{x_{1}, x_{2}, \dots, x_{s}\}$ is a quasi-order if it is reflexive (i.e., $x \preceq x$ for all $x \in \{x_{1}, x_{2}, \dots, x_{s}\}$) and transitive (i.e., $x \preceq y$ and $y \preceq z$ imply $x \preceq z$ for all $x, y, z \in \{x_{1}, x_{2}, \dots, x_{s}\}$); no further property, such as antisymmetry, is required.
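The two defining properties can be checked by brute force on a small finite set. The sketch below is illustrative only; the helper name `is_quasi_order` and the example relations are assumptions introduced here.

```python
from itertools import product

def is_quasi_order(elements, leq):
    """Return True if the relation leq is reflexive and transitive on elements."""
    reflexive = all(leq(x, x) for x in elements)
    transitive = all(
        leq(x, z)
        for x, y, z in product(elements, repeat=3)
        if leq(x, y) and leq(y, z)
    )
    return reflexive and transitive

# The usual simple order on {1, 2, 3, 4}.
simple_order = is_quasi_order([1, 2, 3, 4], lambda x, y: x <= y)

# Comparing by x // 2 ties 0 with 1 and 2 with 3: a quasi-order
# that is not antisymmetric (0 and 1 precede each other).
tied_order = is_quasi_order([0, 1, 2, 3], lambda x, y: x // 2 <= y // 2)

# Strict inequality fails reflexivity.
strict = is_quasi_order([1, 2, 3], lambda x, y: x < y)

# "Within distance 1" is reflexive but not transitive.
near = is_quasi_order([1, 2, 3], lambda x, y: abs(x - y) <= 1)

print(simple_order, tied_order, strict, near)
```

The second relation shows why the definition stops at reflexivity and transitivity: ties between distinct elements are allowed.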

Definition 2.1

Let $\preceq$ be a quasi-order defined on $\{1, 2, \dots, s\}$. Then, $C = \{x \in R^{s} :$

Unified generalized iterative scaling algorithms

Unified generalized iterative scaling (UGIS) is a method for finding the optimal solution of (3). In this section, we propose UGIS algorithms for the different cases of (3) and prove that they converge to the optimal solutions of the corresponding $I$-projection problems. According to Lemma 2.2, without loss of generality, suppose that the entries of the matrix $A = (a_{ij})$ given in (3) satisfy $a_{ij} \geq 0$ for $i = 1, \dots, k$, $j = 1, \dots, s$, and $\sum_{j=1}^{s} a_{ij} = 1$ for $i = 1, \dots, k$. For any $x \in R^{s}$ and weight $w = (w_{1}, \dots, w_{s})'$, denote the projection of $x$ onto $C$ by $\hat{x} = P_{w} (x$
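The UGIS steps themselves are cut off in this excerpt, so the sketch below does not attempt to reproduce them. It shows only the classical GIS of Darroch and Ratcliff (1972) for the equality-constrained case $A' p = h$, under the normalization stated above (nonnegative entries, each row of $A$ summing to 1). The function name `gis`, the example matrix, the target vector, and the fixed iteration count are all assumptions made for illustration.

```python
import math

def gis(pi, A, h, n_iter=500):
    """Classical GIS for the constraint A'p = h.

    Update: p_i <- p_i * prod_j (h_j / m_j) ** a_ij, where
    m_j = sum_i a_ij * p_i is the current value of constraint j.
    Assumes a_ij >= 0, each row of A sums to 1, and h > 0 sums to 1.
    """
    k, s = len(A), len(A[0])
    p = list(pi)
    for _ in range(n_iter):
        m = [sum(A[i][j] * p[i] for i in range(k)) for j in range(s)]
        p = [
            p[i] * math.prod((h[j] / m[j]) ** A[i][j] for j in range(s))
            for i in range(k)
        ]
    return p

# Hypothetical example: three cells, two constraints.
A = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
pi = [1.0 / 3, 1.0 / 3, 1.0 / 3]
h = [0.7, 0.3]

p = gis(pi, A, h)
fitted = [sum(A[i][j] * p[i] for i in range(3)) for j in range(2)]
print(p, fitted)
```

Each iterate has the log-linear form $p_{i} = \pi_{i} \exp(\sum_{j} \theta_{j} a_{ij})$, which is the link between $I$-projections and log-linear models mentioned in the introduction; with the uniform $\pi$ used here this implies $p_{2}^{2} = p_{1} p_{3}$.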

Examples

The following examples were chosen to demonstrate the applications of the proposed algorithms for log-linear modeling. The first example concerns a Poisson regression model with the regression coefficient being restricted to an isotonic cone. In the second example, we illustrate how the problem of marginal stochastic ordering in a square contingency table is transformed into an $I$-projection problem on (14).

Poisson regression modeling. Suppose that the $Y_{i}$ given the covariates $x_{i}$ ($i = 1, \dots, N$) are

Acknowledgements

The authors thank the associate editor and two anonymous reviewers for helpful comments and suggestions on an earlier version of this article. This work was supported by NSFC:10701021, NSFC:10931002, NSFC:10828102 and NENU-STC07001. M.L. Tang’s research was fully supported by a grant from the Research Grant Council of the Hong Kong Special Administrative Region (Project No. KBU261508) and the Hong Kong Baptist University Grant FRG2/08-09/066.

References (28)

A. Agresti et al., The analysis of contingency tables under inequality constraints, J. Statist. Plann. Inference (2002)

B. Bhattacharya et al., A general duality approach to $I$-projections, J. Statist. Plann. Inference (1995)

L. Ruschendorf et al., Note on the Schrodinger equation and $I$-projections, Statist. Probab. Lett. (1993)

A. Agresti, Categorical Data Analysis (2002)

B. Bhattacharya et al., A Fenchel duality aspect of iterative $I$-projection procedures, Ann. Inst. Statist. Math. (1997)

B. Bhattacharya, An iterative procedure for general probability measures to obtain $I$-projection onto intersection of convex sets, Ann. Statist. (2006)

Y.M.M. Bishop et al., Discrete Multivariate Analysis: Theory and Practice (1975)

I. Csiszar, Information-type measures of difference of probability distributions and indirect observations, Studia Sci. Math. Hungar. (1967)

I. Csiszar, $I$-divergence geometry of probability distributions and minimization problems, Ann. Probab. (1975)

I. Csiszar, A geometric interpretation of Darroch and Ratcliff’s generalized iterative scaling, Ann. Statist. (1989)

J.N. Darroch et al., Generalized iterative scaling for loglinear models, Ann. Math. Statist. (1972)

W.E. Deming et al., On a least squares adjustment of a sampled frequency table when the expected marginal totals are known, Ann. Math. Statist. (1940)

R. Dykstra, An iterative procedure for obtaining $I$-projections onto the intersection of convex sets, Ann. Probab. (1985)

R. Dykstra et al., Duality of $I$-projections and maximum likelihood estimation for log-linear models under cone constraints, J. Amer. Statist. Assoc. (1988)
