Nonparametric estimation of the coefficient of overlapping—theory and empirical application

https://doi.org/10.1016/j.csda.2005.01.014

Abstract

The coefficient of overlapping OVL measures the amount of agreement of two probability distributions. Statistical inference for OVL has been mainly investigated in a parametric framework. Five strongly consistent nonparametric estimators for OVL based on kernel density estimation are suggested. A Monte-Carlo simulation investigates bias and standard deviation of the estimators in finite samples. Results of an empirical application to German labor income data of men and women (based on GSOEP data) are presented. It is shown that there is much more agreement of labor income distributions of men and women in East Germany (new Federal States) than in West Germany (old Federal States).

Introduction

The coefficient of overlapping OVL was first introduced by Weitzman (1970). It is defined as the common area under two probability densities and was used as a measure of agreement of two income distributions.
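As a rough illustration of the "common area" definition, the following sketch integrates the pointwise minimum of two normal densities numerically and compares the result with the known closed form for two normals with a common standard deviation, 2Φ(-|μ1-μ2|/(2σ)). The function names, integration limits and step count are illustrative assumptions and are not taken from the paper.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def ovl_numeric(f, g, lo, hi, steps=20000):
    """Trapezoidal approximation of the integral of min(f, g) over [lo, hi].

    The interval [lo, hi] is assumed to carry essentially all the mass
    of both densities.
    """
    h = (hi - lo) / steps
    total = 0.5 * (min(f(lo), g(lo)) + min(f(hi), g(hi)))
    for k in range(1, steps):
        x = lo + k * h
        total += min(f(x), g(x))
    return total * h


def ovl_equal_variance_normals(mu1, mu2, sigma):
    """Closed form 2 * Phi(-|mu1 - mu2| / (2 * sigma)) for a common sigma."""
    d = abs(mu1 - mu2) / (2.0 * sigma)
    # 2 * Phi(-d) = 1 + erf(-d / sqrt(2))
    return 1.0 + math.erf(-d / math.sqrt(2.0))


# Example: N(0, 1) versus N(1, 1)
f = lambda x: normal_pdf(x, 0.0, 1.0)
g = lambda x: normal_pdf(x, 1.0, 1.0)
approx = ovl_numeric(f, g, -10.0, 11.0)
exact = ovl_equal_variance_normals(0.0, 1.0, 1.0)
```

Here `approx` and `exact` should agree closely. Identical densities give a value near 1, and densities with essentially disjoint mass give a value near 0.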

Estimation and inference for OVL have been investigated mainly in a parametric framework, especially under the normality assumption for both distributions (see Inman and Bradley Jr. (1989), Mishra et al. (1986), Mulekar and Mishra (2000) and Reiser and Faraggi (1999)).

The only attempts we know of to estimate OVL in a purely nonparametric way are those of Clemons and Bradley Jr. (2000) and Clemons (2001). These authors apply kernel density techniques for estimation. Their method is described verbally and relies on available computer programs in Fortran. The estimator can be computed by following the steps stated in Section 2.3 of their paper.

This paper suggests five estimators for OVL which also make use of kernel density estimation techniques. They are based on five different representations of OVL. In particular, they can be viewed as empirical versions of the various representations of OVL where densities are replaced by kernel density estimates and integrals are replaced by simple quadrature formulas.

The paper is organized as follows. Section 2 introduces OVL according to Weitzman and presents different representations of OVL. They are derived from the basic definition of OVL by applying simple transformation rules for integrals and expectations. An interesting relation of OVL to the total variation distance is indicated at the end of the section. Section 3 derives the five estimators for OVL which are investigated in this paper. Their strong consistency is shown under mild assumptions. Bias and variance of the estimators are investigated by Monte-Carlo (MC) simulation for samples of size n=100 and n=500.

Section 4 gives an empirical application to German labor income data for men and women taken from the German Socioeconomic Panel (GSOEP).
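The relation between OVL and the total variation distance mentioned in the outline can be sketched with a standard pointwise identity. The derivation below assumes the convention in which the total variation distance equals half the L1 distance between the densities. It is a textbook argument and not necessarily the form in which the paper states the relation.

```latex
% Pointwise identity for nonnegative f and g:
%   \min\{f, g\} = \tfrac{1}{2}\bigl(f + g - |f - g|\bigr)
\begin{align*}
\mathrm{OVL}(X,Y)
  &= \int_{-\infty}^{+\infty} \min\{f(x), g(x)\}\,dx \\
  &= \frac{1}{2}\int_{-\infty}^{+\infty} \bigl(f(x) + g(x)\bigr)\,dx
     - \frac{1}{2}\int_{-\infty}^{+\infty} |f(x) - g(x)|\,dx \\
  &= 1 - \frac{1}{2}\int_{-\infty}^{+\infty} |f(x) - g(x)|\,dx \\
  &= 1 - \sup_{A} \bigl| P(X \in A) - P(Y \in A) \bigr| ,
\end{align*}
% where the supremum runs over all Borel sets A.
```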

Section snippets

The coefficient of overlapping

Let X and Y denote two univariate random variables with corresponding absolutely continuous distribution functions F and G and Lebesgue densities f and g, respectively. The coefficient of overlapping is defined according to Weitzman (1970) by

$$\mathrm{OVL}(X,Y)=\int_{-\infty}^{+\infty}\min\{f(x),g(x)\}\,dx.$$

Obviously OVL(X,Y)=1 if and only if the distributions of X and Y are equal and OVL(X,Y)=0 if and only if the supports of the distributions of X and Y have no interior points in common. Therefore, OVL(X,Y) can be interpreted as a

Derivation of the estimators

The estimators presented in this section are based on (2.1) to (2.6). In all cases

  • the densities are replaced by appropriate kernel density estimators,

  • the integrals are replaced by an appropriate quadrature formula or the sample mean.

Let $x_1,\dots,x_n$ and $y_1,\dots,y_m$ denote i.i.d. observations of the random variables X and Y. A density estimator for f is obtained by

$$\hat f_n(x)=\hat f_n(x\mid x_i,\ i=1,\dots,n)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{b}K\!\left(\frac{x-x_i}{b}\right),$$

where kernel K and bandwidth b are to be determined. We define $\hat g_m$ analogously.
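A minimal sketch of the plug-in idea, under stated assumptions: both densities are replaced by kernel density estimates of the form given above, and the integral of their pointwise minimum is approximated by the trapezoidal rule on an equally spaced grid. The Gaussian kernel, the normal-reference bandwidth 1.06·s·n^(-1/5), the grid size and all function names are illustrative choices. The snippet does not specify the kernel or bandwidth, and this sketch is not claimed to reproduce any of the paper's five estimators.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kde(grid, sample, b):
    """Evaluate (1/n) * sum_i (1/b) * K((x - x_i) / b) at every grid point."""
    u = (grid[:, None] - sample[None, :]) / b
    return gaussian_kernel(u).mean(axis=1) / b


def normal_reference_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * s * n^(-1/5); an illustrative choice."""
    return 1.06 * sample.std(ddof=1) * sample.size ** (-0.2)


def ovl_plugin(x, y, grid_size=2001):
    """Trapezoidal approximation of the integral of min(f_hat, g_hat)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    bx = normal_reference_bandwidth(x)
    by = normal_reference_bandwidth(y)
    # Extend the grid a few bandwidths beyond the data so that the
    # kernel tails are (almost) fully covered.
    lo = min(x.min() - 4.0 * bx, y.min() - 4.0 * by)
    hi = max(x.max() + 4.0 * bx, y.max() + 4.0 * by)
    grid = np.linspace(lo, hi, grid_size)
    m = np.minimum(kde(grid, x, bx), kde(grid, y, by))
    h = grid[1] - grid[0]
    return float(h * (m.sum() - 0.5 * (m[0] + m[-1])))


# Usage with simulated data: N(0, 1) versus N(1, 1), n = m = 500
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=500)
y = rng.normal(1.0, 1.0, size=500)
estimate = ovl_plugin(x, y)
```

For these two simulated populations the true OVL is about 0.617, so `estimate` should land in that neighborhood. Repeating the draw many times would give a simple Monte-Carlo picture of bias and standard deviation for this particular sketch.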

Note that \int_{-\infty}^{+\infty} \min\{\hat f_n(

Empirical application to German labor income data

The five estimators investigated in Section 3 are applied to labor income data of German men and women as collected in the German Socioeconomic Panel (GSOEP) (see e.g. SOEP Group, 2001) from 1991 to 2000.

The data refer to the old Federal States (West Germany), to the new Federal States (East Germany) and to all Federal States (Unified Germany). Labor earnings are the sum of income from primary job, secondary job, self-employment, 13th month pay, 14th month pay, Christmas bonus pay, holiday bonus pay,
