Model selection based on penalized ϕ-divergences for multinomial data

https://doi.org/10.1016/j.cam.2020.113181

Abstract

A test approach to the model selection problem for multinomial data based on penalized ϕ-divergences is proposed. The test statistic is a sample version of the difference of the distances between the population and each competing model. The null distribution of the test statistic is derived, showing that it depends on whether the competing models intersect or not and on whether a certain parameter is positive or not. All possible cases are characterized, and we give rules to decide whether a model provides a better explanation for the available data than the other. The practical behavior of the proposal is evaluated by means of an extensive simulation experiment. The method is applied to a real data set related to the classification of individuals according to their social preferences.

Introduction

In many practical settings, individuals are classified into a finite number of unique non-overlapping categories, say k, and the experimenter collects the number of observations falling into each of these categories. This sort of data is called multinomial data. Throughout this paper it will be assumed that the available information can be summarized by means of a random vector X = (X₁, …, Xₖ) having a k-cell multinomial distribution with parameters n and π = (π₁, …, πₖ) ∈ Δ₀ᵏ = {(π₁, …, πₖ) : πᵢ ≥ 0, 1 ≤ i ≤ k, ∑ᵢ₌₁ᵏ πᵢ = 1}, X ∼ Mₖ(n; π) in short. Notice that, if π ∈ Δ₀ᵏ, then some components of π may be equal to 0, implying that some cell frequencies can be equal to zero, even for large samples.

In many instances, it is assumed that π belongs to a parametric family, π ∈ P = {P(θ) = (p₁(θ), …, pₖ(θ)) : θ ∈ Θ} ⊂ Δᵏ = {(π₁, …, πₖ) : πᵢ > 0, 1 ≤ i ≤ k, ∑ᵢ₌₁ᵏ πᵢ = 1}, where Θ ⊆ ℝˢ, k − s − 1 > 0, and p₁(·), …, pₖ(·) are known real functions. In such a case π is usually estimated through P(θ̂) = (p₁(θ̂), …, pₖ(θ̂)), for some estimator θ̂ of θ. A common choice for θ̂ is the maximum likelihood estimator (MLE), which is known to have good asymptotic properties. Basu and Sarkar [1] and Morales et al. [2] have shown that these properties are shared by a larger class of estimators, the minimum ϕ-divergence estimators (MϕE), which includes the MLE as a particular case. If the parametric model is not correctly specified, Jiménez-Gamero et al. [3] have shown that, under certain assumptions, the MϕEs still have a well defined limit and, conveniently normalized, are asymptotically normal.
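As a minimal illustration of how a minimum ϕ-divergence estimator can be computed in practice, the following Python sketch fits a hypothetical one-parameter, three-cell model by numerically minimizing the ϕ-divergence between the observed relative frequencies and the model. The model p(θ) and the data are invented for illustration; the Kullback–Leibler choice of ϕ is used, under which the MϕE coincides with the MLE.

```python
# Sketch of a minimum phi-divergence estimator (MphiE) for a one-parameter
# multinomial model; the model below is hypothetical, not from the paper.
import numpy as np
from scipy.optimize import minimize_scalar

def phi(t):
    # Kullback-Leibler choice of phi, with phi(1) = 0 as assumed in the paper.
    return t * np.log(t) - t + 1 if t > 0 else 1.0  # lim_{t -> 0} phi(t) = 1

def phi_divergence(q, p):
    # D_phi(Q, P) = sum_i p_i * phi(q_i / p_i)
    return sum(pi * phi(qi / pi) for qi, pi in zip(q, p))

def model_p(theta):
    # Hypothetical 3-cell model indexed by theta in (0, 1).
    return np.array([theta / 2, theta / 2, 1 - theta])

def min_phi_estimator(counts):
    q = np.asarray(counts) / sum(counts)        # observed relative frequencies
    res = minimize_scalar(lambda th: phi_divergence(q, model_p(th)),
                          bounds=(1e-6, 1 - 1e-6), method="bounded")
    return res.x

theta_hat = min_phi_estimator([24, 26, 50])     # minimizer is theta = 0.24 + 0.26
```

Swapping `phi` for another convex function with ϕ(1) = 0 (e.g. the Pearson choice (t − 1)²/2) yields a different member of the MϕE class.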

However, the presence of zero cell frequencies in practice introduces some drawbacks in inference (e.g. a slow rate of convergence to the limit distribution, or reduced sensitivity to departures from the null hypothesis), and the finite sample performance of the MϕEs can be improved by modifying the weight that each ϕ-divergence assigns to the empty cells. The resulting estimator is called the minimum penalized ϕ-divergence estimator (MPϕE). Moreover, Mandal et al. [4] have shown that such estimators have the same asymptotic properties as the MϕEs when the model is true. Alba-Fernández et al. [5] have studied some asymptotic properties of MPϕEs under model misspecification, showing that, under certain assumptions, they converge to a well defined limit and, conveniently normalized, are asymptotically normal.

In the above discussion we have assumed that there is one parametric family (possibly misspecified) that can be used to fit the data. In practice, two (or more) parametric families, which can be separate, overlapping or nested, could be considered. In such a situation the practitioner should decide which of the candidate families provides a better fit to the observed data. This is known as the model selection problem. Here we consider the model selection test problem, which consists in testing whether the two competing models are equally close to the true population, against the hypothesis that one model is closer than the other. Vuong [6] has proposed tests for this problem based on the likelihood ratio statistic, which estimates the difference between the Kullback–Leibler distances from each model to the true distribution. Although this approach is appealing and well founded, alternative procedures have been proposed using other distances between distributions. Some examples are the approaches in Jiménez-Gamero et al. [7], which uses a distance between characteristic functions; Jiménez-Gamero and Batsidis [8], which uses a distance between probability generating functions for count data; Vuong and Wang [9], which uses a chi-square type distance for multinomial data; and, also for multinomial data, Jiménez-Gamero et al. [3], [10], which use ϕ-divergence and Kϕ-divergence measures, respectively; among other alternatives.

As said before, Jiménez-Gamero et al. [3] have studied the model selection test problem for multinomial data using ϕ-divergence measures. Since, as illustrated in Mandal et al. [4], [11], the use of penalized ϕ-divergences can improve inferences, the objective of this manuscript is to study the model selection test problem for multinomial data using penalized ϕ-divergences. As will be seen, the main advantage is that the proposed tests outperform the existing ones based on non-penalized ϕ-divergences in terms of power, especially for small and moderate sample sizes.

The paper unfolds as follows. Section 2 describes the test approach to the model selection problem. The proposed test statistic is a sample version of the difference of the distances between the population and each competing model. For separate models, the approach is similar to the one studied in [3]. For non-separate models, the distribution of the test statistic depends on whether a certain parameter (specifically, the asymptotic variance of the test statistic) is positive or not. This situation is characterized, and the obtained characterization lets us propose a test. Finally, decision rules are developed for the model selection problem for separate, overlapping and nested models. Section 3 explores the behavior of the proposed methods for small and moderate sample sizes by means of an extensive simulation experiment, whose results are summarized and reported. Section 4 illustrates the usefulness of the proposal in the classification of subjects according to their social preferences. All technical details, such as assumptions and proofs, are deferred to the last section.
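To make the flavor of the decision rule concrete, the following sketch evaluates a sample version of the difference of penalized ϕ-divergences between the observed frequencies and each fitted model. All numbers and the fitted probability vectors are hypothetical, the Pearson choice of ϕ is an assumption for illustration, and the sign comparison below stands in for the formal test of Section 2, which decides whether the difference is significantly different from zero.

```python
# Illustrative sketch (hypothetical numbers, not from the paper): the statistic
# estimates D_{phi,h}(pi, P) - D_{phi,h}(pi, Q); a clearly negative value favors
# model P, a clearly positive one favors model Q.
import numpy as np

def phi(t):
    # Pearson choice, phi(t) = (t - 1)^2 / 2, with phi(1) = 0.
    return (t - 1) ** 2 / 2

def pen_div(q, p, h):
    # Penalized phi-divergence: empty cells (q_i = 0) contribute h * p_i.
    q, p = np.asarray(q, float), np.asarray(p, float)
    nz = q > 0
    return (p[nz] * phi(q[nz] / p[nz])).sum() + h * p[~nz].sum()

q_obs = np.array([0.5, 0.3, 0.2, 0.0])      # observed relative frequencies
p_fit = np.array([0.45, 0.3, 0.2, 0.05])    # fitted model P (hypothetical)
q_fit = np.array([0.25, 0.25, 0.25, 0.25])  # fitted model Q (hypothetical)

h = 0.5
diff = pen_div(q_obs, p_fit, h) - pen_div(q_obs, q_fit, h)
better = "P" if diff < 0 else "Q"           # here model P is far closer
```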

Before ending this section we introduce some notation: all vectors are column vectors; the superscript ⊤ denotes transpose; all limits in this paper are taken as n → ∞; →ᴸ denotes convergence in distribution; →ᴾ denotes convergence in probability; a.s. denotes almost sure convergence; if x ∈ ℝᵏ, with x = (x₁, …, xₖ), then Diag(x) is the k × k diagonal matrix whose (i, i) entry is xᵢ, 1 ≤ i ≤ k, and Σₓ = Diag(x) − x x⊤; Iₖ denotes the k × k identity matrix; to simplify notation, all 0s appearing in the paper represent vectors of the appropriate dimension.


The model selection test problem

Let ϕ : [0, ∞) → ℝ ∪ {∞} be a continuous convex function. For arbitrary Q = (q₁, …, qₖ) ∈ Δ₀ᵏ and P = (p₁, …, pₖ) ∈ Δᵏ, the ϕ-divergence between Q and P is defined by (Csiszár [12]) Dϕ(Q, P) = ∑ᵢ₌₁ᵏ pᵢ ϕ(qᵢ/pᵢ). Assume without loss of generality that ϕ(1) = 0. Then the convexity of ϕ and ϕ(1) = 0 immediately imply that Dϕ(Q, P) ≥ 0. Notice that Dϕ(Q, P) = ∑_{i : qᵢ > 0} pᵢ ϕ(qᵢ/pᵢ) + ϕ(0) ∑_{i : qᵢ = 0} pᵢ. The penalized ϕ-divergence for the tuning parameter h > 0 between Q and P is defined from the above expression by replacing ϕ(0) with h as
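The split-sum expression above translates directly into code. The sketch below computes the penalized ϕ-divergence with the Pearson choice ϕ(t) = (t − 1)²/2 (an assumption for illustration): cells with qᵢ > 0 contribute pᵢ ϕ(qᵢ/pᵢ), while empty cells contribute h·pᵢ instead of ϕ(0)·pᵢ.

```python
# Direct transcription of the penalized phi-divergence definition:
# empty cells (q_i = 0) are weighted by h instead of phi(0).
import numpy as np

def phi_pearson(t):
    # Pearson chi-square choice, phi(t) = (t - 1)^2 / 2; phi(1) = 0, phi(0) = 1/2.
    return (t - 1) ** 2 / 2

def penalized_phi_divergence(q, p, h, phi=phi_pearson):
    q, p = np.asarray(q, float), np.asarray(p, float)
    nonzero = q > 0
    return (p[nonzero] * phi(q[nonzero] / p[nonzero])).sum() + h * p[~nonzero].sum()

# With h = phi(0) = 0.5 the penalized divergence reduces to the plain one;
# here the empty third cell contributes h * 0.2 = 0.1.
d = penalized_phi_divergence([0.5, 0.5, 0.0], [0.4, 0.4, 0.2], h=0.5)  # 0.125
```

Taking h below or above ϕ(0) respectively down- or up-weights the empty cells, which is the lever the MPϕE uses to improve finite sample behavior.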

Numerical results

The results in Section 2 are asymptotic, that is, they are valid for large samples. To numerically evaluate the proposal for small or moderate sample sizes, we have carried out an extensive simulation experiment. This section summarizes and reports the obtained results.

Separate models. Let us consider the following two separate parametric families,

Model P: X ∼ M₅(n; π), with π ∈ P so that p₁(θ) = 1 − 2θ, p₂(θ) = (3/2)θ, p₃(θ) = θ(1 − 2θ)/6, p₄(θ) = θ/3, p₅(θ) = θ²/3, 0 < θ < 1/2.

Model Q: X ∼ M₅(n; π), with π ∈ Q so that q₁(γ) = γ2, q₂(

Application to a real data set: modeling social preferences

Standard economic theory typically assumes that agents make optimal decisions based on selfishness and perfect rationality. However, experimental research provides robust evidence of important deviations from these theoretical predictions. One of the most studied contexts is that of social dilemmas, like public goods games, where the social preferences of interacting individuals play a determining role [15].

Experimental literature shows a considerable heterogeneity in preferences. Therefore,

Conclusions

This paper deals with the model selection test problem for multinomial data based on penalized ϕ-divergences, with the aim of handling the drawbacks that the presence of zero cell frequencies introduces in inference in practical applications. From the theory developed, we have given rules to decide whether a model provides a better explanation for the available data than the other. To learn about the practical performance of such rules and compare them with the ones based on non-penalized

Assumptions

Assumption 1

P = {P(θ) = (p₁(θ), …, pₖ(θ)) : θ ∈ Θ} ⊂ Δᵏ, where Θ ⊆ ℝˢ, k − s − 1 > 0, and p₁(·), …, pₖ(·) : Θ → (0, 1) are known functions, twice continuously differentiable in int Θ.

Q = {Q(γ) = (q₁(γ), …, qₖ(γ)) : γ ∈ Γ} ⊂ Δᵏ, where Γ ⊆ ℝʳ, k − r − 1 > 0, and q₁(·), …, qₖ(·) : Γ → (0, 1) are known functions, twice continuously differentiable in int Γ.

Assumption 2

ϕ : [0, ∞) → ℝ is a strictly convex function, twice continuously differentiable in (0, ∞).

Assumption 3

Dϕ,h(π, P(θ)) has a unique minimum at θ₀ ∈ int Θ, and Dϕ,h(π, Q(γ)) has a unique minimum at γ₀ ∈ int Γ.

Proofs

Notice that Dϕ,h(π, P(θ)) = ∑ᵢ₌₁ᵐ pᵢ(θ

Acknowledgments

The research in this paper has been partially funded by grants CTM2015-68276-R of the Spanish Ministry of Economy and Competitiveness (M.V. Alba-Fernández) and MTM2017-89422-P of the Spanish Ministry of Economy, Industry and Competitiveness, ERDF support included (M.D. Jiménez-Gamero).

