Model selection based on penalized φ-divergences for multinomial data
Introduction
In many practical settings, individuals are classified into a finite number of non-overlapping categories, say $k$, and the experimenter collects the number of observations falling into each of these categories. This sort of data is called multinomial data. Throughout this paper it will be assumed that the available information can be summarized by means of a random vector $Y=(Y_1,\dots,Y_k)^T$ having a $k$-cell multinomial distribution with parameters $N$ and $p=(p_1,\dots,p_k)^T$, $Y\sim\mathcal{M}_k(N,p)$ in short. Notice that, if some cell probabilities are small, then some components of $Y$ may be equal to 0, implying that some cell frequencies can be equal to zero, even for large samples.
In many instances, it is assumed that $p$ belongs to a parametric family $\mathcal{P}=\{p(\theta)=(p_1(\theta),\dots,p_k(\theta))^T:\ \theta\in\Theta\}$, where $p_1,\dots,p_k$ are known real functions. In such a case $p$ is usually estimated through $p(\hat\theta)$, for some estimator $\hat\theta$ of $\theta$. A common choice for $\hat\theta$ is the maximum likelihood estimator (MLE), which is known to have good asymptotic properties. Basu and Sarkar [1] and Morales et al. [2] have shown that these properties are shared by a larger class of estimators: the minimum φ-divergence estimators (MφEs). This class includes MLEs as a particular case. If the parametric model is not correctly specified, Jiménez-Gamero et al. [3] have shown that, under certain assumptions, the MφEs still have a well defined limit and, conveniently normalized, they are asymptotically normal.
However, the presence of zero cell frequencies in practice introduces some drawbacks in inference (e.g. a slow rate of convergence to the limiting distribution, or reduced sensitivity against departures from the null hypothesis), and the finite sample performance of the MφEs can be improved by modifying the weight that each φ-divergence assigns to the empty cells. The resulting estimator is called the minimum penalized φ-divergence estimator (MPφE). Moreover, Mandal et al. [4] have shown that such estimators have the same asymptotic properties as the MφEs when the model is true. Alba-Fernández et al. [5] have studied some asymptotic properties of MPφEs under model misspecification, showing that, under certain assumptions, they converge to a well defined limit and, conveniently normalized, they are asymptotically normal.
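To make the idea concrete, the following sketch shows how a minimum penalized φ-divergence estimate could be computed numerically. The one-parameter four-cell model, the Pearson-type φ (for which $q_j\varphi(p_j/q_j)=(p_j-q_j)^2/(2q_j)$) and the tuning value $h=1$ are all hypothetical choices made here for illustration, not the ones used in the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def penalized_pearson(p_hat, q, h):
    """Penalized Pearson-type divergence: phi(x) = (x - 1)**2 / 2, so the usual
    empty-cell weight phi(0) = 1/2 is replaced by the tuning parameter h."""
    nz = p_hat > 0
    return np.sum((p_hat[nz] - q[nz])**2 / (2 * q[nz])) + h * np.sum(q[~nz])

def model_probs(theta, k=4):
    """Hypothetical one-parameter family: p_j(theta) proportional to theta**j."""
    w = theta ** np.arange(1, k + 1)
    return w / w.sum()

counts = np.array([6, 3, 1, 0])    # observed frequencies; note the empty cell
p_hat = counts / counts.sum()

# Minimize the penalized divergence over theta to obtain the estimate
res = minimize_scalar(lambda t: penalized_pearson(p_hat, model_probs(t), h=1.0),
                      bounds=(1e-3, 1.0), method="bounded")
theta_mpe = res.x
```

Note how the penalty term $h\sum_{j:\hat p_j=0} q_j$ discourages parameter values that put large model probability on empty cells, which is what improves finite sample behavior when zeros occur.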
In the above discussion we have assumed that there is one parametric family (possibly misspecified) that can be used to fit the data. In practice, two (or more) parametric families, which can be separate, overlapping or nested, could be considered. In such a situation the practitioner should decide which of the candidate families provides a better fit to the observed data. This is known as the model selection problem. Here we consider the model selection test problem, which consists in testing whether the two competing models are equally close to the true population, against the hypothesis that one model is closer than the other. Vuong [6] has proposed tests for this problem based on the likelihood ratio statistic, which estimates the difference between the Kullback–Leibler distances from each model to the true distribution. Although this approach is well founded, alternative procedures have been proposed that use other distances between distributions. Some examples are the approaches in Jiménez-Gamero et al. [7], which uses a distance between characteristic functions; Jiménez-Gamero and Batsidis [8], which uses a distance between probability generating functions for count data; Vuong and Wang [9], which uses a chi-square type distance for multinomial data; also for multinomial data, Jiménez-Gamero et al. [3], [10] use φ-divergence measures; among other alternatives.
As said before, Jiménez-Gamero et al. [3] have studied the model selection test problem for multinomial data by using φ-divergence measures. Since, as illustrated in Mandal et al. [4], [11], the use of penalized φ-divergences can improve inferences, the objective of this manuscript is to study the model selection test problem for multinomial data by using penalized φ-divergences. As will be seen, the main advantage is that the proposed tests outperform the existing ones based on non-penalized φ-divergences in terms of power, especially for small and moderate sample sizes.
The paper unfolds as follows. Section 2 describes the test approach to the model selection problem. The proposed test statistic is a sample version of the difference of the distances between the population and each competing model. For separate models, the approach is similar to the one studied in [3]. For non-separate models, the distribution of the test statistic depends on whether a certain parameter (specifically, the asymptotic variance of the test statistic) is positive or not. This situation is characterized, and the characterization obtained lets us propose a test. Finally, decision rules are developed for the model selection problem for separate, overlapping and nested models. Section 3 explores the behavior of the proposed methods for small and moderate sample sizes by means of an extensive simulation experiment, whose results are summarized and reported. Section 4 illustrates the usefulness of the proposal in the classification of subjects according to their social preferences. All technical details, such as assumptions and proofs, are deferred to the last section.
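As a rough illustration of such a decision rule, the sketch below compares two fully specified (parameter-free) candidate distributions, estimates the variability of the divergence difference by a naive multinomial bootstrap, and applies a Vuong-type three-way decision. The distributions `q1` and `q2`, the Pearson-type φ, the penalty $h$ and the bootstrap normalization are illustrative assumptions; the paper derives the exact asymptotic normalization instead.

```python
import numpy as np

rng = np.random.default_rng(0)

def pen_div(p_hat, q, h):
    """Penalized Pearson-type divergence: empty cells of p_hat weighted by h."""
    nz = p_hat > 0
    return np.sum((p_hat[nz] - q[nz])**2 / (2 * q[nz])) + h * np.sum(q[~nz])

def model_selection_test(counts, q1, q2, h=1.0, B=500):
    """Vuong-type rule: are q1 and q2 equally close to the truth in
    penalized divergence, or is one of them strictly closer?"""
    N = counts.sum()
    p_hat = counts / N
    t_obs = pen_div(p_hat, q1, h) - pen_div(p_hat, q2, h)
    # Naive bootstrap estimate of the standard error of the difference
    boot = np.empty(B)
    for b in range(B):
        pb = rng.multinomial(N, p_hat) / N
        boot[b] = pen_div(pb, q1, h) - pen_div(pb, q2, h)
    se = boot.std(ddof=1)
    z = t_obs / se if se > 0 else 0.0
    if abs(z) <= 1.96:                 # approximate N(0,1) critical value
        return "equally close"
    return "prefer model 2" if t_obs > 0 else "prefer model 1"
```

The sign convention is that a negative difference favors the first candidate, since it means the first model is closer to the empirical distribution.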
Before ending this section we introduce some notation: all vectors are column vectors; the superscript $T$ denotes transpose; all limits in this paper are taken as $N\to\infty$; $\xrightarrow{\mathcal{L}}$ denotes convergence in distribution; $\xrightarrow{P}$ denotes convergence in probability; $\xrightarrow{a.s.}$ denotes almost sure convergence; if $x=(x_1,\dots,x_k)^T\in\mathbb{R}^k$, then $\operatorname{diag}(x)$ is the diagonal matrix whose $(j,j)$ entry is $x_j$, $1\leq j\leq k$; $I_k$ denotes the $k\times k$ identity matrix; to simplify notation, all 0s appearing in the paper represent vectors of the appropriate dimension.
The model selection test problem
Let $\varphi:[0,\infty)\to\mathbb{R}$ be a continuous convex function. For arbitrary probability vectors $p=(p_1,\dots,p_k)^T$ and $q=(q_1,\dots,q_k)^T$, the φ-divergence between $p$ and $q$ is defined by (Csiszár [12])
$$D_\varphi(p,q)=\sum_{j=1}^{k} q_j\,\varphi\!\left(\frac{p_j}{q_j}\right).$$
Assume without loss of generality that $\varphi(1)=0$. Then the convexity of $\varphi$ and $\varphi(1)=0$ immediately imply that $D_\varphi(p,q)\geq 0$. Notice that
$$D_\varphi(p,q)=\sum_{j:\,p_j>0} q_j\,\varphi\!\left(\frac{p_j}{q_j}\right)+\varphi(0)\sum_{j:\,p_j=0} q_j.$$
The penalized φ-divergence for the tuning parameter $h\geq 0$ between $p$ and $q$ is defined from the above expression by replacing $\varphi(0)$ with $h$, as
$$D_\varphi^h(p,q)=\sum_{j:\,p_j>0} q_j\,\varphi\!\left(\frac{p_j}{q_j}\right)+h\sum_{j:\,p_j=0} q_j.$$
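As a concrete instance, the power-divergence family $\varphi_\lambda(x)=(x^{\lambda+1}-x-\lambda(x-1))/(\lambda(\lambda+1))$, normalized so that $\varphi_\lambda(1)=0$, can be plugged into the definitions above; $\lambda=1$ gives the Pearson-type $\varphi(x)=(x-1)^2/2$, and $\lambda\to 0$ the Kullback–Leibler generator. The short sketch below (the function names and this particular family are illustrative choices, not taken from the paper) computes the penalized version directly from the empty-cell decomposition:

```python
import numpy as np

def phi_power(x, lam=1.0):
    """Power-divergence generator phi_lam, normalized so that phi_lam(1) = 0.
    lam = 1 gives the Pearson-type phi(x) = (x - 1)**2 / 2."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        # Kullback-Leibler limit: phi(x) = x*log(x) - x + 1, with phi(0) = 1
        safe = np.where(x > 0, x, 1.0)
        return np.where(x > 0, x * np.log(safe) - x + 1.0, 1.0)
    return (x**(lam + 1.0) - x - lam * (x - 1.0)) / (lam * (lam + 1.0))

def penalized_phi_divergence(p, q, h, lam=1.0):
    """D_phi^h(p, q): on cells with p_j = 0, the weight phi(0) is replaced by h."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(q[nz] * phi_power(p[nz] / q[nz], lam)) + h * np.sum(q[~nz])
```

Setting `h = phi_power(0, lam)` recovers the ordinary (non-penalized) φ-divergence, while smaller `h` downweights the contribution of empty cells.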
Numerical results
The results in Section 2 are asymptotic, that is, they are valid for large samples. To numerically evaluate the proposal for small or moderate sample sizes, we have carried out an extensive simulation experiment. This section summarizes and reports the obtained results.
Separate models. Let us consider the following two separate parametric families,
Model 1: …, with … so that …
Model 2: …, with … so that …
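A Monte Carlo evaluation of power along these lines can be sketched as follows. The two fully specified candidates `q1` and `q2`, the Pearson-type divergence and the delta-method plug-in normalization are simplifying assumptions made for illustration (with these cell probabilities and sample sizes, empty cells are negligible, so the empty-cell penalty plays no role here); none of this reproduces the paper's exact simulation design.

```python
import numpy as np

rng = np.random.default_rng(1)

q1 = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical "Model 1"
q2 = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical "Model 2"

def z_statistic(counts):
    """Normalized difference of Pearson-type divergences from p-hat to q1 and q2.
    The variance estimate is a delta-method plug-in: grad' (diag(p) - p p') grad."""
    N = counts.sum()
    p = counts / N
    t = np.sum((p - q1)**2 / (2 * q1)) - np.sum((p - q2)**2 / (2 * q2))
    grad = (p - q1) / q1 - (p - q2) / q2
    var = p @ grad**2 - (p @ grad)**2
    return np.sqrt(N) * t / np.sqrt(var)

def empirical_power(p_true, N, M=500, crit=1.96):
    """Fraction of Monte Carlo samples in which equivalence of q1 and q2 is rejected."""
    z = np.array([z_statistic(rng.multinomial(N, p_true)) for _ in range(M)])
    return np.mean(np.abs(z) > crit)
```

Sampling `p_true = q1` makes the first candidate strictly closer, so the rejection frequency estimates the power of the two-sided rule against that alternative.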
Application to a real data set: modeling social preferences
Standard economic theory typically assumes that agents make optimal decisions based on selfishness and perfect rationality. However, experimental research provides robust evidence of important deviations from the theoretical predictions. One of the most studied contexts is that of social dilemmas, such as public goods games, where the social preferences of interacting individuals play a determining role [15].
Experimental literature shows a considerable heterogeneity in preferences. Therefore,
Conclusions
This paper deals with the model selection test problem for multinomial data based on penalized φ-divergences, with the aim of handling the drawbacks that the presence of zero cell frequencies in practical applications introduces in inference. From the theory developed, we have given rules to decide whether one model provides a better explanation of the available data than the other. To learn about the practical performance of such rules and compare them with the ones based on non-penalized φ-divergences
Assumptions
Assumption 1. $\mathcal{P}_1=\{p^{(1)}(\theta_1)=(p^{(1)}_1(\theta_1),\dots,p^{(1)}_k(\theta_1))^T:\ \theta_1\in\Theta_1\}$, where $p^{(1)}_1,\dots,p^{(1)}_k$ are known functions, twice continuously differentiable in $\theta_1$; $\mathcal{P}_2=\{p^{(2)}(\theta_2)=(p^{(2)}_1(\theta_2),\dots,p^{(2)}_k(\theta_2))^T:\ \theta_2\in\Theta_2\}$, where $p^{(2)}_1,\dots,p^{(2)}_k$ are known functions, twice continuously differentiable in $\theta_2$.
Assumption 2. $\varphi$ is a strictly convex function, twice continuously differentiable in $x>0$.
Assumption 3. $D_\varphi^h(p,p^{(1)}(\theta_1))$ has a unique minimum at $\theta_{01}\in\Theta_1$ and $D_\varphi^h(p,p^{(2)}(\theta_2))$ has a unique minimum at $\theta_{02}\in\Theta_2$.
Proofs
Notice that
Acknowledgments
The research in this paper has been partially funded by grants CTM2015-68276-R of the Spanish Ministry of Economy and Competitiveness (M.V. Alba-Fernández) and MTM2017-89422-P of the Spanish Ministry of Economy, Industry and Competitiveness, ERDF support included (M.D. Jiménez-Gamero).
References (17)
- et al., On disparity based goodness-of-fit tests for multinomial models, Statist. Probab. Lett. (1994)
- et al., Asymptotic divergence of estimates of discrete distributions, J. Statist. Plann. Inference (1995)
- et al., Minimum φ-divergence estimation in misspecified multinomial models, Comput. Stat. Data Anal. (2011)
- et al., Minimum chi-square estimation and tests for model selection, J. Econometrics (1993)
- et al., Bootstrapping divergence statistics for testing homogeneity in multinomial populations, Math. Comput. Simulation (2009)
- et al., Minimum disparity inference and the empty cell penalty: asymptotic results, Sankhya A (2010)
- et al., Minimum penalized φ-divergence estimation under model misspecification, Entropy (2018)
- Likelihood ratio tests for model selection and non-nested hypotheses, Econometrica (1989)