Abstract
We propose a new, easy to implement, semiparametric estimator for binary-choice single-index models which uses parametric information in the form of a known link (probability) function and nonparametrically corrects it. Asymptotic properties are derived and the finite sample performance of the proposed estimator is compared to those of the parametric probit and semiparametric single-index model estimators of Ichimura (J Econ 58:71–120, 1993) and Klein and Spady (Econometrica 61:387–421, 1993). Results indicate that if the parametric start is correct, the proposed estimator achieves significant bias reduction and efficiency gains compared to Ichimura (1993) and Klein and Spady (1993). Interestingly, the proposed estimator still achieves significant bias reduction and efficiency gains even if the parametric start is not correct.
Notes
For an exception see Ruud (1983).
Furthermore, fully nonparametric methods suffer from the so-called ‘curse of dimensionality’, i.e., as the number of regressors increases, estimation precision decreases rapidly. Single-index models, which are explained below, reduce this dimensionality problem to a scalar.
The fact that \(\beta \) is unknown and has to be replaced with an estimator does not change this result as long as the estimator of \(\beta \) is \(\sqrt{n}\)-consistent (see Horowitz 1998, pp. 21–22).
Single-index models, unlike models which assume independence of u and x, allow for limited forms of heteroscedasticity (general but known form and unknown form if it depends only on the index). This limitation can be serious since, for instance, the assumption that \(Pr(y=1|v)\) depends only on the index does not allow a certain form of heteroscedasticity (random coefficients model) which may be important in applications.
The maximum score estimator of Manski (1975) and its smoothed version by Horowitz (1992) make a zero conditional median assumption (median\((u|x)=0\)), which identifies the intercept term (a zero conditional mean assumption is not sufficient for identification in a binary response model; see Manski 1988, p. 731 and Horowitz 1998, section 3.2). These models allow for different forms of heteroscedasticity, including random coefficients models, although at the cost of a rate of convergence slower than \(\sqrt{n}\). In fact, under this conditional median independence assumption \(\sqrt{n}\)-consistency is not possible; see Pagan and Ullah (1999, p. 278) and Horowitz (1993).
An alternative scale normalization would be \(||\beta ||=1\) where \(||\cdot ||\) is the Euclidean norm.
Since there is no location restriction on u in single-index models and thus the intercept term is not identified, (4) is actually a nonparametric estimator of the distribution of \(u+\beta _0\) where \(\beta _0\) is the intercept. Also, Klein and Spady (1993) have additive terms in the numerator and denominator of (4) to control the rate at which the numerator and denominator tend to zero. Ichimura (1993) utilizes indicator variables to trim those observations which correspond to small density values. A similar trimming function and an indicator variable enter the objective functions in (3) and (5), respectively, multiplicatively. In this presentation those terms are ignored for simplicity.
A paper by Chen (2000) builds on Klein and Spady (1993) and shows that the intercept can be consistently estimated and there are possible efficiency gains in the estimation of slope coefficients although at the cost of stronger assumptions: a location restriction in the form of conditional symmetry, i.e., the density of u conditional on the regressors is symmetric around zero and an index restriction stronger than the one in Klein and Spady (1993), namely, the conditional density depends on x only through the squared index.
For this efficiency result, the weight function should depend on x only through the index as in binary-choice models where \(Var(y|x)=Var(y|xb)\).
Ichimura (1993) treats \(W(\cdot )\) as a known function.
Glad (1998) generalizes this to a local \(p\text {th}\)-order polynomial estimator, which reduces to the Nadaraya–Watson estimator for \(p=0\).
While the primary interest in single-index models is generally about the parameter estimates and marginal effects, there are cases—such as economic discrimination analyses—where the focus is on the estimated probabilities. For example, Blinder–Oaxaca-type decomposition studies interested in differences between groups (race, gender etc.) regarding a binary outcome such as computer ownership, teenage pregnancy, or school attendance, are primarily interested in differences in average estimated probabilities between groups (Fairlie 2005; Seah et al. 2017). Correctly estimated probabilities are critical to the accuracy of the gap being investigated.
For this to happen, the parametric start \(G(\cdot )\) should be a constant function. In density estimation, the Hjort and Glad (1995) estimator nests the usual Kernel density estimator if the parametric start is the uniform density over the space. But here a distribution function which is constant and which satisfies the continuity (of the index) assumption cannot be found. One obvious example of such a constant distribution function, which does not satisfy the continuity assumption, is the unit point mass at a when \(z=a\) a.s.:
$$\begin{aligned} G(z) = \left\{ \begin{array}{ll} 0 & \text {if } z<a \\ 1 & \text {if } z\geqslant a. \end{array} \right. \end{aligned}$$
Here not only is the continuity assumption not satisfied, but also \(\beta \) is not identified with this \(G(\cdot )\).
In actual estimations, trimming has very little effect on the performance of the estimators. The Klein and Spady paper reports simulation results from the untrimmed estimator: “...the estimate obtained without any trimming performed quite similar to that under the trimming that we employed. Accordingly, we report results for the semiparametric estimator obtained without probability or likelihood trimming” (Klein and Spady 1993, p. 406).
There are two cases to consider: when the link function F is monotonic in the index and when it is not. If the underlying distribution is heteroscedastic, for instance, F need not be monotonic in the index.
Klein and Spady (1993) obtain a similar uniform convergence rate.
Even though it is not in the simulation design, it is instructive at this point to digress and discuss maximum likelihood estimation of misspecified binary choice models. Ruud (1983) showed that when the explanatory variables are multivariate normal, the slope coefficients can still be estimated consistently up to scale by maximum likelihood even when the distributional assumption is not correct. More generally, the result holds when
$$\begin{aligned} E({\tilde{x}}|xb=t)=c_0+c_1t \end{aligned}\qquad (10)$$
where \(c_0\) and \(c_1\) are constants, a condition that is satisfied when the explanatory variables are multivariate normal. Note that this consistency result does not hold for the probability estimates.
Another interesting implication of (10) is that when it holds, the semiparametric efficiency bound is the same as the parametric (Cramér–Rao) efficiency bound (see Cosslett 1987).
The \(L_1\) norm is \(\int |F-{\hat{F}}|dx\), while the \(L_2\) norm is \(\int (F-{\hat{F}})^2dx\), where F is the true unknown link function and \({\hat{F}}\) is an estimate of the unknown link function.
We also ran the simulations with \(h=n^{-1/6.02}\) as in Klein and Spady (1993) and the results changed immaterially. In practical applications, the smoothing parameter should be chosen along with the parameter estimates by maximizing the quasi-likelihood function.
In general, the \(L_p\)-norm is defined as \((E(|x|^p))^{1/p}\).
The derivation below assumes that \((y_i,x_i)\) is absolutely continuous. Obviously in binary-choice models this is not true as \(y_i\) is a Bernoulli random variable. We will keep the absolute continuity interpretation as it is more general and give the necessary changes here for the binary-response case. Using a notation similar to Klein and Spady (1993), let \(g_x\) be the unconditional density for x and \(g_{x|y}\) be the density for x conditional on y for \(y=0,1\). We have the following series of equalities
$$\begin{aligned} E(y|x)=F(x)=Pr(y=1|x)=\frac{Pr(y=1)g_{x|1}}{g_x}=\frac{g_{1x}}{g_x}=G(x)\frac{g_{1x}/G(x)}{g_x} =G(x)g(x) \end{aligned}$$
where \(g(x)=(g_{1x}/G(x))/g_x\). Thus \({\hat{F}}_1=G(x){\hat{g}}(x){\hat{g}}_x\) where \({\hat{g}}(x){\hat{g}}_x=((n-1)h)^{-1}\sum _{j\ne i}\frac{y_j}{G(x_j)}K((x-x_j)/h).\) So there is no change from (11) to (18). In equation (18), we can take an iterated expectation to get \(E_X\left[ 1/G(x_j)K((z-x_j)/h)Pr(y_j=1|x)\right] =E_X\left[ K((z-x_j)/h)g(x)\right] =\int K((z-x)/h)g(x)g_x dx\). Now \(1/h\) times this last term would replace equation (19), and we can apply a Taylor expansion to \(\psi (x)=g(x)g_x\).
References
Andrews DWK (1987) Consistency in nonlinear econometric models: a generic uniform law of large numbers. Econometrica 55:1465–1471
Bierens HJ (1987a) Uniform consistency of Kernel estimators of a regression function under generalized conditions. J Am Stat Assoc 78:699–707
Bierens HJ (1987b) Kernel estimators of regression functions. In: Bewley TF (ed) Advances in econometrics, vol 1. Cambridge University Press, Cambridge
Chen S (2000) Efficient estimation of binary choice models under symmetry. J Econ 96:183–199
Cosslett SR (1987) Efficiency bounds for distribution-free estimators of the binary choice and censored regression models. Econometrica 55:559–586
Diiro G, Sam AG (2015) Agricultural technology adoption and nonfarm earnings in Uganda: a semiparametric analysis. J Dev Areas 49(2):145–162
Diiro G, Ker AP, Sam AG (2015) The role of gender in fertiliser adoption in Uganda. Afr J Agric Resour Econ 10(2):117–130
Fairlie RW (2005) An extension of the Blinder–Oaxaca decomposition technique to logit and probit models. J Econ Soc Meas 30(4):305–316
Fristedt B, Gray L (1997) A modern approach to probability theory. Birkhäuser, Basel
Frölich M, Huber M, Wiesenfarth M (2017) The finite sample performance of semi-and nonparametric estimators for treatment effects and policy evaluation. Comput Stat Data Anal 115:91–102
Glad IK (1998) Parametrically guided nonparametric regression. Scand J Stat 25:649–668
Hjort NL, Glad IK (1995) Nonparametric density estimation with a parametric start. Ann Stat 23:882–904
Horowitz JL (1992) A smoothed maximum score estimator for the binary response model. Econometrica 60:505–531
Horowitz JL (1993) Semiparametric and nonparametric estimation of quantal response models. In: Maddala GS, Rao CR, Vinod HD (eds) Handbook of statistics, vol 11. North-Holland, Amsterdam
Horowitz JL (1998) Semiparametric methods in econometrics. Springer, Berlin
Horowitz JL, Härdle W (1996) Direct semiparametric estimation of single-index models with discrete covariates. J Am Stat Assoc 91(436):1632–1640
Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econ 58:71–120
Ichimura H, Lee LF (1991) Semiparametric least squares estimation of multiple index models: single equation estimation. In: Barnett WA, Powell J, Tauchen G (eds) Nonparametric and semiparametric methods in econometrics and statistics. Cambridge University Press, Cambridge
Jones MC, Signorini DF (1997) A comparison of higher-order bias kernel density estimators. J Am Stat Assoc 92:1063–1073
Klein RW, Spady RH (1993) An efficient semiparametric estimator for binary response models. Econometrica 61:387–421
Manski CF (1975) The maximum score estimation of the stochastic utility model of choice. J Econ 3:205–228
Manski CF (1988) Identification of binary response models. J Am Stat Assoc 83:729–738
Mishra K, Sam AG, Miranda MJ (2017) You are approved! Insured loans improve credit access and technology adoption of Ghanaian farmers. Working paper, The Ohio State University
Pagan A, Ullah A (1999) Nonparametric econometrics. Cambridge University Press, Cambridge
Powell JL (1994) Estimation of semiparametric models. In: Engle RF, McFadden DL (eds) Handbook of econometrics, vol 4. North-Holland, Amsterdam
Ruud PA (1983) Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models. Econometrica 51:225–228
Sam AG, Jiang GJ (2009) Nonparametric estimation of the short rate diffusion process from a panel of yields. J Financ Quant Anal 44:1197–1230
Sam AG, Ker AP (2006) Nonparametric regression under alternative data environments. Stat Probab Lett 76(10):1037–1046
Schuster E, Yakowitz S (1979) Contributions to the theory of nonparametric regression with application to system identification. Ann Stat 7:139–149
Seah KY, Fesselmeyer E, Le K (2017) Estimating and decomposing changes in the white–black homeownership gap from 2005 to 2011. Urban Stud 54(1):119–136
Birke M, Van Bellegem S, Van Keilegom I (2017) Semi-parametric estimation in a single-index model with endogenous variables. Scand J Stat 44(1):168–191
Yatchew A, Griliches Z (1985) Specification error in probit models. Rev Econ Stat 67:134–139
Appendix
We make the following assumptions:
- Assumption 1: Observed sample \((x_i,y_i)\), \(i=1,\ldots ,n\) is i.i.d.
- Assumption 2: \(B\subset {\mathbb {R}}^q\) is compact and the true parameter vector \(b_0\) is in the interior of B.
- Assumption 3: \(A_x\) is compact.
- Assumption 4: K(s) is a density. Furthermore \(\int sK(s)ds=0\), \(K(s)=0\) for \(s<-1\) and \(s>1\), and its second derivative satisfies a Lipschitz condition.
- Assumption 5: Parametric start G is uniformly bounded over x, b and \(G(xb)\ne 0\; \forall x,b\in A_x\times B\).
- Assumption 6: \(\int |\phi (t)|dt < \infty \) where \(\phi (t)\) is the characteristic function of K.
- Assumption 7: There exist \({\underline{F}}\) and \({\overline{F}}\) that do not depend on x such that \(0<{\underline{F}}\leqslant F(xb) \leqslant {\overline{F}}~<~1 \quad \forall b \in B\).
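As an illustration of Assumption 4, the following minimal Python sketch checks numerically that one candidate kernel integrates to one and has zero mean on \([-1,1]\). The triweight kernel is an assumption made here for illustration only; this appendix does not state which kernel was used. The helper names `triweight` and `trapezoid` are hypothetical.

```python
import numpy as np

def triweight(s):
    """Triweight kernel K(s) = 35/32 * (1 - s^2)^3 on [-1, 1], zero elsewhere.

    Illustrative choice (an assumption, not taken from the source): it is a
    density, symmetric about zero, and vanishes outside [-1, 1].
    """
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1.0, 35.0 / 32.0 * (1.0 - s**2) ** 3, 0.0)

def trapezoid(values, grid):
    """Plain trapezoidal rule on a one-dimensional grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

# Evaluate on a grid slightly wider than the support to confirm K vanishes outside it.
grid = np.linspace(-1.5, 1.5, 30001)
k = triweight(grid)

total_mass = trapezoid(k, grid)               # integral of K(s) ds
first_moment = trapezoid(grid * k, grid)      # integral of s K(s) ds
second_moment = trapezoid(grid**2 * k, grid)  # integral of s^2 K(s) ds
```

The three quantities correspond to \(\int K(s)ds\), \(\int sK(s)ds\) and \(\int s^2K(s)ds\); any other kernel meeting Assumptions 4 and 6 could be substituted for `triweight`.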
Proof of lemma 1
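Our proof of lemma 1 closely follows Bierens (1987a, b) and Pagan and Ullah (1999, pp. 36–39). Note that \({\hat{F}}\) in (9) can be written as \({\hat{F}}_{1} / {\hat{F}}_{2}\) where
$$\begin{aligned} {\hat{F}}_{1}(x)=\frac{G(x)}{(n-1)h}\sum _{j\ne i}\frac{y_j}{G(x_j)}K\left( \frac{x-x_j}{h}\right) ,\qquad {\hat{F}}_{2}(x)=\frac{1}{(n-1)h}\sum _{j\ne i}K\left( \frac{x-x_j}{h}\right) . \end{aligned}$$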
Since \({\hat{F}}_{2}\) is \({\hat{F}}_{1}\) with \(y_j=1\) and \(G(\cdot )\) a constant function, we will only show uniform convergence of \({\hat{F}}_{1}\) (see the final note in the Notes section, on the absolute continuity assumption). Now observe that
Let \(g(x)=\int y \frac{1}{G(x)} f(y,x)dy / \int f(y,x)dy\) and \(h(x)=\int f(y,x)dy\). So \(F(x)=G(x)g(x)\) and \(G(x)g(x)h(x)=G(x)\int y\frac{1}{G(x)}f(y,x)dy\). Thus \({\hat{F}}_{1}=G(x){\hat{g}}(x){\hat{h}}(x)\). Notice that by assumption 5, G is uniformly bounded and we have \(\sup _x |G(x)|=O(1)\). In fact a plausible start is a distribution function in which case \(\sup _x |G(x)|=1\). Hence it suffices to show \(\sup _x |{\hat{g}}(x){\hat{h}}(x) - g(x)h(x)|\rightarrow 0\). So we have
Like Ichimura (1993), we will refer to the second term of the right-hand side as the bias term and show that it converges to 0 at the rate \(h^2\). But first notice that
From the inversion formula (see Fristedt and Gray 1997, p. 231) and by assumption 6 we have
where \(\phi (t)\) is the characteristic function of K and \(i^2=-1\). Using (12) and (13) and letting \(s=t/h\) we get
From (14) we get
From (14) and (15) and noting that \(|\exp (-itx)|=1\)
So
Using \(\exp (itx_j)=\cos (tx_j)+i\sin (tx_j)\) we can write
Note that we can write \(|A+iB|=(A^2+B^2)^{1/2}\) and so \(E|A+iB|=E(A^2+B^2)^{1/2}\leqslant (EA^2+EB^2)^{1/2}=(Var(A)+Var(B))^{1/2}\) where the inequality comes from Jensen’s inequality and by construction \(EA=0\) and \(EB=0\). So (16) is
Note that \(VarX\leqslant EX^2\) so
noting that \(\cos ^2 tx_j+\sin ^2 tx_j = 1\). So
after a change of variables (\(s=ht\)) and the last term goes to zero as \(h\sqrt{n}\rightarrow \infty \). Finally using Markov’s inequality with (17) we get
Now for \(|E{\hat{g}}(x){\hat{h}}(x)- g(x)h(x)|\) note that
Now let \(\psi (x)=g(x)h(x)\) and \(s=(z-x)/h\) for the Taylor expansion
So
Thus we can write
so the last term goes to zero at the rate \(h^2\). This completes the proof of lemma 1.
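The ratio form \({\hat{F}}={\hat{F}}_{1}/{\hat{F}}_{2}\) used in the proof can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the triweight kernel, the probit start, the simulated design and the function names are all hypothetical, the bandwidth \(h=n^{-1/6.02}\) is borrowed from the note on Klein and Spady (1993), and the sums run over all observations rather than leaving one out.

```python
import math
import numpy as np

def triweight(s):
    """Compactly supported kernel on [-1, 1] (illustrative assumption)."""
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1.0, 35.0 / 32.0 * (1.0 - s**2) ** 3, 0.0)

def probit_start(z):
    """Standard normal CDF, used here as the parametric start G (assumption)."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def guided_link_estimate(z_eval, z, y, start, h, kernel=triweight):
    """Parametrically guided estimate G(z) * sum_j K_j y_j / G(z_j) / sum_j K_j.

    Uses every observation at each evaluation point (no leave-one-out).
    """
    z_eval = np.atleast_1d(np.asarray(z_eval, dtype=float))
    weights = kernel((z_eval[:, None] - z[None, :]) / h)  # shape (m, n)
    numerator = weights @ (y / start(z))    # proportional to F1_hat / G
    denominator = weights.sum(axis=1)       # proportional to F2_hat
    return start(z_eval) * numerator / denominator

# Simulated example in which the parametric start is correct (F = G = probit).
rng = np.random.default_rng(0)
n = 20000
z = rng.standard_normal(n)
y = (rng.uniform(size=n) < probit_start(z)).astype(float)
h = n ** (-1.0 / 6.02)

grid = np.linspace(-1.0, 1.0, 21)
f_hat = guided_link_estimate(grid, z, y, probit_start, h)
max_abs_error = float(np.max(np.abs(f_hat - probit_start(grid))))
```

The common factor \(1/(nh)\) cancels in the ratio, so it is omitted. If the responses are replaced by \(G(z_j)\) itself, the nonparametric correction factor equals one and the sketch returns \(G\) exactly, which is the sense in which the kernel step only corrects the start.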
Proof of theorem 1
Let z represent a point on the support of \(z_i\). Given the binary nature of the dependent variable \(y_i\) and the fact that \(P(y_i=1|z_i)=F(z_i)\), we have \(E(y_i|z_i)=F(z_i)\). Hence \(y_i=F(z_i)+\epsilon _i\), where \(\epsilon _i\) is an i.i.d. error term such that \(E(\epsilon _i|z_i)=0\) and \(Var(\epsilon _i|z_i)=F(z_i)(1-F(z_i))~~\forall i.\) In deriving the asymptotic properties of the proposed link function estimator (PGSIM), we require, in addition to Assumptions 1 and 5, the following assumptions:
- Assumption 8: The density function of the index, f(z), with bounded support Z, and the unknown link function F(z) are in \({{{\mathcal {C}}}}^{2}(\Theta )\) with finite second derivatives, and \(f(z)\ne 0\) in \(\Theta \), a neighborhood of the point z.
- Assumption 9: The Kernel function K(s) is bounded, real-valued, with the following characteristics: (i) \(\int K(s)ds=1\), (ii) K(s) is symmetric about 0, (iii) \(\int s^{2}K(s)ds <\infty \), (iv) \(\vert s\vert K(\vert s \vert )\rightarrow 0\) as \(\vert s \vert \rightarrow \infty \), (v) \(\int K^{2}(s)ds < \infty .\)
- Assumption 10: \(h\rightarrow 0\) and \(nh\rightarrow \infty \).
- Assumption 11: \(E\left| \frac{G(z)}{G(z_i)}\right| ^{2+\delta },~~E\vert \epsilon _{i}\vert ^{2+\delta }\), and \(\int \vert K(\omega )\vert ^{2+\delta }d\omega \) are finite for some \(\delta > 0.\)
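The nonparametric Kernel estimator of the link function (Ichimura 1993; Klein and Spady 1993) is:
$$\begin{aligned} {\tilde{F}}(z)=\frac{\sum _{i=1}^{n}K\left( \frac{z-z_i}{h}\right) y_i}{\sum _{i=1}^{n}K\left( \frac{z-z_i}{h}\right) }. \end{aligned}$$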
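Denoting \(\mu _{2}=\int s^{2}K(s)ds\) and \(R(K)= \int K^{2}(s) ds\), standard properties of \({\tilde{F}}(z)\) are:
$$\begin{aligned} E{\tilde{F}}(z)-F(z)&=\frac{1}{2}\mu _{2}h^{2}\left( F^{\prime \prime }(z)+2F^{\prime }(z)\frac{f^{\prime }(z)}{f(z)}\right) +o(h^{2}),\\ Var({\tilde{F}}(z))&=(nh)^{-1}\left( \frac{\sigma ^{2}(z)}{f(z)}R(K)+o(1)\right) , \end{aligned}$$
where \(\sigma ^2(z)=F(z)(1-F(z))\).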
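Now consider our parametrically guided nonparametric estimator of the link function:
$$\begin{aligned} {\hat{F}}(z)=G(z)\,\frac{\sum _{i=1}^{n}K\left( \frac{z-z_i}{h}\right) \frac{y_i}{G(z_i)}}{\sum _{i=1}^{n}K\left( \frac{z-z_i}{h}\right) }. \end{aligned}$$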
First, we derive the bias and variance of the proposed estimator. We have
Per assumption 1, we have,
It can be easily seen that \(E(D_n)=0\). Using the fact that \({\hat{f}}(z)=f(z)+o_p(1)\), we obtain
Turning to the variance of the estimator, we have
Hence \(Var(C_n)=o_p((nh)^{-1})\).
Finally,
Piecing the results together, we have
Note that the variance function is the same as the variance of the Klein and Spady or Ichimura estimator.
Next, we show that the proposed semiparametric link function estimator \({\hat{F}}(z)\) has a limiting normal distribution
$$\begin{aligned} \sqrt{nh}\left( {\hat{F}}(z)-F(z)-B(h)\right) \xrightarrow {d}{\mathcal {N}}(0,\Sigma ) \end{aligned}$$
where \(B(h)=\frac{1}{2}\mu _{2}h^{2}\left( r^{\prime \prime }(z)+2r^{\prime }(z)\frac{f^{\prime }(z)}{f(z)}\right) G(z)\) and \(\Sigma =\frac{\sigma ^{2}(z)}{f(z)}R(K)\).
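To see why a correct parametric start removes the leading bias term, suppose (as an assumption, since the definition of \(r\) is not reproduced in this appendix) that \(r(z)=F(z)/G(z)\) is the correction factor that is estimated nonparametrically. If the start is correct up to a constant, \(G(z)=cF(z)\) with \(c>0\), the bias expression \(B(h)\) above reduces as follows:

$$\begin{aligned} r(z)=\frac{F(z)}{G(z)}=\frac{1}{c}\;\Rightarrow \; r^{\prime }(z)=r^{\prime \prime }(z)=0\;\Rightarrow \; B(h)=\frac{1}{2}\mu _{2}h^{2}\left( 0+2\cdot 0\cdot \frac{f^{\prime }(z)}{f(z)}\right) G(z)=0. \end{aligned}$$

Under the same reading of \(r\), a start that is only roughly right makes \(r\) flatter than \(F\), which would shrink \(B(h)\) without removing it. The variance term \(\Sigma \) does not involve \(G\), in line with the remark that the variance matches that of the Klein and Spady or Ichimura estimator.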
From the results above, it can be seen that
We have also shown that \(E(D_{n})=0\) and \(Var(D_{n})=(nh)^{-1}\left( \sigma ^{2}(z)R(K)f(z)+o(1)\right) .\) \(D_{n}\) is a triangular array of i.i.d. random variables; thus, under Assumption 11,
Hence we can apply Liapounov’s central limit theorem to obtain \(\sqrt{nh} (D_{n}) \rightarrow {\mathcal {N}}(0,f^{2}(z)\Sigma )\). Since \(plim {\hat{f}}(z)=f(z)\), it also follows that
Proof of theorem 2
Note that
where
Let \({\hat{Q}}_{1n}=n^{-1}\sum _{i=1}^n {\mathbf {1}}_{[x_i\in A_x]}y_i \log [{\hat{F}}(x_i b)]\) and similarly for \(Q_{1n}\). Let \({\hat{F}}_i\equiv {\hat{F}}(x_ib)\) and similarly for \(F_i\). \({\hat{Q}}_{1n}\) can be viewed as a function of \({\hat{F}}_i\), so from a Taylor expansion of \({\hat{Q}}_{1n}\) about \(F_i\) we get
where \({\tilde{F}}_i\) is between \({\hat{F}}_i\) and \(F_i\). So we have
where \(q_i={\mathbf {1}}_{[x_i\in A_x]}y_i / {\tilde{F}}_i\). Note that \(\sup _{i,b}|q_i|=O(1)\) and from lemma 1 \(\sup _b|{\hat{F}}_i-F_i|\rightarrow 0\) in probability so \(\sup _b |{\hat{Q}}_{1n}-Q_{1n}|=o_p(1)\). A similar result can be obtained for \(y_i=0\) part of the likelihood. Thus we have \(\sup _b |{\hat{Q}}_n(b) - Q_n(b) |=o_p(1)\).
Now let \(q_i(x_i,y_i,b)={\mathbf {1}}_{[x_i\in A_x]}(y_{i}\log [F(x_i b)]+(1-y_i)\log [1-F(x_i b)])\). So
As in Ichimura (1993, p. 91), we can use the uniform law of large numbers by Andrews (1987) and so (28) goes to 0 in probability. This completes the proof of theorem 2.
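The trimmed quasi-log-likelihood \({\hat{Q}}_n(b)=n^{-1}\sum _i {\mathbf {1}}_{[x_i\in A_x]}(y_i\log {\hat{F}}(x_ib)+(1-y_i)\log [1-{\hat{F}}(x_ib)])\) that the proof works with can be sketched as follows. This is a hedged illustration rather than the authors' code: the triweight kernel, the probit start, the simulated design, the trimming set, the clipping constant `eps` and the fallback to the uncorrected start when a point has no neighbours are all assumptions introduced here, and the function names are hypothetical.

```python
import math
import numpy as np

def triweight(s):
    """Compactly supported kernel on [-1, 1] (illustrative assumption)."""
    return np.where(np.abs(s) <= 1.0, 35.0 / 32.0 * (1.0 - s**2) ** 3, 0.0)

def probit_start(v):
    """Standard normal CDF used as the parametric start G (assumption)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(v / math.sqrt(2.0)))

def quasi_loglik(b, x, y, trim, h, start=probit_start, kernel=triweight, eps=1e-6):
    """Trimmed quasi-log-likelihood with a leave-one-out guided link estimate."""
    v = x @ b                                    # index x_i b
    w = kernel((v[:, None] - v[None, :]) / h)    # pairwise kernel weights
    np.fill_diagonal(w, 0.0)                     # leave observation i out
    g = start(v)
    num = w @ (y / g)
    den = w.sum(axis=1)
    # Fall back to the uncorrected start where no neighbour gets weight (assumption).
    ratio = np.divide(num, den, out=np.ones_like(num), where=den > 0)
    # Clip away from 0 and 1 so the logarithms stay finite (numerical safeguard).
    f_hat = np.clip(g * ratio, eps, 1.0 - eps)
    terms = y * np.log(f_hat) + (1.0 - y) * np.log(1.0 - f_hat)
    return float(np.sum(trim * terms) / len(y))

# Simulated design: bounded regressors, true index x1 + x2, normal errors.
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-2.0, 2.0, size=(n, 2))
y = (x @ np.array([1.0, 1.0]) + rng.standard_normal(n) > 0).astype(float)
trim = np.all(np.abs(x) <= 1.5, axis=1)          # indicator of x_i in A_x
h = n ** (-1.0 / 6.02)

q_true = quasi_loglik(np.array([1.0, 1.0]), x, y, trim, h)
q_wrong = quasi_loglik(np.array([1.0, -1.0]), x, y, trim, h)
```

In this sketch the first coefficient is held at one as the scale normalization, so only the second coefficient would be searched over in practice; comparing `q_true` with `q_wrong` shows the objective separating the true index from an uninformative one.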
Cite this article
Ker, A.P., Sam, A.G. Semiparametric estimation of the link function in binary-choice single-index models. Comput Stat 33, 1429–1455 (2018). https://doi.org/10.1007/s00180-017-0779-2
DOI: https://doi.org/10.1007/s00180-017-0779-2