
Semiparametric estimation of the link function in binary-choice single-index models

  • Original Paper
  • Published in Computational Statistics

Abstract

We propose a new, easy-to-implement, semiparametric estimator for binary-choice single-index models that uses parametric information in the form of a known link (probability) function and corrects it nonparametrically. Asymptotic properties are derived, and the finite-sample performance of the proposed estimator is compared to that of the parametric probit estimator and the semiparametric single-index model estimators of Ichimura (J Econ 58:71–120, 1993) and Klein and Spady (Econometrica 61:387–421, 1993). Results indicate that if the parametric start is correct, the proposed estimator achieves significant bias reduction and efficiency gains relative to Ichimura (1993) and Klein and Spady (1993). Interestingly, the proposed estimator still achieves significant bias reduction and efficiency gains even when the parametric start is incorrect.


Notes

  1. For an exception see Ruud (1983).

  2. Furthermore, fully nonparametric methods suffer from the so-called ‘curse of dimensionality’: as the number of regressors increases, estimation precision decreases rapidly. Single-index models, which are explained below, reduce this dimensionality problem to that of a scalar index.

  3. This article will be concerned with a linear index as in (2) instead of a general form \(m(v;\beta )\) where m is a scalar valued function. Ichimura (1993) has a general analysis of single-index models and Ichimura and Lee (1991) extend that general framework to multiple-index models.

  4. The fact that \(\beta \) is unknown and has to be replaced with an estimator does not change this result as long as the estimator of \(\beta \) is \(\sqrt{n}\)-consistent (see Horowitz 1998, pp. 21–22).

  5. Single-index models, unlike models which assume independence of u and x, allow for limited forms of heteroscedasticity (general but known form and unknown form if it depends only on the index). This limitation can be serious since, for instance, the assumption that \(Pr(y=1|v)\) depends only on the index does not allow a certain form of heteroscedasticity (random coefficients model) which may be important in applications.

  6. The maximum score estimator of Manski (1975) and its smoothed version by Horowitz (1992) make a zero-conditional-median assumption (median\((u|x)=0\)), which identifies the intercept term (a zero-conditional-mean assumption is not sufficient for identification in a binary response model; see Manski 1988, p. 731; Horowitz 1998, Section 3.2). These models allow for different forms of heteroscedasticity, including random coefficients models, although at the cost of a rate of convergence slower than \(\sqrt{n}\). In fact, under this conditional median independence assumption, \(\sqrt{n}\)-consistency is not possible; see Pagan and Ullah (1999, p. 278) and Horowitz (1993).

  7. An alternative scale normalization would be \(||\beta ||=1\) where \(||\cdot ||\) is the Euclidean norm.

  8. Since there is no location restriction on u in single-index models, and thus the intercept term is not identified, (4) is actually a nonparametric estimator of the distribution of \(u+\beta _0\), where \(\beta _0\) is the intercept. Also, Klein and Spady (1993) include additive terms in the numerator and denominator of (4) to control the rate at which the numerator and denominator tend to zero. Ichimura (1993) uses indicator variables to trim those observations which correspond to small density values. A similar trimming function and an indicator variable enter the objective functions in (3) and (5), respectively, multiplicatively. In this presentation those terms are ignored for simplicity.

  9. A paper by Chen (2000) builds on Klein and Spady (1993) and shows that the intercept can be consistently estimated and there are possible efficiency gains in the estimation of slope coefficients although at the cost of stronger assumptions: a location restriction in the form of conditional symmetry, i.e., the density of u conditional on the regressors is symmetric around zero and an index restriction stronger than the one in Klein and Spady (1993), namely, the conditional density depends on x only through the squared index.

  10. For this efficiency result, the weight function should depend on x only through the index as in binary-choice models where \(Var(y|x)=Var(y|xb)\).

  11. Ichimura (1993) treats \(W(\cdot )\) as a known function.

  12. Glad (1998) generalizes this to local \(p\text {th}\) order polynomial estimator which reduces to the Nadaraya–Watson for \(p=0\).

  13. While the primary interest in single-index models generally lies in the parameter estimates and marginal effects, there are cases, such as economic discrimination analyses, where the focus is on the estimated probabilities. For example, Blinder–Oaxaca-type decomposition studies of differences between groups (by race, gender, etc.) in a binary outcome such as computer ownership, teenage pregnancy, or school attendance are primarily interested in differences in average estimated probabilities between groups (Fairlie 2005; Seah et al. 2017). Correctly estimated probabilities are therefore critical to the accuracy of the gap being investigated.

  14. For this to happen, the parametric start \(G(\cdot )\) should be a constant function. In density estimation, the Hjort and Glad (1995) estimator nests the usual kernel density estimator if the parametric start is the uniform density over the space. But here a distribution function which is constant and which satisfies the continuity (of the index) assumption cannot be found. One obvious example of such a constant distribution function, which does not satisfy the continuity assumption, is the unit point mass at a when \(z=a\) a.s.:

    $$\begin{aligned} G(z) = \left\{ \begin{array}{ll} 0 &{}\quad \text {if } z<a \\ 1 &{}\quad \text {if } z\geqslant a. \end{array} \right. \end{aligned}$$

    Here not only is the continuity assumption not satisfied but also \(\beta \) is not identified with this \(G(\cdot )\).

  15. In actual estimations, trimming has very little effect on the performance of the estimators. The Klein and Spady paper reports simulation results for the untrimmed estimator: “...the estimate obtained without any trimming performed quite similar to that under the trimming that we employed. Accordingly, we report results for the semiparametric estimator obtained without probability or likelihood trimming” (Klein and Spady 1993, p. 406).

  16. There are two cases to consider: when the link function F is monotonic in the index and when it is not. If the underlying distribution is heteroscedastic, for instance, F need not be monotonic in the index.

  17. Klein and Spady (1993) obtain a similar uniform convergence rate.

  18. Even though it is not in the simulation design, it is instructive at this point to digress and discuss maximum likelihood estimation of misspecified binary-choice models. Ruud (1983) showed that when the explanatory variables are multivariate normal, the slope coefficients can still be estimated consistently up to scale by maximum likelihood even when the distributional assumption is incorrect. More generally, the result holds when

    $$\begin{aligned} E({\tilde{x}}|xb=t)=c_0+c_1t \end{aligned}$$
    (10)

    where \(c_0\) and \(c_1\) are constants, which is satisfied when the explanatory variables are multivariate normal. Note that this consistency result does not hold for the probability estimates.

    Another interesting implication of (10) is that when it holds, the semiparametric efficiency bound is the same as the parametric (Cramér–Rao) efficiency bound (see Cosslett 1987).

  19. \(L_1\) norm is \(\int |F-{\hat{F}}|dx\) while \(L_2\) norm is \(\int (F-{\hat{F}})^2dx\) where F is the true unknown link function and \({\hat{F}}\) is an estimate of the unknown link function.

  20. We also ran the simulations with \(h=n^{-1/6.02}\) as in Klein and Spady (1993), and the results were materially unchanged. In practical applications, the smoothing parameter should be chosen along with the parameter estimates by maximizing the quasi-likelihood function.

  21. In general, the \(L_p\)-norm is defined as \((E(|x|^p))^{1/p}\).

  22. The derivation below assumes that \((y_i,x_i)\) is absolutely continuous. Obviously in binary-choice models this is not true as \(y_i\) is a Bernoulli random variable. We will keep the absolute continuity interpretation as it is more general and give the necessary changes here for the binary-response case. Using a notation similar to Klein and Spady (1993), let \(g_x\) be the unconditional density for x and \(g_{x|y}\) be the density for x conditional on y for \(y=0,1\). We have the following series of equalities

    $$\begin{aligned} E(y|x)=F(x)=Pr(y=1|x)=\frac{Pr(y=1)g_{x|1}}{g_x}=\frac{g_{1x}}{g_x}=G(x)\frac{g_{1x}/G(x)}{g_x} =G(x)g(x) \end{aligned}$$

    where \(g(x)=(g_{1x}/G(x))/g_x\). Thus \({\hat{F}}_1=G(x){\hat{g}}(x){\hat{g}}_x\) where \({\hat{g}}(x){\hat{g}}_x=((n-1)h)^{-1}\sum _{j\ne i}\frac{y_j}{G(x_j)}K((x-x_j)/h).\) So there is no change from (11) to (18). In equation (18), we can take an iterated expectation to get \(E_X\left[ 1/G(x_j)K((z-x_j)/h)Pr(y_j=1|x)\right] =E_X\left[ K((z-x_j)/h)g(x)\right] =\int K((z-x)/h)g(x)g_x dx\). And now \(1/h\) times this last term would replace equation (19), and we can apply a Taylor expansion to \(\psi (x)=g(x)g_x\).

References

  • Andrews DWK (1987) Consistency in nonlinear econometric models: a generic uniform law of large numbers. Econometrica 55:1465–1471

  • Bierens HJ (1987a) Uniform consistency of kernel estimators of a regression function under generalized conditions. J Am Stat Assoc 78:699–707

  • Bierens HJ (1987b) Kernel estimators of regression functions. In: Bewley TF (ed) Advances in econometrics, vol 1. Cambridge University Press, Cambridge

  • Chen S (2000) Efficient estimation of binary choice models under symmetry. J Econ 96:183–199

  • Cosslett SR (1987) Efficiency bounds for distribution-free estimators of the binary choice and censored regression models. Econometrica 55:559–586

  • Diiro G, Sam AG (2015) Agricultural technology adoption and nonfarm earnings in Uganda: a semiparametric analysis. J Dev Areas 49(2):145–162

  • Diiro G, Ker AP, Sam AG (2015) The role of gender in fertiliser adoption in Uganda. Afr J Agric Resour Econ 10(2):117–130

  • Fairlie RW (2005) An extension of the Blinder–Oaxaca decomposition technique to logit and probit models. J Econ Soc Meas 30(4):305–316

  • Fristedt B, Gray L (1997) A modern approach to probability theory. Birkhäuser, Basel

  • Frölich M, Huber M, Wiesenfarth M (2017) The finite sample performance of semi- and nonparametric estimators for treatment effects and policy evaluation. Comput Stat Data Anal 115:91–102

  • Glad IK (1998) Parametrically guided nonparametric regression. Scand J Stat 25:649–668

  • Hjort NL, Glad IK (1995) Nonparametric density estimation with a parametric start. Ann Stat 23:882–904

  • Horowitz JL (1992) A smoothed maximum score estimator for the binary response model. Econometrica 60:505–531

  • Horowitz JL (1993) Semiparametric and nonparametric estimation of quantal response models. In: Maddala GS, Rao CR, Vinod HD (eds) Handbook of statistics, vol 11. North-Holland, Amsterdam

  • Horowitz JL (1998) Semiparametric methods in econometrics. Springer, Berlin

  • Horowitz JL, Härdle W (1996) Direct semiparametric estimation of single-index models with discrete covariates. J Am Stat Assoc 91(436):1632–1640

  • Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econ 58:71–120

  • Ichimura H, Lee LF (1991) Semiparametric least squares estimation of multiple index models: single equation estimation. In: Barnett WA, Powell J, Tauchen G (eds) Nonparametric and semiparametric methods in econometrics and statistics. Cambridge University Press, Cambridge

  • Jones MC, Signorini DF (1997) A comparison of higher-order bias kernel density estimators. J Am Stat Assoc 92:1063–1073

  • Klein RW, Spady RH (1993) An efficient semiparametric estimator for binary response models. Econometrica 61:387–421

  • Manski CF (1975) The maximum score estimation of the stochastic utility model of choice. J Econ 3:205–228

  • Manski CF (1988) Identification of binary response models. J Am Stat Assoc 83:729–738

  • Mishra K, Sam AG, Miranda MJ (2017) You are approved! Insured loans improve credit access and technology adoption of Ghanaian farmers. Working paper, The Ohio State University

  • Pagan A, Ullah A (1999) Nonparametric econometrics. Cambridge University Press, Cambridge

  • Powell JL (1994) Estimation of semiparametric models. In: Engle RF, McFadden DL (eds) Handbook of econometrics, vol 4. North-Holland, Amsterdam

  • Ruud PA (1983) Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models. Econometrica 51:225–228

  • Sam AG, Jiang GJ (2009) Nonparametric estimation of the short rate diffusion process from a panel of yields. J Financ Quant Anal 44:1197–1230

  • Sam AG, Ker AP (2006) Nonparametric regression under alternative data environments. Stat Probab Lett 76(10):1037–1046

  • Schuster E, Yakowitz S (1979) Contributions to the theory of nonparametric regression with application to system identification. Ann Stat 7:139–149

  • Seah KY, Fesselmeyer E, Le K (2017) Estimating and decomposing changes in the white–black homeownership gap from 2005 to 2011. Urban Stud 54(1):119–136

  • Birke M, Van Bellegem S, Van Keilegom I (2017) Semi-parametric estimation in a single-index model with endogenous variables. Scand J Stat 44(1):168–191

  • Yatchew A, Griliches Z (1985) Specification error in probit models. Rev Econ Stat 67:134–139


Author information

Correspondence to Alan P. Ker.

Appendix

We make the following assumptions:

Assumption 1:

The observed sample \((x_i,y_i)\), \(i=1,\ldots ,n\), is i.i.d.

Assumption 2:

\(B\subset {\mathbb {R}}^q\) is compact and the true parameter vector \(b_0\) is in the interior of B.

Assumption 3:

\(A_x\) is compact.

Assumption 4:

K(s) is a density. Furthermore, \(\int sK(s)ds=0\), \(K(s)=0\) for \(|s|>1\), and its second derivative satisfies a Lipschitz condition.

Assumption 5:

The parametric start G is uniformly bounded over x and b, and \(G(xb)\ne 0\;\forall (x,b)\in A_x\times B\).

Assumption 6:

\(\int |\phi (t)|dt < \infty \) where \(\phi (t)\) is the characteristic function of K.

Assumption 7:

There exist \({\underline{F}}\) and \({\overline{F}}\) that do not depend on x such that \(0<{\underline{F}}\leqslant F(xb) \leqslant {\overline{F}}~<~1 \quad \forall b \in B\).
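As a concrete check of the kernel conditions in assumption 4, consider the triweight kernel \(K(s)=(35/32)(1-s^2)^3\) on \([-1,1]\) (our illustrative choice, not one the paper prescribes): it is a density, has zero first moment by symmetry, has compact support, and its second derivative is piecewise polynomial and continuous at \(\pm 1\), hence Lipschitz. A minimal numerical verification:

```python
import numpy as np

# Numerical check that the triweight kernel satisfies the density and
# zero-first-moment conditions of Assumption 4 (illustrative kernel choice).
def K(s):
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1.0, (35.0 / 32.0) * (1.0 - s ** 2) ** 3, 0.0)

s = np.linspace(-1.0, 1.0, 200001)
ds = s[1] - s[0]

total_mass = np.sum(K(s)) * ds        # approximates \int K(s) ds, should be 1
first_moment = np.sum(s * K(s)) * ds  # approximates \int s K(s) ds, should be 0
```

The exact integral \(\int_{-1}^{1}(35/32)(1-s^2)^3\,ds=1\), so the Riemann sum should agree to high accuracy.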

Proof of lemma 1

Our proof of lemma 1 closely follows Bierens (1987a, b) and Pagan and Ullah (1999, pp. 36–39). Note that \({\hat{F}}\) in (9) can be written as \({\hat{F}}_{1} / {\hat{F}}_{2}\) where

$$\begin{aligned} {\hat{F}}_{1}&= \frac{1}{(n-1)h} \sum _{j\ne i}y_j{\mathbf {1}}_{[x_j\in A_{nx}]}\bigg \{\frac{G(x_{i}b)}{G(x_{j}b)} \bigg \} K\left( \frac{x_{i}b-x_{j}b}{h}\right) \\ {\hat{F}}_{2}&= \frac{1}{(n-1)h} \sum _{j\ne i}{\mathbf {1}}_{[x_j\in A_{nx}]}K\left( \frac{x_{i}b-x_{j}b}{h}\right) . \end{aligned}$$

Since \({\hat{F}}_{2}\) is \({\hat{F}}_{1}\) with \(y_j=1\) and \(G(\cdot )\) a constant function, we will only show uniform convergence of \({\hat{F}}_{1}\) (see note 22). Now observe that

$$\begin{aligned} E(y|x)=F(x)=\frac{G(x)\int y \frac{1}{G(x)}f(y,x)dy}{\int f(y,x)dy}. \end{aligned}$$

Let \(g(x)=\int y \frac{1}{G(x)} f(y,x)dy / \int f(y,x)dy\) and \(h(x)=\int f(y,x)dy\). So \(F(x)=G(x)g(x)\) and \(G(x)g(x)h(x)=G(x)\int y\frac{1}{G(x)}f(y,x)dy\). Thus \({\hat{F}}_{1}=G(x){\hat{g}}(x){\hat{h}}(x)\). Notice that by assumption 5, G is uniformly bounded, so \(\sup _x |G(x)|=O(1)\); in fact, a plausible start is a distribution function, in which case \(\sup _x |G(x)|=1\). Hence it suffices to show \(\sup _x |{\hat{g}}(x){\hat{h}}(x) - g(x)h(x)|\rightarrow 0\). So we have

$$\begin{aligned}&\sup _x |{\hat{g}}(x){\hat{h}}(x) - g(x)h(x)|\leqslant \sup _x |{\hat{g}}(x){\hat{h}}(x) - E{\hat{g}}(x){\hat{h}}(x)|\nonumber \\&\quad + \sup _x |E{\hat{g}}(x){\hat{h}}(x)-g(x)h(x)|. \end{aligned}$$
(11)

Like Ichimura (1993), we will refer to the second term on the right-hand side as the bias term and show that it converges to 0 at the rate \(h^2\). But first notice that

$$\begin{aligned} {\hat{g}}(x){\hat{h}}(x)=\frac{1}{nh}\sum _{j=1}^n\frac{y_j}{G(x_j)}K\left( \frac{x-x_j}{h}\right) . \end{aligned}$$
(12)

From the inversion formula (see Fristedt and Gray 1997, p. 231) and by assumption 6 we have

$$\begin{aligned} K(a)=\frac{1}{2\pi }\int \exp (-ita)\phi (t)dt \end{aligned}$$
(13)

where \(\phi (t)\) is the characteristic function of K and \(i^2=-1\). Using (12) and (13) and letting \(s=t/h\) we get

$$\begin{aligned} {\hat{g}}(x){\hat{h}}(x)=\frac{1}{2\pi }\int \bigg \{ \frac{1}{n}\sum _{j=1}^{n}\frac{y_j}{G(x_j)} \exp (itx_j) \bigg \} \exp (-itx) \phi (ht)dt. \end{aligned}$$
(14)

From (14) we get

$$\begin{aligned} E{\hat{g}}(x){\hat{h}}(x)=\frac{1}{2\pi }\int E\bigg [ \frac{y_j}{G(x_j)}\exp (itx_j) \bigg ] \exp (-itx) \phi (ht)dt. \end{aligned}$$
(15)

From (14) and (15) and noting that \(|\exp (-itx)|=1\)

$$\begin{aligned}&|{\hat{g}}(x){\hat{h}}(x)- E{\hat{g}}(x){\hat{h}}(x)| \leqslant \frac{1}{2\pi } \int \left| \frac{1}{n} \sum _{j=1}^n \left\{ \frac{y_j}{G(x_j)}\exp (itx_j)\right. \right. \\&\quad \left. \left. - E\left[ \frac{y_j}{G(x_j)}\exp (itx_j)\right] \right\} \right| |\phi (ht)|dt. \end{aligned}$$

So

$$\begin{aligned}&\sup _{x}|{\hat{g}}(x){\hat{h}}(x)- E{\hat{g}}(x){\hat{h}}(x)| \\&\quad \leqslant \frac{1}{2\pi } \int \left| \frac{1}{n} \sum _{j=1}^n \left\{ \frac{y_j}{G(x_j)}\exp (itx_j) - E\left[ \frac{y_j}{G(x_j)}\exp (itx_j)\right] \right\} \right| |\phi (ht)|dt. \end{aligned}$$

Using \(\exp (itx_j)=\cos (tx_j)+i\sin (tx_j)\) we can write

$$\begin{aligned} \begin{aligned}&E\left| \frac{1}{n}\sum _{j=1}^n\left\{ \frac{y_j}{G(x_j)}\exp (itx_j) -E\left[ \frac{y_j}{G(x_j)}\exp (itx_j)\right] \right\} \right| \\&\quad =E\Bigg | \underset{A}{ \underbrace{ \frac{1}{n}\sum _{j=1}^n\left\{ \frac{y_j}{G(x_j)}\cos tx_j -E\left[ \frac{y_j}{G(x_j)}\cos tx_j\right] \right\} }} \\&\qquad +\,i \underset{B}{ \underbrace{ \frac{1}{n}\sum _{j=1}^n\left\{ \frac{y_j}{G(x_j)}\sin tx_j -E\left[ \frac{y_j}{G(x_j)}\sin tx_j\right] \right\} }} \Bigg | \end{aligned} \end{aligned}$$
(16)

Note that we can write \(|A+iB|=(A^2+B^2)^{1/2}\) and so \(E|A+iB|=E(A^2+B^2)^{1/2}\leqslant (EA^2+EB^2)^{1/2}=(Var(A)+Var(B))^{1/2}\) where the inequality comes from Jensen’s inequality and by construction \(EA=0\) and \(EB=0\). So (16) is

$$\begin{aligned}&\leqslant \left\{ Var\left[ \frac{1}{n}\sum _{j=1}^n\left\{ \frac{y_j}{G(x_j)}\cos tx_j -E\left[ \frac{y_j}{G(x_j)}\cos tx_j\right] \right\} \right] \right. \\&\qquad \left. + Var\left[ \frac{1}{n}\sum _{j=1}^n\left\{ \frac{y_j}{G(x_j)}\sin tx_j -E\left[ \frac{y_j}{G(x_j)}\sin tx_j\right] \right\} \right] \right\} ^{1/2} \\&\quad = \left\{ Var\left[ \frac{1}{n}\sum _{j=1}^n\left\{ \frac{y_j}{G(x_j)}\cos tx_j \right\} \right] + Var\left[ \frac{1}{n}\sum _{j=1}^n\left\{ \frac{y_j}{G(x_j)}\sin tx_j \right\} \right] \right\} ^{1/2}\\&\quad = \left\{ \frac{1}{n} \left( Var\left[ \frac{y_j}{G(x_j)}\cos tx_j \right] + Var\left[ \frac{y_j}{G(x_j)}\sin tx_j \right] \right) \right\} ^{1/2}. \end{aligned}$$

Note that \(Var(X)\leqslant E(X^2)\), so

$$\begin{aligned}&\leqslant \left\{ \frac{1}{n} \left( E\left[ \left( \frac{y_j}{G(x_j)}\right) ^2 \cos ^2 tx_j \right] + E\left[ \left( \frac{y_j}{G(x_j)}\right) ^2 \sin ^2 tx_j \right] \right) \right\} ^{1/2} \\&= \frac{1}{\sqrt{n}} \left\{ E\left[ \left( \frac{y_j}{G(x_j)} \right) ^2 \right] \right\} ^{1/2} \end{aligned}$$

noting that \(\cos ^2 tx_j+\sin ^2 tx_j = 1\). So

$$\begin{aligned} E\sup _x |{\hat{g}}(x){\hat{h}}(x)- E{\hat{g}}(x){\hat{h}}(x)|&\leqslant \frac{1}{2\pi }\frac{1}{\sqrt{n}} \left\{ E\left[ \left( \frac{y_j}{G(x_j)}\right) ^2\right] \right\} ^{1/2} \int |\phi (ht)|dt \nonumber \\&= \frac{1}{2\pi }\frac{1}{h\sqrt{n}} \left\{ E\left[ \left( \frac{y_j}{G(x_j)}\right) ^2\right] \right\} ^{1/2} \int |\phi (s)|ds \end{aligned}$$
(17)

after a change of variables (\(s=ht\)); the last term goes to zero provided \(h\sqrt{n}\rightarrow \infty \). Finally, using Markov’s inequality with (17) we get

$$\begin{aligned} Pr\left( \sup _x |{\hat{g}}(x){\hat{h}}(x)- E{\hat{g}}(x){\hat{h}}(x)| > \epsilon \right) \rightarrow 0 \quad \text {as} \quad n\rightarrow \infty . \end{aligned}$$

Now for \(|E{\hat{g}}(x){\hat{h}}(x)- g(x)h(x)|\) note that

$$\begin{aligned} E{\hat{g}}(z){\hat{h}}(z)&= E\left[ \frac{1}{nh}\sum _{j=1}^n y_j\frac{1}{G(x_j)} K\left( \frac{z-x_j}{h}\right) \right] \nonumber \\&= \frac{1}{h}E\left[ y_j\frac{1}{G(x_j)} K\left( \frac{z-x_j}{h}\right) \right] \end{aligned}$$
(18)
$$\begin{aligned}&= \frac{1}{h} \int K\left( \frac{z-x}{h}\right) \underbrace{\int y \frac{1}{G(x)}f(y,x)dy}_{g(x)h(x)} dx. \end{aligned}$$
(19)

Now let \(\psi (x)=g(x)h(x)\) and \(s=(z-x)/h\) for the Taylor expansion

$$\begin{aligned} \psi (x)=\psi (z-sh)=\psi (z)-hs\psi '(z)+\frac{1}{2}h^2 s^2 \psi ''(z) + o(h^2). \end{aligned}$$

So

$$\begin{aligned} \frac{1}{h} \int \left( \psi (z)-hs\psi '(z)+\frac{1}{2}h^2 s^2 \psi ''(z) \right) K(s)hds =\psi (z)+\frac{1}{2}h^2 \psi ''(z)\int s^2 K(s)ds. \end{aligned}$$

Thus we can write

$$\begin{aligned} \sup _x\left| E{\hat{\psi }}(x)-\psi (x)\right|&= \sup _x\left| \psi (x)+\frac{1}{2}h^2 \psi ''(x) \int s^2 K(s)ds - \psi (x) \right| \\&\leqslant \frac{1}{2}h^2 \sup _x|\psi ''(x)|\int |s^2 K(s)|ds \end{aligned}$$

so the last term goes to zero at the rate \(h^2\). This completes the proof of lemma 1.

Proof of theorem 1

Let z represent a point on the support of \(z_i\). Given the binary nature of the dependent variable \(y_i\) and the fact that \(P(y_i=1)=F(z_i)\), we have \(E(y_i|z_i)=F(z_i)\). Hence \(y_i=F(z_i)+\epsilon _i\), where \(\epsilon _i\) is an i.i.d. error term such that \(E(\epsilon _i|z_i)=0\) and \(Var(\epsilon _i|z_i)=F(z_i)(1-F(z_i))~~\forall i.\) In deriving the asymptotic properties of the proposed link function estimator (PGSIM), we require, in addition to Assumptions 1 and 5, the following assumptions:

Assumption 8:

The density function f(z) of the index has bounded support Z; the unknown link function \(F(z)\in {{{\mathcal {C}}}}^{2}(\Theta )\) has finite second derivatives; and \(f(z)\ne 0\) in \(\Theta \), a neighborhood of the point z.

Assumption 9:

The kernel function K(s) is bounded and real-valued, with the following characteristics: (i) \(\int K(s)ds=1\), (ii) K(s) is symmetric about 0, (iii) \(\int s^{2}K(s)ds <\infty \), (iv) \(\vert s\vert K(\vert s \vert )\rightarrow 0\) as \(\vert s \vert \rightarrow \infty \), (v) \(\int K^{2}(s)ds < \infty .\)

Assumption 10:

\(h\rightarrow 0\) and \(nh\rightarrow \infty \).

Assumption 11:

\(E\left| \frac{G(z)}{G(z_i)}\right| ^{2+\delta }\), \(E\vert \epsilon _{i}\vert ^{2+\delta }\), and \(\int \vert K(\omega )\vert ^{2+\delta }d\omega \) are finite for some \(\delta > 0.\)
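As a minimal illustration of the setup \(y_i=F(z_i)+\epsilon _i\) above, the following sketch simulates binary single-index data; the probit link and the standard normal index are our illustrative assumptions, not the paper's simulation design:

```python
import numpy as np
from math import erf, sqrt

# Simulate y_i = F(z_i) + eps_i with E(eps_i|z_i) = 0 and
# Var(eps_i|z_i) = F(z_i)(1 - F(z_i)). Probit link, illustrative DGP.
rng = np.random.default_rng(0)
n = 10_000
z = rng.standard_normal(n)                                   # index values z_i
F = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])  # true link F(z_i)
y = (rng.uniform(size=n) < F).astype(int)                    # Bernoulli(F(z_i))
eps = y - F                                                  # implied errors
```

By construction the errors are mean zero and conditionally heteroscedastic through \(F(z_i)(1-F(z_i))\) alone.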

The nonparametric kernel estimator of the link function (Ichimura 1993; Klein and Spady 1993) is:

$$\begin{aligned} {\tilde{F}}(z)=\frac{\sum _iy_{i}K_{h}(z_{i}-z)}{\sum _iK_{h}(z_{i}-z)} \end{aligned}$$
(20)

Denoting \(\mu _{2}=\int s^{2}K(s)ds\) and \( \text{ R }(K)= \int K^{2}(s) ds\), standard properties of \({\tilde{F}}(z)\) are:

$$\begin{aligned} E({\tilde{F}}(z)-F(z))= & {} \frac{1}{2}\mu _{2}h^{2}\left( F^{\prime \prime }(z)+2F^{\prime }(z)\frac{f^{\prime }(z)}{f(z)}\right) +o_p(h^{2}) , \end{aligned}$$
(21)
$$\begin{aligned} Var({\tilde{F}}(z))= & {} \frac{\sigma ^{2}(z) R(K)}{(nh)f(z)}+O_p(h/n) \end{aligned}$$
(22)

where \(\sigma ^2(z)=F(z)(1-F(z))\).
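For reference, (20) can be computed directly. A minimal sketch with a Gaussian kernel (our simplifying choice; its normalizing constant cancels between numerator and denominator):

```python
import numpy as np

def nw_link(z0, z, y, h):
    """Nadaraya-Watson estimate of the link F at point z0, as in eq. (20).

    z: observed index values; y: binary outcomes; h: bandwidth.
    Gaussian kernel used for simplicity (an illustrative choice).
    """
    w = np.exp(-0.5 * ((np.asarray(z) - z0) / h) ** 2)   # unnormalized weights
    return float(np.sum(w * np.asarray(y)) / np.sum(w))

# Deterministic toy data: outcomes follow a step in the index.
z = np.linspace(-3.0, 3.0, 601)
y_step = (z > 0).astype(float)
```

With these toy outcomes the estimate is near 1 deep in the region where \(y=1\) and near 0 deep in the region where \(y=0\), as expected of a local average.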

Now consider our parametrically guided nonparametric estimator of the link function:

$$\begin{aligned} {\hat{F}}(z)= & {} \frac{\sum _{i=1}^{n}y_i\left[ \frac{G(z)}{G(z_i)}\right] K_{h}(z_i-z)}{\sum _{i=1}^{n}K_{h}(z_{i}-z)}\end{aligned}$$
(23)
$$\begin{aligned}= & {} \frac{n^{-1}\sum _{i=1}^{n}y_i\left[ \frac{G(z)}{G(z_i)}\right] K_{h}(z_i-z)}{{\hat{f}}(z)} \end{aligned}$$
(24)
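A minimal sketch of (23) with a probit parametric start (our illustrative choice of \(G\); in practice small values of \(G(z_i)\) would call for the trimming discussed in footnote 8):

```python
import numpy as np
from math import erf, sqrt

def probit_cdf(v):
    """Parametric start G: the standard normal CDF (one plausible choice)."""
    return 0.5 * (1.0 + erf(v / sqrt(2.0)))

def pgsim_link(z0, z, y, h):
    """Parametrically guided link estimator at z0, as in eqs. (23)-(24).

    Each observation is reweighted by G(z0)/G(z_i) before kernel smoothing.
    Gaussian kernel used for simplicity; normalizing constants cancel.
    """
    z = np.asarray(z, dtype=float)
    G = np.array([probit_cdf(v) for v in z])
    w = np.exp(-0.5 * ((z - z0) / h) ** 2)
    return probit_cdf(z0) * float(np.sum(w * np.asarray(y) / G) / np.sum(w))

# Algebraic check: if y_i equals G(z_i) exactly, the correction factor is
# identically one and the estimator returns G(z0) exactly.
z = np.linspace(-2.0, 2.0, 401)
y_match = np.array([probit_cdf(v) for v in z])
```

The check at the bottom reflects the estimator's defining property: when the outcomes track the parametric start, no nonparametric correction is needed and \({\hat{F}}=G\).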

First, we derive the bias and variance of the proposed estimator. Let \(r(z)=F(z)/G(z)\) denote the correction factor, so that \(F(z)=G(z)r(z)\). We have

$$\begin{aligned} \left( {\hat{F}}(z)-F(z)\right) {\hat{f}}(z)= & {} n^{-1}\sum _i K_h(z_i-z)\left[ \frac{G(z)}{G(z_i)}\right] y_i - n^{-1}\sum _i K_h(z_i-z)F(z)\\= & {} n^{-1}\sum _i K_h(z_i-z)\left( \left[ \frac{G(z)}{G(z_i)}\right] y_i-F(z)\right) \\= & {} n^{-1}\sum _i K_h(z_i-z)\left( G(z)\frac{F(z_i)+\epsilon _i}{G(z_i)}-r(z)G(z)\right) \\= & {} \underbrace{n^{-1}G(z)\sum _i K_h(z_i-z)\left( r(z_i)-r(z)\right) }_{C_n} + \underbrace{n^{-1}G(z)\sum _i K_h(z_i-z)\frac{\epsilon _i}{G(z_i)}}_{D_n} \end{aligned}$$

By assumption 1, we have

$$\begin{aligned} E(C_n)= & {} G(z)\int K_{h}(z_{1}-z)\left( r(z_1)-r(z)\right) f(z_{1})dz_{1} \\= & {} G(z)\int K(s)\left( r(z+hs)-r(z)\right) f(z+hs)ds \quad \text{(after a change of variable)}\\= & {} \frac{h^{2}}{2}\left( G(z)f(z)r^{\prime \prime }(z)+2G(z)f^{\prime }(z)r^{\prime }(z)\right) \mu _{2}(K)+o_p(h^{2}). \end{aligned}$$

It can be easily seen that \(E(D_n)=0\). Using the fact that \({\hat{f}}(z)=f(z)+o_p(1)\), we obtain

$$\begin{aligned} E({\hat{F}}(z)-F(z))= & {} \frac{1}{2}\mu _{2}h^{2}\left( r^{\prime \prime }(z)+2r^{\prime }(z)\frac{f^{\prime }(z)}{f(z)}\right) G(z)+o_p(h^{2}) \end{aligned}$$
(25)

Turning to the variance of the estimator, we have

$$\begin{aligned} Var(C_n)= & {} Var\left[ n^{-1}G(z)\sum _i^{n}K_{h}(z_{i}-z)(r(z_i)-r(z))\right] \\= & {} n^{-2}G^2(z)\left( E\left[ \sum _i^{n}K^2_{h}(z_{i}-z)(r(z_{i})-r(z))^2\right] \right. \\&\left. - \left[ E\sum _i^{n}K_{h}(z_{i}-z)(r(z_{i})-r(z))\right] ^{2}\right) \\= & {} (nh)^{-1} G^2(z) \int K^{2}(s)(r(z+hs)-r(z))^2 f(z+hs)ds \\&+\frac{n(n-1)}{n^{2}} G^2(z)\left[ \int K(s)(r(z+hs)-r(z))f(z+hs)ds\right] ^{2} \\&- G^2(z)\left[ \int K(s)(r(z+hs)-r(z))f(z+hs)ds\right] ^{2}\\= & {} (nh)^{-1} G^2(z)\int K^{2}(s) [hs\,r^{\prime }(z)]^2 f(z)ds + O(n^{-1})+ o_p((nh)^{-1}) \end{aligned}$$

Hence \(Var(C_n)=o_p((nh)^{-1})\).

$$\begin{aligned} Var(D_n)= & {} E(Var_{z_i}(D_n)) \quad \text{ since } E_{z_i}(D_n)=0\text{, where } Var_{z_{i}} \text{ denotes the conditional variance of } D_n\\= & {} E\left[ n^{-2}G^2(z) \sum _i K^2_h(z_i-z)G^{-2}(z_{i})\sigma ^2(z_i)\right] \\= & {} (nh)^{-1}\sigma ^2(z)G^2(z)\int K^{2}(s)ds\,\left[ G^{-2}(z)f(z) \right] +o((nh)^{-1})\\= & {} (nh)^{-1}\sigma ^2(z)R(K)f(z)+o_p((nh)^{-1}) \end{aligned}$$

Finally,

$$\begin{aligned} Cov(C_n,D_n)= & {} E(C_nD_n) \quad \text{ since } E(D_n)=0\\= & {} n^{-2}G^2(z)E \left[ \sum _i K_h^{2}(z_i-z)(r(z_{i})-r(z))^2\epsilon (z_i)^2\right] \\&+\frac{n(n-1)}{n^{2}}G^2(z)E\left[ K_h(z_{1}-z)(r(z_{1})-r(z))\right] E\left[ K_h(z_{1}-z)\epsilon (z_1)\right] \\= & {} (nh)^{-1} G^2(z)\sigma ^2(z)h^2r^{\prime }(z)^2f(z) \int s^2 K^{2}(s) ds\\= & {} o_p((nh)^{-1}) \end{aligned}$$

Piecing the results together, we have

$$\begin{aligned} Var({\hat{F}}(z))= & {} \frac{\sigma ^{2}(z) R(K)}{(nh)f(z)}+o_p((nh)^{-1}). \end{aligned}$$

Note that the variance function is the same as that of the Klein and Spady (1993) and Ichimura (1993) estimators.

Next, we show that the proposed semiparametric link function estimator \({\hat{F}}(z)\) has a limiting normal distribution

$$\begin{aligned} \sqrt{nh}({\hat{F}}(z)-F(z)) \rightarrow {\mathcal {N}}(B(h),\Sigma ) \end{aligned}$$
(26)

where \(B(h)=\frac{1}{2}\mu _{2}h^{2}\left( r^{\prime \prime }(z)+2r^{\prime }(z)\frac{f^{\prime }(z)}{f(z)}\right) G(z)\) and \(\Sigma =\frac{\sigma ^{2}(z)}{f(z)}R(K)\).

From the results above, it can be seen that

$$\begin{aligned} C_{n}= & {} f(z)B(h)+o_{p}(h^{2}). \end{aligned}$$

We have also shown that \(E(D_{n})=0\) and \(Var(D_{n})=(nh)^{-1}\left( \sigma ^{2}(z)R(K)f(z)+o(1)\right) .\) \(D_{n}\) is a triangular array of i.i.d. random variables; thus, under assumption 11,

$$\begin{aligned}(nh)^{-\delta /2}E\left| \frac{G(z)}{G(z_i)}\right| ^{2+\delta }E\left| \epsilon _i\right| ^{2+\delta }E\left| K_h(z_i-z)\right| ^{2+\delta }h^{-1}\rightarrow 0 \text{ as } n \rightarrow \infty . \end{aligned}$$

Hence we can apply Liapounov’s central limit theorem to obtain \(\sqrt{nh} (D_{n}) \rightarrow {\mathcal {N}}(0,f^{2}(z)\Sigma )\). Since \(plim\, {\hat{f}}(z)=f(z)\), it also follows that

$$\begin{aligned} \sqrt{nh}({\hat{F}}(z)-F(z)-B(h))= \sqrt{nh}\frac{D_{n}}{f(z)}+o_{p}(1) \rightarrow {\mathcal {N}}(0,\Sigma ). \end{aligned}$$
(27)

Proof of theorem 2

Note that

$$\begin{aligned} \sup _b|{\hat{Q}}_n(b) - Q_0(b) | \leqslant \sup _b |{\hat{Q}}_n(b) - Q_n(b) |+ \sup _b |Q_n(b) - Q_0(b) | \end{aligned}$$

where

$$\begin{aligned} {\hat{Q}}_n(b)&= \frac{1}{n}\sum _{i=1}^{n}{\mathbf {1}}_{[x_i\in A_x]}(y_{i}\log [{\hat{F}}(x_i b)]+(1-y_i)\log [1-{\hat{F}}(x_i b)]), \\ Q_n(b)&= \frac{1}{n}\sum _{i=1}^{n}{\mathbf {1}}_{[x_i\in A_x]}(y_{i}\log [F(x_i b)]+(1-y_i)\log [1-F(x_i b)]), \\ Q_0(b)&= \frac{1}{n}\sum _{i=1}^{n} E\left[ {\mathbf {1}}_{[x_i\in A_x]}(y_{i}\log [F(x_i b)]+(1-y_i)\log [1-F(x_i b)]) \right] . \\ \end{aligned}$$

Let \({\hat{Q}}_{1n}=n^{-1}\sum _{i=1}^n {\mathbf {1}}_{[x_i\in A_x]}y_i \log [{\hat{F}}(x_i b)]\) and similarly for \(Q_{1n}\). Let \({\hat{F}}_i\equiv {\hat{F}}(x_ib)\) and similarly for \(F_i\). \({\hat{Q}}_{1n}\) can be viewed as a function of \({\hat{F}}_i\), so from a Taylor expansion of \({\hat{Q}}_{1n}\) about \(F_i\) we get

$$\begin{aligned} |{\hat{Q}}_{1n}-Q_{1n}|&=\left| \frac{1}{n}\sum _{i=1}^{n}{\mathbf {1}}_{[x_i\in A_x]}y_{i}\log [F_i]\right. \\&\quad \left. + \frac{1}{n}\sum _{i=1}^{n}{\mathbf {1}}_{[x_i\in A_x]}y_{i}\frac{1}{\tilde{F_i}}({\hat{F}}_i-F_i)- \frac{1}{n}\sum _{i=1}^{n}{\mathbf {1}}_{[x_i\in A_x]}y_{i}\log [F_i] \right| \\&\leqslant \frac{1}{n}\sum _{i=1}^{n}{\mathbf {1}}_{[x_i\in A_x]}y_{i}\frac{1}{\tilde{F_i}}|{\hat{F}}_i-F_i| \end{aligned}$$

where \({\tilde{F}}_i\) is between \({\hat{F}}_i\) and \(F_i\). So we have

$$\begin{aligned} \sup _b |{\hat{Q}}_{1n}-Q_{1n}| \leqslant \sup _{i,b}|q_i| \frac{1}{n}\sum _{i=1}^{n}\sup _b|{\hat{F}}_i-F_i| \end{aligned}$$

where \(q_i={\mathbf {1}}_{[x_i\in A_x]}y_i / {\tilde{F}}_i\). Note that \(\sup _{i,b}|q_i|=O(1)\) and, from lemma 1, \(\sup _b|{\hat{F}}_i-F_i|\rightarrow 0\) in probability, so \(\sup _b |{\hat{Q}}_{1n}-Q_{1n}|=o_p(1)\). A similar result can be obtained for the \(y_i=0\) part of the likelihood. Thus we have \(\sup _b |{\hat{Q}}_n(b) - Q_n(b) |=o_p(1)\).

Now let \(q_i(x_i,y_i,b)={\mathbf {1}}_{[x_i\in A_x]}(y_{i}\log [F(x_i b)]+(1-y_i)\log [1-F(x_i b)])\). So

$$\begin{aligned} \sup _b |Q_n(b) - Q_0(b)|=\sup _b\left| \frac{1}{n}\sum _{i=1}^n (q_i(x_i,y_i,b)-E[q_i(x_i,y_i,b)])\right| . \end{aligned}$$
(28)

As in Ichimura (1993, p. 91), we can apply the uniform law of large numbers of Andrews (1987), so (28) converges to 0 in probability. This completes the proof of theorem 2.
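For concreteness, the criterion \({\hat{Q}}_n(b)\) used above can be computed directly. A minimal sketch of the untrimmed version (the argument `link_hat` is a hypothetical placeholder for any link estimator, such as the one in (23); the trimming indicator \({\mathbf {1}}_{[x_i\in A_x]}\) is omitted, as in the simplified presentation):

```python
import numpy as np

def quasi_loglik(b, x, y, link_hat):
    """Untrimmed sample quasi-log-likelihood for a candidate index vector b.

    link_hat(z0, z, y) should return an estimate of Pr(y=1 | index = z0);
    any such callable works here (placeholder for the estimator in (23)).
    """
    z = np.asarray(x) @ np.asarray(b)
    # Clip fitted probabilities away from 0 and 1 to keep the logs finite,
    # standing in for the trimming used in the paper.
    F = np.clip(np.array([link_hat(z0, z, y) for z0 in z]), 1e-6, 1 - 1e-6)
    y = np.asarray(y)
    return float(np.mean(y * np.log(F) + (1 - y) * np.log(1 - F)))

# Sanity check with a constant link estimate of 0.5: the criterion is log(0.5).
x = np.array([[0.1, 1.0], [0.4, -0.2], [-0.3, 0.8], [0.9, 0.5]])
y = np.array([1, 0, 1, 1])
q = quasi_loglik(np.array([1.0, 0.5]), x, y, lambda z0, z, y: 0.5)
```

In practice \(b\) (and, per footnote 20, the bandwidth inside `link_hat`) would be chosen by maximizing this criterion.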

Cite this article

Ker, A.P., Sam, A.G. Semiparametric estimation of the link function in binary-choice single-index models. Comput Stat 33, 1429–1455 (2018). https://doi.org/10.1007/s00180-017-0779-2