What Do We Choose When We Err? Model Selection and Testing for Misspecified Logistic Regression Revisited

Chapter in: Challenges in Computational Statistics and Data Mining

Part of the book series: Studies in Computational Intelligence (SCI, volume 605)


Abstract

The problem of fitting logistic regression to a binary model allowing for misspecification of the response function is reconsidered. We introduce a two-stage procedure which consists of first ordering the predictors with respect to the deviances of the models with the predictor in question omitted, and then choosing the minimizer of a Generalized Information Criterion in the resulting nested family of models. This allows a large number of potential predictors to be considered, in contrast to an exhaustive method. We prove that the procedure consistently chooses the model \(t^{*}\) which is closest in the averaged Kullback-Leibler sense to the true binary model t. We then consider the interplay between t and \(t^{*}\) and prove that for a monotone response function, when there is genuine dependence of the response on the predictors, \(t^{*}\) is necessarily nonempty. This implies consistency of a deviance test of significance under misspecification. For a class of distributions of predictors, including the normal family, Ruud's result asserts that \(t^{*}=t\). Numerical experiments reveal that for normally distributed predictors the probability of correct selection and the power of the deviance test depend monotonically on Ruud's proportionality constant \(\eta \).
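The two-stage procedure described in the abstract can be sketched in code. The following Python sketch is illustrative only (it is not the authors' implementation) and assumes a Generalized Information Criterion of the generic form GIC(m) = -2 loglik(m) + a_n |m|, with the penalty sequence a_n left to the user (a_n = log n gives a BIC-type criterion); the function and variable names are made up for this example.

```python
import numpy as np
import statsmodels.api as sm

def two_stage_selection(X, y, a_n):
    """Stage 1: rank predictors by the deviance of the model omitting them.
    Stage 2: minimise a generic GIC over the resulting nested family."""
    n, p = X.shape
    full = list(range(p))

    def loglik(cols):
        # Intercept-only model when no predictors are used.
        design = sm.add_constant(X[:, cols]) if cols else np.ones((n, 1))
        return sm.Logit(y, design).fit(disp=0).llf

    # Larger deviance after omitting predictor j means j looks more important.
    drop_deviance = {j: -2.0 * loglik([k for k in full if k != j]) for j in full}
    order = sorted(full, key=lambda j: drop_deviance[j], reverse=True)

    best_gic, best_model = np.inf, []
    for k in range(p + 1):
        cols = order[:k]
        gic = -2.0 * loglik(cols) + a_n * (k + 1)  # +1 counts the intercept
        if gic < best_gic:
            best_gic, best_model = gic, cols
    return sorted(best_model)

# Example on simulated data (only predictors 0 and 1 are relevant):
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X[:, 0] + 0.5 * X[:, 1]))))
print(two_stage_selection(X, y, a_n=np.log(500)))  # BIC-type penalty
```

Stage 1 fits p leave-one-predictor-out models and stage 2 fits at most p+1 nested models, so roughly 2p+1 logistic fits are needed instead of the 2^p fits of an exhaustive search.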


References

1. Bache K, Lichman M (2013) UCI machine learning repository. University of California, Irvine
2. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
3. Bogdan M, Doerge R, Ghosh J (2004) Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting quantitative trait loci. Genetics 167:989–999
4. Bozdogan H (1987) Model selection and Akaike's information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52:345–370
5. Burnham K, Anderson D (2002) Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York
6. Carroll R, Pederson S (1993) On robustness in the logistic regression model. J R Stat Soc B 55:693–706
7. Casella G, Giron J, Martinez M, Moreno E (2009) Consistency of Bayes procedures for variable selection. Ann Stat 37:1207–1228
8. Chen J, Chen Z (2008) Extended Bayesian Information Criteria for model selection with large model spaces. Biometrika 95:759–771
9. Chen J, Chen Z (2012) Extended BIC for small-n-large-p sparse GLM. Statistica Sinica 22:555–574
10. Claeskens G, Hjort N (2008) Model selection and model averaging. Cambridge University Press, Cambridge
11. Czado C, Santner T (1992) The effect of link misspecification on binary regression inference. J Stat Plann Infer 33:213–231
12. Fahrmeir L (1987) Asymptotic testing theory for generalized linear models. Statistics 1:65–76
13. Fahrmeir L (1990) Maximum likelihood estimation in misspecified generalized linear models. Statistics 4:487–502
14. Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
15. Foster D, George E (1994) The risk inflation criterion for multiple regression. Ann Stat 22:1947–1975
16. Hjort N, Pollard D (1993) Asymptotics for minimisers of convex processes. Unpublished manuscript
17. Konishi S, Kitagawa G (2008) Information criteria and statistical modeling. Springer, New York
18. Lehmann E (1959) Testing statistical hypotheses. Wiley, New York
19. Li K, Duan N (1991) Slicing regression: a link-free regression method. Ann Stat 19(2):505–530
20. Qian G, Field C (2002) Law of iterated logarithm and consistent model selection criterion in logistic regression. Stat Probab Lett 56:101–112
21. Ruud P (1983) Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models. Econometrica 51(1):225–228
22. Sin C, White H (1996) Information criteria for selecting possibly misspecified parametric models. J Econometrics 71:207–225
23. Zak-Szatkowska M, Bogdan M (2011) Modified versions of Bayesian Information Criterion for sparse generalized linear models. Comput Stat Data Anal 55:2908–2924
24. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc B 67(2):301–320

Author information

Correspondence to Jan Mielniczuk.

Appendix A: Auxiliary Lemmas

This section contains some auxiliary facts used in the proofs. The following theorem states the asymptotic normality of the maximum likelihood estimator.

Theorem 6

Assume (A1) and (A2). Then

$$\begin{aligned} \sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }^{*})\xrightarrow {d}N(0,J^{-1}(\varvec{\beta }^{*})K(\varvec{\beta }^{*})J^{-1}(\varvec{\beta }^{*})) \end{aligned}$$

where J and K are defined in (5) and (9), respectively.

The above theorem is stated in [11] (Theorem 3.1) and in [16] ((2.10) and Sect. 5B).
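As an illustration, the sandwich covariance \(J^{-1}(\varvec{\beta }^{*})K(\varvec{\beta }^{*})J^{-1}(\varvec{\beta }^{*})\) can be estimated by plugging the fitted logistic parameters into empirical versions of J and K. Since definitions (5) and (9) are not reproduced in this excerpt, the sketch below assumes the standard choices \(J_n(\varvec{\beta })=\sum _i \mathbf {x}_i\mathbf {x}_i'p(\mathbf {x}_i'\varvec{\beta })[1-p(\mathbf {x}_i'\varvec{\beta })]\) and \(K_n(\varvec{\beta })=\sum _i \mathbf {x}_i\mathbf {x}_i'[y_i-p(\mathbf {x}_i'\varvec{\beta })]^2\); it is a hypothetical sketch, not code from the chapter.

```python
import numpy as np
import statsmodels.api as sm

def sandwich_cov(X, y):
    """Plug-in estimate of J_n^{-1} K_n J_n^{-1}, an approximation to the
    covariance of the quasi-MLE beta_hat in a (possibly misspecified) logistic fit."""
    Xc = sm.add_constant(X)
    beta_hat = np.asarray(sm.Logit(y, Xc).fit(disp=0).params)
    p = 1.0 / (1.0 + np.exp(-Xc @ beta_hat))
    J = (Xc * (p * (1.0 - p))[:, None]).T @ Xc   # minus Hessian of the log-likelihood
    K = (Xc * ((y - p) ** 2)[:, None]).T @ Xc    # empirical outer product of scores
    J_inv = np.linalg.inv(J)
    return J_inv @ K @ J_inv

# Example with a non-logistic true response (cubed linear predictor):
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
lin = X @ np.array([1.0, 0.5, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin ** 3)))
print(np.sqrt(np.diag(sandwich_cov(X, y))))      # robust standard errors
```

Under correct specification J and K coincide and the sandwich reduces to the usual inverse information; under misspecification the two differ, which is exactly what the limiting covariance in Theorem 6 accounts for.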

Lemma 4

Assume that \(\max _{1\le i\le n}|\mathbf {x}_i'(\varvec{\upgamma }-\varvec{\beta })|\le C\) for some \(C>0\) and some \(\varvec{\upgamma }\in R^{p+1}\). Then for any \(\mathbf {c}\in R^{p+1}\)

$$\begin{aligned} \exp (-3C)\mathbf {c}'J_{n}(\varvec{\beta })\mathbf {c}\le \mathbf {c}'J_{n}(\varvec{\upgamma })\mathbf {c}\le \exp (3C)\mathbf {c}'J_{n}(\varvec{\beta })\mathbf {c},\quad a.e. \end{aligned}$$

Proof

It suffices to show that for \(i=1,\ldots ,n\)

$$\begin{aligned} \exp (-3C)p(\mathbf {x}_i'\varvec{\beta })[1-p(\mathbf {x}_i'\varvec{\beta })]\le p(\mathbf {x}_i'\varvec{\upgamma })[1-p(\mathbf {x}_i'\varvec{\upgamma })]\le \exp (3C)p(\mathbf {x}_i'\varvec{\beta })[1-p(\mathbf {x}_i'\varvec{\beta })]. \end{aligned}$$

Observe that for \(\varvec{\upgamma }\) such that \(\max _{1\le i\le n}|\mathbf {x}_i'(\varvec{\upgamma }-\varvec{\beta })|\le C\) we have

$$\begin{aligned} \frac{p(\mathbf {x}_i'\varvec{\upgamma })[1-p(\mathbf {x}_i'\varvec{\upgamma })]}{p(\mathbf {x}_i'\varvec{\beta })[1-p(\mathbf {x}_i'\varvec{\beta })]}= e^{\mathbf {x}_i'(\varvec{\upgamma }-\varvec{\beta })}\left[ \frac{1+e^{\mathbf {x}_i'\varvec{\beta }}}{1+e^{\mathbf {x}_i'\varvec{\upgamma }}}\right] ^2\ge e^{-C}\left[ \frac{e^{-\mathbf {x}_i'\varvec{\beta }}+1}{e^{-\mathbf {x}_i'\varvec{\beta }}+e^{C}}\right] ^2\ge e^{-3C}. \end{aligned}$$
(15)

By interchanging the roles of \(\varvec{\beta }\) and \(\varvec{\upgamma }\) in (15) we obtain the upper bound for \(\mathbf {c}'J_{n}(\varvec{\upgamma })\mathbf {c}\).
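Inequality (15), on which Lemma 4 rests, is easy to check numerically. The short sketch below (illustrative only) samples values of \(\mathbf {x}_i'\varvec{\beta }\) and perturbations bounded by C and verifies that the ratio of the logistic variances \(p(1-p)\) stays within \([e^{-3C},e^{3C}]\).

```python
import numpy as np

rng = np.random.default_rng(1)
C = 0.7

def logistic_var(t):
    # p(t)(1 - p(t)) for the logistic response p(t) = 1/(1+exp(-t))
    p = 1.0 / (1.0 + np.exp(-t))
    return p * (1.0 - p)

for _ in range(10_000):
    u = rng.normal()            # plays the role of x_i' beta
    d = rng.uniform(-C, C)      # plays the role of x_i'(gamma - beta), |d| <= C
    ratio = logistic_var(u + d) / logistic_var(u)
    assert np.exp(-3 * C) <= ratio <= np.exp(3 * C)
print("inequality (15) held in all sampled cases")
```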

Lemma 5

Assume (A1) and (A2). Then \(l(\hat{\varvec{\beta }},\mathbf {Y}|\mathbf {X})-l(\varvec{\beta }^{*},\mathbf {Y}|\mathbf {X})=O_{P}(1)\).

Proof

Using a Taylor expansion around \(\hat{\varvec{\beta }}\), at which the derivative of the log-likelihood vanishes, we have for some \(\bar{\varvec{\beta }}\) belonging to the line segment joining \(\hat{\varvec{\beta }}\) and \(\varvec{\beta }^{*}\)

$$\begin{aligned} l(\hat{\varvec{\beta }},\mathbf {Y}|\mathbf {X})-l(\varvec{\beta }^{*},\mathbf {Y}|\mathbf {X})= (\hat{\varvec{\beta }}-\varvec{\beta }^{*})'J_{n}(\bar{\varvec{\beta }})(\hat{\varvec{\beta }}-\varvec{\beta }^{*})/2. \end{aligned}$$
(16)

Define the set \(A_{n}=\{\varvec{\upgamma }: ||\varvec{\upgamma }-\varvec{\beta }^{*}||\le s_n\}\), where \(s_n\) is an arbitrary sequence such that \(ns_n^2\rightarrow 0\). Using the Schwarz and Markov inequalities we have for any \(C>0\)

$$\begin{aligned} P&[\max _{1\le i\le n}|\mathbf {x}_i'(\varvec{\upgamma }-\varvec{\beta }^*)|>C]\le P[\max _{1\le i\le n}||\mathbf {x}_i||s_n>C]\\&\le n \max _{1\le i\le n}P[||\mathbf {x}_i||>Cs_n^{-1}]\le C^{-2}ns_n^{2}\mathbf {E}(||\mathbf {x}||^2)\rightarrow 0. \end{aligned}$$

Thus, using Lemma 4, the quadratic form in (16) is bounded from above, with probability tending to 1, by

$$\begin{aligned} \exp (3C)\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }^{*})'[J_{n}(\varvec{\beta }^*)/n]\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }^{*})/2, \end{aligned}$$

which is \(O_{P}(1)\) as \(\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }^{*})=O_{P}(1)\) in view of Theorem 6 and \(n^{-1}J_{n}(\varvec{\beta }^*)\xrightarrow {P}J(\varvec{\beta }^*)\).
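A hedged numerical illustration of Lemma 5: in the correctly specified case \(\varvec{\beta }^{*}\) coincides with the true parameter, and the log-likelihood difference \(l(\hat{\varvec{\beta }},\mathbf {Y}|\mathbf {X})-l(\varvec{\beta }^{*},\mathbf {Y}|\mathbf {X})\) stays bounded as n grows (it behaves like \(\chi ^2_{p+1}/2\)). The simulation below is a sketch under these assumptions, not part of the chapter.

```python
import numpy as np
import statsmodels.api as sm

def loglik(beta, Xc, y):
    # Logistic log-likelihood: sum_i [ y_i*eta_i - log(1 + exp(eta_i)) ]
    eta = Xc @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

rng = np.random.default_rng(3)
beta_true = np.array([0.5, 1.0, -1.0])   # correctly specified, so beta* = beta_true
for n in (200, 2000, 20000):
    X = rng.normal(size=(n, 2))
    Xc = sm.add_constant(X)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-Xc @ beta_true)))
    beta_hat = np.asarray(sm.Logit(y, Xc).fit(disp=0).params)
    # The difference does not grow with n, in line with Lemma 5.
    print(n, loglik(beta_hat, Xc, y) - loglik(beta_true, Xc, y))
```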

A.1 Proof of Lemma 2

As \(\varvec{\beta }^*_{m}=\varvec{\beta }^*_c\) we have for \(c\supseteq m \supseteq t^*\)

$$\begin{aligned} l(\hat{\varvec{\beta }}_c,\mathbf {Y}|\mathbf {X})-l(\hat{\varvec{\beta }}_m,\mathbf {Y}|\mathbf {X})= [l(\hat{\varvec{\beta }}_c,\mathbf {Y}|\mathbf {X})-l(\varvec{\beta }^*_c,\mathbf {Y}|\mathbf {X})]+ [l(\varvec{\beta }^*_m,\mathbf {Y}|\mathbf {X})-l(\hat{\varvec{\beta }}_m,\mathbf {Y}|\mathbf {X})], \end{aligned}$$

which is \(O_{P}(1)\) in view of Remark 1 and Lemma 5.

A.2 Proof of Lemma 3

The difference \(l(\hat{\varvec{\beta }}_c,\mathbf {Y}|\mathbf {X})-l(\hat{\varvec{\beta }}_w,\mathbf {Y}|\mathbf {X})\) can be written as

$$\begin{aligned}{}[l(\hat{\varvec{\beta }}_c,\mathbf {Y}|\mathbf {X})-l(\varvec{\beta }^*,\mathbf {Y}|\mathbf {X})]+ [l(\varvec{\beta }^*,\mathbf {Y}|\mathbf {X})-l(\hat{\varvec{\beta }}_w,\mathbf {Y}|\mathbf {X})]. \end{aligned}$$
(17)

It follows from Lemma 5 and Remark 1 that the first term in (17) is \(O_{P}(1)\). We will show that the probability that the second term in (17) is greater than or equal to \(\alpha _1nd_n^2\), for some \(\alpha _1>0\), tends to 1. Define the set \(A_n=\{\varvec{\upgamma }:||\varvec{\upgamma }-\varvec{\beta }^*||\le d_n\}\). Using the Schwarz inequality we have

$$\begin{aligned} \sup _{\varvec{\upgamma }\in A_n}\max _{i\le n}|\mathbf {x}_i'(\varvec{\upgamma }-\varvec{\beta }^*)|<\max _{1\le i\le n}||\mathbf {x}_i||d_n\le 1, \end{aligned}$$
(18)

with probability one. Define \(H_n(\varvec{\upgamma })=l(\varvec{\beta }^*,\mathbf {Y}|\mathbf {X})-l(\varvec{\upgamma },\mathbf {Y}|\mathbf {X})\). Note that \(H_n(\varvec{\upgamma })\) is convex and \(H_n(\varvec{\beta }^*)=0\). For any incorrect model w, in view of definition (11) of \(d_n\), we have \(\hat{\varvec{\beta }}_w\notin A_n\) for sufficiently large n. Thus it suffices to show that \(P(\inf _{\varvec{\upgamma }\in \partial A_n}H_n(\varvec{\upgamma })> \alpha _1 nd_n^{2})\rightarrow 1\), as \(n\rightarrow \infty \), for some \(\alpha _1>0\). Using a Taylor expansion, for some \(\bar{\varvec{\upgamma }}\) belonging to the line segment joining \(\varvec{\upgamma }\) and \(\varvec{\beta }^*\), we have

$$\begin{aligned} l(\varvec{\upgamma },\mathbf {Y}|\mathbf {X})-l(\varvec{\beta }^*,\mathbf {Y}|\mathbf {X})= (\varvec{\upgamma }-\varvec{\beta }^*)'s_n(\varvec{\beta }^*)-(\varvec{\upgamma }-\varvec{\beta }^*)'J_{n}(\bar{\varvec{\upgamma }})(\varvec{\upgamma }-\varvec{\beta }^*)/2 \end{aligned}$$

and the last convergence is implied by

$$\begin{aligned} P[\sup _{\varvec{\upgamma }\in \partial A_n}(\varvec{\upgamma }-\varvec{\beta }^*)'s_n(\varvec{\beta }^*)>\inf _{\varvec{\upgamma }\in \partial A_n}(\varvec{\upgamma }-\varvec{\beta }^*)'J_n(\bar{\varvec{\upgamma }})(\varvec{\upgamma }-\varvec{\beta }^*)/2-\alpha _1nd_n^2]\rightarrow 0. \end{aligned}$$
(19)

It follows from Lemma 4 and (18) that for \(\varvec{\upgamma }\in A_n\)

$$\begin{aligned} (\varvec{\upgamma }-\varvec{\beta }^*)'J_{n}(\bar{\varvec{\upgamma }})(\varvec{\upgamma }-\varvec{\beta }^*)\ge e^{-3}(\varvec{\upgamma }-\varvec{\beta }^*)'J_{n}(\varvec{\beta }^*)(\varvec{\upgamma }-\varvec{\beta }^*). \end{aligned}$$
(20)

Let \(\tau =\exp (-3)/2\). Using (20), the probability in (19) can be bounded from above by

$$\begin{aligned}&P[\sup _{\varvec{\upgamma }\in \partial A_n}(\varvec{\upgamma }-\varvec{\beta }^*)'s_n(\varvec{\beta }^*)>\tau d_n^2\lambda _{\min }(J_n(\varvec{\beta }^*))-\alpha _1nd_n^2] \\&\qquad +\,P[\inf _{\varvec{\upgamma }\in \partial A_n}(\varvec{\upgamma }-\varvec{\beta }^*)'J_n(\bar{\varvec{\upgamma }})(\varvec{\upgamma }-\varvec{\beta }^*)/2<\tau d_n^2\lambda _{\min }(J_n(\varvec{\beta }^*))]. \end{aligned}$$
(21)

Let \(\lambda _{1}^{-}=\lambda _{\min }(J(\varvec{\beta }^*))/2\). Assuming \(\alpha _1<\lambda _{1}^{-}\tau \), the first probability in (21) can be bounded by

$$\begin{aligned}&P[d_n||s_n(\varvec{\beta }^*)||>\tau nd_n^2\lambda _{1}^{-}-\alpha _1nd_n^2]+ P[\lambda _{\min }(J_n(\varvec{\beta }^*))<\lambda _{1}^{-}n] \\&\qquad \le P[||s_n(\varvec{\beta }^*)||>(\tau \lambda _{1}^{-}-\alpha _1)n^{1/2}a_n^{1/2}] \\&\qquad +\,P[nd_n<n^{1/2}a_n^{1/2}]+ P[\lambda _{\min }(J_n(\varvec{\beta }^*))<\lambda _{1}^{-}n]. \end{aligned}$$
(22)

Consider the first probability in (22). Note that \(s_n(\varvec{\beta }^*)\) is a random vector with zero mean and covariance matrix \(K_n(\varvec{\beta }^*)\). Using Markov's inequality, the fact that \(\text {cov}[s_{n}(\varvec{\beta }^{*})]=nK(\varvec{\beta }^{*})\) and taking \(\alpha _1<\lambda _{1}^{-}\tau \), it can be bounded from above by

$$\begin{aligned} \frac{tr\{\text {cov}[s_{n}(\varvec{\beta }^*)]\}}{(\tau \lambda _{1}^{-}-\alpha _1)^2n^2d_n^2}&= \frac{tr[K_n(\varvec{\beta }^*)]}{(\tau \lambda _{1}^{-}-\alpha _1)^2n^2d_n^2}\le \frac{n\kappa p}{(\tau \lambda _{1}^{-}-\alpha _1)^2n^2d_n^2}\\\nonumber&\le \frac{\kappa p}{(\tau \lambda _{1}^{-}-\alpha _1)^2 a_n}\rightarrow 0, \end{aligned}$$
(23)

where the last convergence follows from \(a_n\rightarrow \infty \).

The convergence to zero of the second probability in (22) follows from \(nd_n^2/a_n\xrightarrow {P}\infty \). As the eigenvalues of a matrix are continuous functions of its entries, we have \(\lambda _{\min }(n^{-1}J_{n}(\varvec{\beta }^*))\xrightarrow {P}\lambda _{\min }(J(\varvec{\beta }^*))\). Thus the convergence to zero of the third probability in (22) follows from the fact that, in view of (A1), the matrix \(J(\varvec{\beta }^*)\) is positive definite. The second term in (21) is bounded from above by

$$\begin{aligned} P&[\inf _{\varvec{\upgamma }\in \partial A_n}(\varvec{\upgamma }-\varvec{\beta }^*)'J_n(\bar{\varvec{\upgamma }})(\varvec{\upgamma }-\varvec{\beta }^*)/2<\tau d_n^2\lambda _{\min }(J_n(\varvec{\beta }^*))]\\&\le P[\inf _{\varvec{\upgamma }\in \partial A_n}(\varvec{\upgamma }-\varvec{\beta }^*)'[J_n(\bar{\varvec{\upgamma }})-2\tau J_n(\varvec{\beta }^*)](\varvec{\upgamma }-\varvec{\beta }^*)/2\\&\quad +2\tau d_n^2\lambda _{\min }(J_n(\varvec{\beta }^*))/2<\tau d_n^2\lambda _{\min }(J_n(\varvec{\beta }^*))]\\&\le P[\inf _{\varvec{\upgamma }\in \partial A_n}(\varvec{\upgamma }-\varvec{\beta }^*)'[J_n(\bar{\varvec{\upgamma }})-2\tau J_n(\varvec{\beta }^*)](\varvec{\upgamma }-\varvec{\beta }^*)/2<0]\rightarrow 0, \end{aligned}$$

where the last convergence follows from Lemma 4 and (18).

Lemma 6

Assume (A2) and (A3). Then we have \(\max _{i\le n}||\mathbf {x}_i||^2a_n/n\xrightarrow {P}0\).

Proof

Using the Markov inequality, (A2) and (A3), we have that \(||\mathbf {x}_n||^{2}a_n/n\xrightarrow {P}0\). We show that this implies the conclusion. Denote \(g_n:=\max _{1\le i\le n}||\mathbf {x}_i||^{2}a_n/n\) and \(h_n:=||\mathbf {x}_n||^{2}a_n/n\). Define a sequence \(n_k\) such that \(n_1=1\) and \(n_{k+1}=\min \{n>n_k:\max _{i\le n}||\mathbf {x}_i||^{2}>\max _{i\le n_k}||\mathbf {x}_i||^{2}\}\) (if such an \(n_{k+1}\) does not exist, put \(n_{k+1}=n_k\)). Without loss of generality we assume that \(P(A)=1\) for \(A=\{n_k\rightarrow \infty \}\), as on \(A^c\) the conclusion is trivially satisfied. Observe that \(g_{n_k}=h_{n_k}\) and that \(h_{n_k}\xrightarrow {P}0\) as a subsequence of \(h_n\xrightarrow {P}0\), and thus also \(g_{n_k}\xrightarrow {P}0\). This implies that for any \(\epsilon >0\) there exists \(n_0\in \mathbf {N}\) such that for \(n_k>n_0\) we have \(P[|g_{n_k}|\le \epsilon ]\ge 1-\epsilon \). Since \(a_n/n\) is nonincreasing, for \(n\in (n_k,n_{k+1})\) we have \(g_n\le g_{n_k}\); therefore \(P[|g_n|\le \epsilon ]\ge 1-\epsilon \) for \(n\ge n_0\), i.e. \(g_n\xrightarrow {P}0\).

A.3 Proof of Proposition 1

Assume first that \(\tilde{\varvec{\beta }}^{*}=0\) and note that this implies \(p(\beta _{0}^{*}+\tilde{\mathbf {x}}'\tilde{\varvec{\beta }}^{*})=p(\beta _{0}^{*})=C\in (0,1)\). From (8) we have

$$\begin{aligned} P(y=1)=\mathbf {E}(y)=\mathbf {E}[\mathbf {E}(y|\tilde{\mathbf {x}})]=\mathbf {E}[q(\beta _{0}+\tilde{\mathbf {x}}'\tilde{\varvec{\beta }})]=\mathbf {E}[p(\beta _{0}^{*}+\tilde{\mathbf {x}}'\tilde{\varvec{\beta }}^{*})]=C. \end{aligned}$$
(24)

Using (24) and (7) we get

$$\begin{aligned} \mathbf {E}(\tilde{\mathbf {x}} y)&=\mathbf {E}\{\mathbf {E}[\tilde{\mathbf {x}} y|\tilde{\mathbf {x}}]\}=\mathbf {E}\{\tilde{\mathbf {x}}\mathbf {E}[y|\tilde{\mathbf {x}}]\}=\mathbf {E}[\tilde{\mathbf {x}} q(\beta _{0}+\tilde{\mathbf {x}}'\tilde{\varvec{\beta }})]\\\nonumber&=\mathbf {E}[\tilde{\mathbf {x}} p(\beta _{0}^{*}+\tilde{\mathbf {x}}'\tilde{\varvec{\beta }}^{*})] =\mathbf {E}(\tilde{\mathbf {x}})C. \end{aligned}$$
(25)

From (24) we also have

$$\mathbf {E}(\tilde{\mathbf {x}}y)=\mathbf {E}[\tilde{\mathbf {x}} I\{y=1\}]=\mathbf {E}(\tilde{\mathbf {x}}|y=1)P(y=1)=\mathbf {E}(\tilde{\mathbf {x}}|y=1)C.$$

Comparing the last equation with the right-hand side of (25) we obtain \(\mathbf {E}(\tilde{\mathbf {x}}|y=1)=\mathbf {E}(\tilde{\mathbf {x}})=\mathbf {E}(\tilde{\mathbf {x}}|y=0)\). Assume now \(\mathbf {E}(\tilde{\mathbf {x}}|y=1)=\mathbf {E}(\tilde{\mathbf {x}}|y=0)\), which implies as before that \(\mathbf {E}(\tilde{\mathbf {x}}|y=1)=\mathbf {E}(\tilde{\mathbf {x}})\). Thus

$$\begin{aligned} \mathbf {E}(\tilde{\mathbf {x}} y)=\mathbf {E}(\tilde{\mathbf {x}}|y=1)\mathbf {E}(y)=\mathbf {E}(\tilde{\mathbf {x}})\mathbf {E}(y). \end{aligned}$$
(26)

Since \((\beta _{0}^{*},\tilde{\varvec{\beta }}^{*})\) is unique, it suffices to show that (7) and (8) are satisfied for \(\tilde{\varvec{\beta }}^{*}=0\) and \(\beta _{0}^*\) such that \(p(\beta _{0}^*)=P(y=1)\). This easily follows from (26).
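A simulation sketch illustrating Proposition 1 (illustrative only): the true response probability below is symmetric in a single predictor x, so \(\mathbf {E}(\tilde{\mathbf {x}}|y=1)=\mathbf {E}(\tilde{\mathbf {x}}|y=0)=0\), and the slope of the fitted logistic projection is approximately zero even though y genuinely depends on x.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)
# Symmetric, non-logistic true response: P(y=1|x) depends on x only through |x|.
prob = 0.2 + 0.6 * (np.abs(x) > 1)
y = rng.binomial(1, prob)

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print("conditional means of x given y:", x[y == 1].mean(), x[y == 0].mean())
print("fitted logistic slope (close to 0):", np.asarray(fit.params)[1])
```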


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Mielniczuk, J., Teisseyre, P. (2016). What Do We Choose When We Err? Model Selection and Testing for Misspecified Logistic Regression Revisited. In: Matwin, S., Mielniczuk, J. (eds) Challenges in Computational Statistics and Data Mining. Studies in Computational Intelligence, vol 605. Springer, Cham. https://doi.org/10.1007/978-3-319-18781-5_15

  • DOI: https://doi.org/10.1007/978-3-319-18781-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18780-8

  • Online ISBN: 978-3-319-18781-5
