Abstract
The problem of fitting logistic regression to a binary model while allowing for misspecification of the response function is reconsidered. We introduce a two-stage procedure which consists of first ordering predictors with respect to the deviances of the models with the predictor in question omitted, and then choosing the minimizer of a Generalized Information Criterion in the resulting nested family of models. This allows a large number of potential predictors to be considered, in contrast to an exhaustive method. We prove that the procedure consistently chooses the model \(t^{*}\) which is closest, in the averaged Kullback-Leibler sense, to the true binary model \(t\). We then consider the interplay between \(t\) and \(t^{*}\) and prove that for a monotone response function, when there is genuine dependence of the response on the predictors, \(t^{*}\) is necessarily nonempty. This implies consistency of a deviance test of significance under misspecification. For a class of distributions of predictors, including the normal family, Ruud’s result asserts that \(t^{*}=t\). Numerical experiments reveal that for normally distributed predictors the probability of correct selection and the power of the deviance test depend monotonically on Ruud’s proportionality constant \(\eta \).
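To fix ideas, the two-stage procedure summarized above can be sketched in code. This is a minimal illustration, not the authors' implementation: the choice of penalty \(a_n=\log n\), the use of statsmodels, and the function names loglik and two_stage_gic are assumptions made only for the example.

```python
import numpy as np
import statsmodels.api as sm

def loglik(y, X):
    """Maximized log-likelihood of a logistic fit with an intercept."""
    return sm.Logit(y, sm.add_constant(X)).fit(disp=0).llf

def two_stage_gic(y, X, a_n=None):
    """Order predictors by the deviance of the model with that predictor
    omitted, then minimize GIC over the resulting nested family."""
    n, p = X.shape
    if a_n is None:
        a_n = np.log(n)                  # BIC-type penalty; one possible choice
    full_ll = loglik(y, X)
    # Stage 1: deviance increase when the j-th predictor is omitted.
    drop = np.array([2 * (full_ll - loglik(y, np.delete(X, j, axis=1)))
                     for j in range(p)])
    order = np.argsort(-drop)            # most influential predictors first
    # Stage 2: GIC = -2 * loglik + a_n * (number of parameters) over the nested family.
    pbar = y.mean()                      # intercept-only model handled directly
    gic = [-2 * n * (pbar * np.log(pbar) + (1 - pbar) * np.log(1 - pbar)) + a_n]
    for k in range(1, p + 1):
        gic.append(-2 * loglik(y, X[:, order[:k]]) + a_n * (k + 1))
    k_best = int(np.argmin(gic))
    return sorted(order[:k_best].tolist())   # indices of the selected predictors
```

Because the nested family has only \(p+1\) members, the search requires \(O(p)\) model fits rather than the \(2^{p}\) fits of an exhaustive search.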
References
Bache K, Lichman M (2013) UCI machine learning repository. University of California, Irvine
Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
Bogdan M, Doerge R, Ghosh J (2004) Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting quantitative trait loci. Genetics 167:989–999
Bozdogan H (1987) Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52:345–370
Burnham K, Anderson D (2002) Model selection and multimodel inference. A practical information-theoretic approach. Springer, New York
Carroll R, Pederson S (1993) On robustness in the logistic regression model. J R Stat Soc B 55:693–706
Casella G, Giron J, Martinez M, Moreno E (2009) Consistency of Bayes procedures for variable selection. Ann Stat 37:1207–1228
Chen J, Chen Z (2008) Extended Bayesian Information Criteria for model selection with large model spaces. Biometrika 95:759–771
Chen J, Chen Z (2012) Extended BIC for small-n-large-p sparse GLM. Stat Sin 22:555–574
Claeskens G, Hjort N (2008) Model selection and model averaging. Cambridge University Press, Cambridge
Czado C, Santner T (1992) The effect of link misspecification on binary regression inference. J Stat Plann Infer 33:213–231
Fahrmeir L (1987) Asymptotic testing theory for generalized linear models. Statistics 1:65–76
Fahrmeir L (1990) Maximum likelihood estimation in misspecified generalized linear models. Statistics 4:487–502
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Foster D, George E (1994) The risk inflation criterion for multiple regression. Ann Stat 22:1947–1975
Hjort N, Pollard D (1993) Asymptotics for minimisers of convex processes. Unpublished manuscript
Konishi S, Kitagawa G (2008) Information criteria and statistical modeling. Springer, New York
Lehmann E (1959) Testing statistical hypotheses. Wiley, New York
Li K, Duan N (1991) Slicing regression: a link-free regression method. Ann Stat 19(2):505–530
Qian G, Field C (2002) Law of iterated logarithm and consistent model selection criterion in logistic regression. Stat Probab Lett 56:101–112
Ruud P (1983) Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models. Econometrica 51(1):225–228
Sin C, White H (1996) Information criteria for selecting possibly misspecified parametric models. J Econometrics 71:207–225
Zak-Szatkowska M, Bogdan M (2011) Modified versions of the Bayesian Information Criterion for sparse generalized linear models. Comput Stat Data Anal 55:2908–2924
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc B 67(2):301–320
Appendix A: Auxiliary Lemmas
This section contains some auxiliary facts used in the proofs. The following theorem states the asymptotic normality of the maximum likelihood estimator.
Theorem 6
Assume (A1) and (A2). Then
where J and K are defined in (5) and (9), respectively.
The above Theorem is stated in [11] (Theorem 3.1) and in [16] ((2.10) and Sect. 5B).
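Assuming, as is standard for results of this type, that \(J\) is the limit of the averaged negative Hessian and \(K\) the limit of the averaged outer product of scores, so that the limiting covariance in Theorem 6 has the sandwich form \(J^{-1}KJ^{-1}\), a plug-in estimate of this covariance can be sketched as below. The function name and the exact correspondence with (5) and (9) are assumptions made for the illustration.

```python
import numpy as np

def sandwich_cov(X, y, beta_hat):
    """Plug-in estimate of J^{-1} K J^{-1} / n for a logistic fit;
    X is assumed to already contain the intercept column."""
    n = len(y)
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta_hat))   # fitted probabilities
    W = p_hat * (1.0 - p_hat)
    J_hat = (X * W[:, None]).T @ X / n            # averaged negative Hessian
    S = X * (y - p_hat)[:, None]                  # per-observation scores
    K_hat = S.T @ S / n                           # averaged outer product of scores
    J_inv = np.linalg.inv(J_hat)
    return J_inv @ K_hat @ J_inv / n              # estimated covariance of beta_hat
```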
Lemma 4
Assume that \(\max _{1\le i\le n}|\mathbf {x}_i'(\varvec{\upgamma }-\varvec{\beta })|\le C\) for some \(C>0\) and some \(\varvec{\upgamma }\in R^{p+1}\). Then for any \(\mathbf {c}\in R^{p+1}\)
Proof
It suffices to show that for \(i=1,\ldots ,n\)
Observe that for \(\varvec{\upgamma }\) such that \(\max _{i\le n}|\mathbf {x}_i'(\varvec{\upgamma }-\varvec{\beta })|\le C\) we have
By interchanging \(\varvec{\beta }\) and \(\varvec{\upgamma }\) in (15) we obtain the upper bound for \(\mathbf {c}'J_{n}(\varvec{\upgamma })\mathbf {c}\).
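For completeness, here is one way the required pointwise comparison can be obtained for the logistic response \(p(t)=e^{t}/(1+e^{t})\), assuming the standard form \(J_{n}(\varvec{\upgamma })=\sum _{i=1}^{n}p(\mathbf {x}_i'\varvec{\upgamma })(1-p(\mathbf {x}_i'\varvec{\upgamma }))\mathbf {x}_i\mathbf {x}_i'\); this is a sketch rather than a restatement of (15). For any \(s,t\in R\),
\[
\frac{p(s)(1-p(s))}{p(t)(1-p(t))}=e^{s-t}\left( \frac{1+e^{t}}{1+e^{s}}\right) ^{2}\ge e^{-|s-t|}\cdot e^{-2|s-t|}=e^{-3|s-t|},
\]
since \(1+e^{t}\ge e^{-|s-t|}(1+e^{s})\). Taking \(s=\mathbf {x}_i'\varvec{\upgamma }\), \(t=\mathbf {x}_i'\varvec{\beta }\) and using \(|s-t|\le C\) yields \(\mathbf {c}'J_{n}(\varvec{\upgamma })\mathbf {c}\ge e^{-3C}\,\mathbf {c}'J_{n}(\varvec{\beta })\mathbf {c}\).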
Lemma 5
Assume (A1) and (A2). Then \(l(\hat{\varvec{\beta }},\mathbf {Y}|\mathbf {X})-l(\varvec{\beta }^{*},\mathbf {Y}|\mathbf {X})=O_{P}(1)\).
Proof
Using a Taylor expansion we have, for some \(\bar{\varvec{\beta }}\) belonging to the line segment joining \(\hat{\varvec{\beta }}\) and \(\varvec{\beta }^{*}\),
Define the set \(A_{n}=\{\varvec{\upgamma }: ||\varvec{\upgamma }-\varvec{\beta }^{*}||\le s_n\}\), where \(s_n\) is an arbitrary sequence such that \(ns_n^2\rightarrow 0\). Using the Schwarz and Markov inequalities we have for any \(C>0\)
Thus, using Lemma 4, the quadratic form in (16) is bounded from above, with probability tending to 1, by
which is \(O_{P}(1)\) as \(\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }^{*})=O_{P}(1)\) in view of Theorem 6 and \(n^{-1}J_{n}(\varvec{\beta }^*)\xrightarrow {P}J(\varvec{\beta }^*)\).
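For concreteness, a worked version of the bound, under the assumption that the score vanishes at \(\hat{\varvec{\beta }}\) so that only the quadratic term survives in the expansion:
\[
l(\hat{\varvec{\beta }},\mathbf {Y}|\mathbf {X})-l(\varvec{\beta }^{*},\mathbf {Y}|\mathbf {X})=\tfrac{1}{2}(\hat{\varvec{\beta }}-\varvec{\beta }^{*})'J_{n}(\bar{\varvec{\beta }})(\hat{\varvec{\beta }}-\varvec{\beta }^{*})=\tfrac{1}{2}\big [\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }^{*})\big ]'\,n^{-1}J_{n}(\bar{\varvec{\beta }})\,\big [\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }^{*})\big ],
\]
which is \(O_{P}(1)\) once \(n^{-1}J_{n}(\bar{\varvec{\beta }})\) is bounded in probability (Lemma 4 together with \(n^{-1}J_{n}(\varvec{\beta }^*)\xrightarrow {P}J(\varvec{\beta }^*)\)) and \(\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }^{*})=O_{P}(1)\) (Theorem 6).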
1.1 A.1 Proof of Lemma 2
As \(\varvec{\beta }^*_{m}=\varvec{\beta }^*_c\) we have for \(c\supseteq m \supseteq t^*\)
which is \(O_{P}(1)\) in view of Remark 1 and Lemma 5.
1.2 A.2 Proof of Lemma 3
The difference \(l(\hat{\varvec{\beta }}_c,\mathbf {Y}|\mathbf {X})-l(\hat{\varvec{\beta }}_w,\mathbf {Y}|\mathbf {X})\) can be written as
It follows from Lemma 5 and Remark 1 that the first term in (17) is \(O_{P}(1)\). We will show that the probability that the second term in (17) is greater than or equal to \(\alpha _1nd_n^2\), for some \(\alpha _1>0\), tends to 1. Define the set \(A_n=\{\varvec{\upgamma }:||\varvec{\upgamma }-\varvec{\beta }^*||\le d_n\}\). Using the Schwarz inequality we have
with probability one. Define \(H_n(\varvec{\upgamma })=l(\varvec{\beta }^*,\mathbf {Y}|\mathbf {X})-l(\varvec{\upgamma },\mathbf {Y}|\mathbf {X})\). Note that \(H_n(\varvec{\upgamma })\) is convex and \(H_n(\varvec{\beta }^*)=0\). For any incorrect model w, in view of definition (11) of \(d_n\), we have \(\hat{\varvec{\beta }}_w\notin A_n\) for sufficiently large n. Since \(H_n\) is convex and vanishes at \(\varvec{\beta }^*\), a lower bound on \(\partial A_n\) extends to the complement of \(A_n\), so it suffices to show that \(P(\inf _{\varvec{\upgamma }\in \partial A_n}H_n(\varvec{\upgamma })> \alpha _1 nd_n^{2})\rightarrow 1\), as \(n\rightarrow \infty \), for some \(\alpha _1>0\). Using a Taylor expansion, for some \(\bar{\varvec{\upgamma }}\) belonging to the line segment joining \(\varvec{\upgamma }\) and \(\varvec{\beta }^*\)
and the last convergence is implied by
It follows from Lemma 4 and (18) that for \(\varvec{\upgamma }\in A_n\)
Let \(\tau =\exp (-3)/2\). Using (20), the probability in (19) can be bounded from above by
Let \(\lambda _{1}^{-}=\lambda _{\min }(J(\varvec{\beta }))/2\). Assuming \(\alpha _1<\lambda _{1}^{-}\tau \), the first probability in (21) can be bounded by
Consider the first probability in (22). Note that \(s_n(\varvec{\beta }^*)\) is a random vector with zero mean and covariance matrix \(K_n(\varvec{\beta }^*)\). Using Markov’s inequality, the fact that \(\text {cov}[s_{n}(\varvec{\beta }^{*})]=nK(\varvec{\beta }^{*})\) and taking \(\alpha _1<\lambda ^{-}\tau \), it can be bounded from above by
where the last convergence follows from \(a_n\rightarrow \infty \).
The convergence to zero of the second probability in (22) follows from \(nd_n^2/a_n\xrightarrow {P}\infty \). As the eigenvalues of a matrix are continuous functions of its entries, we have \(\lambda _{\min }(n^{-1}J_{n}(\varvec{\beta }^*))\xrightarrow {P}\lambda _{\min }(J(\varvec{\beta }^*))\). Thus the convergence to zero of the third probability in (22) follows from the fact that, in view of (A1), the matrix \(J(\varvec{\beta }^*)\) is positive definite. The second term in (21) is bounded from above by
where the last convergence follows from Lemma 4 and (18).
Lemma 6
Assume (A2) and (A3). Then we have \(\max _{i\le n}||\mathbf {x}_i||^2a_n/n\xrightarrow {P}0\).
Proof
Using Markov’s inequality, (A2) and (A3), we have that \(||\mathbf {x}_n||^{2}a_n/n\xrightarrow {P}0\). We show that this implies the conclusion. Denote \(g_n:=\max _{1\le i\le n}||\mathbf {x}_i||^{2}a_n/n\) and \(h_n:=||\mathbf {x}_n||^{2}a_n/n\). Define a sequence \(n_k\) such that \(n_1=1\) and \(n_{k+1}=\min \{n>n_k:\max _{i\le n}||\mathbf {x}_i||^{2}>\max _{i\le n_k}||\mathbf {x}_i||^{2}\}\) (if such \(n_{k+1}\) does not exist, put \(n_{k+1}=n_k\)). Without loss of generality we assume that \(P(A)=1\) for \(A=\{n_k\rightarrow \infty \}\), as on \(A^c\) the conclusion is trivially satisfied. Observe that \(g_{n_k}=h_{n_k}\) and that \(h_{n_k}\xrightarrow {P}0\), being a subsequence of \(h_n\xrightarrow {P}0\); thus also \(g_{n_k}\xrightarrow {P}0\). This implies that for any \(\epsilon >0\) there exists \(n_0\in \mathbf {N}\) such that for \(n_k>n_0\) we have \(P[|g_{n_k}|\le \epsilon ]\ge 1-\epsilon \). Since for \(n\in (n_k,n_{k+1})\) we have \(g_n\le g_{n_k}\), because \(a_n/n\) is nonincreasing, it follows that \(P[|g_n|\le \epsilon ]\ge 1-\epsilon \) for \(n\ge n_0\), i.e. \(g_n\xrightarrow {P}0\).
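The first step can be made explicit; assuming that (A2) yields \(\mathbf {E}||\mathbf {x}_n||^{2}<\infty \) and that (A3) implies \(a_n/n\rightarrow 0\) (a reading of the assumptions used only for this sketch), Markov’s inequality gives, for any \(\epsilon >0\),
\[
P\big (||\mathbf {x}_n||^{2}a_n/n>\epsilon \big )\le \frac{a_n\,\mathbf {E}||\mathbf {x}_n||^{2}}{n\,\epsilon }\rightarrow 0 .
\]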
1.3 A.3 Proof of Proposition 1
Assume first that \(\tilde{\varvec{\beta }}^{*}=0\) and note that this implies \(p(\beta _{0}+\tilde{\mathbf {x}}'\tilde{\varvec{\beta }}^{*})=p(\beta _{0})=C\in (0,1)\). From (8) we have
From (24) we also have
Comparing the last equation with the right-hand side of (25) we obtain \(\mathbf {E}(\tilde{\mathbf {x}}|y=1)=\mathbf {E}\tilde{\mathbf {x}}=\mathbf {E}(\tilde{\mathbf {x}}|y=0)\). Assume now \(\mathbf {E}(\tilde{\mathbf {x}}|y=1)=\mathbf {E}(\tilde{\mathbf {x}}|y=0)\), which implies as before that \(\mathbf {E}(\tilde{\mathbf {x}}|y=1)=\mathbf {E}(\tilde{\mathbf {x}})\). Thus
Since \((\beta _{0}^{*},\tilde{\varvec{\beta }}^{*})\) is unique it suffices to show that (7) and (8) are satisfied for \(\tilde{\varvec{\beta }}^{*}=0\) and \(\beta _{0}^*\) such that \(Ep(\beta _{0}^*)=P(Y=1)\). This easily follows from (26).