Penalized spline estimation for panel count data model with time-varying coefficients

Qin, Fei; Yu, Zhangsheng

doi:10.1007/s00180-021-01109-z

Original paper
Published: 12 June 2021

Volume 36, pages 2413–2434, (2021)
Computational Statistics

Abstract

We consider a panel count data model with both time-varying and time-invariant coefficients. We estimate the baseline function and the time-varying coefficients using penalized splines based on the pseudolikelihood method. We evaluate the performance of three efficient Newton–Raphson-based algorithms and an adaptive barrier algorithm. We propose a novel cross-validated score to select the smoothing parameters and derive an easy-to-compute approximation to the score. Extensive simulations are conducted to compare the four algorithms, to compare the proposed penalized spline estimation with regression spline estimation and kernel estimation, and to assess the inference performance and robustness of the penalized spline estimation. Finally, we illustrate our method using a data set from a childhood wheezing study.

Acknowledgements

The research was supported in part by the National Natural Science Foundation of China (grant 11671256, Yu), by grant 2016YFC0902403 (Yu) of the Chinese Ministry of Science and Technology, by the University of Michigan and Shanghai Jiao Tong University Collaboration Grant (2017, Yu), and by Neil Shen’s SJTU Medical Research Fund.

Author information

Authors and Affiliations

Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
Fei Qin
SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
Zhangsheng Yu

Corresponding author

Correspondence to Zhangsheng Yu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The derivation of the approximated CVL score

Observe that $ \mathrm{max}_{\gamma \in \varvec{\varTheta } } l^{\lambda }(\gamma ) $ can be reformulated as the optimization problem with inequality constraints:

$$\begin{aligned} \min (-l^{\lambda }(\gamma )) \qquad \mathrm{subject \ to \ } {\left\{ \begin{array}{ll} \ g_{1}(\gamma )=\eta _{1,0}-\eta _{2,0}\le 0 \\ \ g_{2}(\gamma )=\eta _{2,0}-\eta _{3,0}\le 0 \\ \quad \qquad \qquad \vdots \\ \ g_{q_{n}-1}(\gamma )=\eta _{q_{n}-1,0}-\eta _{q_{n},0}\le 0 \end{array}\right. } \end{aligned}$$

(8)
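As a minimal sketch (not taken from the paper), the inequality constraints in (8) can be written as a linear map of the coefficient vector. The code assumes that $ \eta _{1,0},\ldots ,\eta _{q_{n},0} $ are the coefficients being ordered; the names `monotone_constraints` and `difference_matrix` are hypothetical. Because each $ g_{i} $ is linear in $ \gamma $, its gradient is constant and its Hessian is zero, so the constraint terms do not contribute to the second derivatives used later in the derivation.

```python
import numpy as np

def monotone_constraints(eta):
    """Return g_i = eta_i - eta_{i+1} for i = 1..q-1; feasible iff every g_i <= 0."""
    eta = np.asarray(eta, dtype=float)
    return eta[:-1] - eta[1:]

def difference_matrix(q):
    """(q-1) x q matrix D such that D @ eta equals monotone_constraints(eta)."""
    D = np.zeros((q - 1, q))
    idx = np.arange(q - 1)
    D[idx, idx] = 1.0       # coefficient of eta_i
    D[idx, idx + 1] = -1.0  # coefficient of eta_{i+1}
    return D

# Illustrative values only (not from the paper).
eta_ok = np.array([0.1, 0.4, 0.4, 0.9])   # nondecreasing, so feasible
eta_bad = np.array([0.1, 0.5, 0.3, 0.9])  # decreases once, so infeasible

feasible_ok = bool(np.all(monotone_constraints(eta_ok) <= 0))
feasible_bad = bool(np.all(monotone_constraints(eta_bad) <= 0))
```

Each row of `difference_matrix(q)` is the constant gradient of one constraint $ g_{i} $.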

By the Karush–Kuhn–Tucker (KKT) conditions, if $ b^{\lambda } $ is the solution to the optimization problem (8), then there exist $ u_{i}^{*}\ge 0, i=1,2,\ldots ,q_{n}-1 $, such that $ \nabla _{\gamma } l^{\lambda }(b^{\lambda })=\sum _{i=1}^{q_{n}-1} u_{i}^{*}\nabla _{\gamma } g_{i}(b^{\lambda }) $. Define $ f(\gamma ) = -l^{\lambda }(\gamma )+\sum _{i=1}^{q_{n}-1} u_{i}^{*} g_{i}(\gamma ) $ and $ {\hat{\gamma }} = \mathrm{argmin}f(\gamma ) $. By the concavity of $ l^{\lambda }(\gamma ) $ and the sufficiency of the KKT conditions for convex problems, we have

$$\begin{aligned} b^{\lambda }=\mathop {\mathrm{argmax}}\limits _{\gamma \in \varvec{\varTheta } } l^{\lambda }(\gamma ) \Leftrightarrow b^{\lambda }=\mathrm{argmin}f(\gamma ) = {\hat{\gamma }}. \end{aligned}$$
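The equivalence between the constrained maximizer and the unconstrained minimizer of $ f $ can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's model: it takes the concave objective $ l(\gamma ) = -\tfrac{1}{2}\Vert \gamma - c\Vert ^{2} $ in two dimensions with the single constraint $ g(\gamma )=\gamma _{1}-\gamma _{2}\le 0 $, where `c` is a made-up point that violates the constraint.

```python
import numpy as np

# Toy concave objective l(gamma) = -0.5 * ||gamma - c||^2 (an assumption for illustration).
c = np.array([2.0, 0.0])         # unconstrained maximizer; violates gamma_1 <= gamma_2
grad_g = np.array([1.0, -1.0])   # constant gradient of g(gamma) = gamma_1 - gamma_2

def grad_l(gamma):
    return c - gamma

# Constrained maximizer: Euclidean projection of c onto the half-plane g(gamma) <= 0.
b = c - max(grad_g @ c, 0.0) / (grad_g @ grad_g) * grad_g

# Multiplier u* from the stationarity condition grad_l(b) = u* * grad_g.
u_star = (grad_l(b) @ grad_g) / (grad_g @ grad_g)

# Unconstrained minimizer of f = -l + u* g: solve (gamma - c) + u* * grad_g = 0.
gamma_hat = c - u_star * grad_g
```

Here `b` and `gamma_hat` coincide and `u_star` is nonnegative, which mirrors the displayed equivalence in this simple case.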

Similarly, there exist $ w_{k}^{*}\ge 0, k=1,2,\ldots ,q_{n}-1 $, such that $ \nabla _{\gamma } l^{\lambda }_{(-(i,j))}(b^{\lambda }_{(-(i,j))})=\sum _{k=1}^{q_{n}-1} w_{k}^{*}\nabla _{\gamma } g_{k}(b^{\lambda }_{(-(i,j))}) $. Defining $ f_{(-(i,j))}(\gamma ) = -l^{\lambda }_{(-(i,j))}(\gamma )+\sum _{k=1}^{q_{n}-1} w_{k}^{*} g_{k}(\gamma ) $ and $ {\hat{\gamma }} _{(-(i,j))} = \mathrm{argmin}f_{(-(i,j))}(\gamma ) $, we have

$$\begin{aligned} b^{\lambda }_{(-(i,j))}=\mathop {\mathrm{argmax}}\limits _{ \gamma \in \varvec{\varTheta } } l^{\lambda }_{(-(i,j))}(\gamma ) \Leftrightarrow b^{\lambda }_{(-(i,j))}=\mathrm{argmin}f_{(-(i,j))}(\gamma ) = {\hat{\gamma }}_{(-(i,j))}. \end{aligned}$$

A first-order Taylor expansion of $ \nabla _{\gamma } f_{(-(i,j))} $ around $ {\hat{\gamma }} $ yields

$$\begin{aligned} \nabla _{\gamma } f_{(-(i,j))} \big ({\hat{\gamma }}_{(-(i,j))}\big ) \approx \nabla _{\gamma } f_{(-(i,j))} \big ({\hat{\gamma }} \big ) + \nabla ^{2}_{\gamma } f_{(-(i,j))} \big ({\hat{\gamma }} \big )\cdot \big ({\hat{\gamma }}_{(-(i,j))}-{\hat{\gamma }}\big ). \end{aligned}$$

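Define $ f_{(i,j)}(\gamma ) = f(\gamma )-f_{(-(i,j))}(\gamma ) \approx l_{(-(i,j))}(\gamma )-l(\gamma )=-l_{(i,j)}(\gamma ) $. Since $ \nabla _{\gamma } f_{(-(i,j))} \big ({\hat{\gamma }}_{(-(i,j))}\big ) = 0 $ and $ \nabla _{\gamma } f ({\hat{\gamma }} ) = 0 $, so that $ \nabla _{\gamma } f_{(-(i,j))} ({\hat{\gamma }} ) = -\nabla _{\gamma } f_{(i,j)} ({\hat{\gamma }} ) $, we have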

$$\begin{aligned} 0 \approx - \nabla _{\gamma } f_{(i,j)} ({\hat{\gamma }} ) +\big (\nabla ^{2}_{\gamma } f({\hat{\gamma }} )-\nabla ^{2}_{\gamma } f_{(i,j)} ({\hat{\gamma }} ) \big ) \cdot \big ({\hat{\gamma }}_{(-(i,j))}-{\hat{\gamma }}\big ). \end{aligned}$$

(9)

Computing the second derivatives $ \nabla ^{2}_{\gamma } f_{(i,j)} ({\hat{\gamma }} ) $ for every $ (i,j) $ can be costly in both computing time and storage. Omitting the term $ \nabla ^{2}_{\gamma } f_{(i,j)} ({\hat{\gamma }} ) $ in (9) leads to

$$\begin{aligned} {\hat{\gamma }}_{(-(i,j))}&\approx {\hat{\gamma }}+ \big ( \nabla ^{2}_{\gamma } f({\hat{\gamma }} )\big )^{-1}\cdot \nabla _{\gamma } f_{(i,j)} ({\hat{\gamma }} ) \\&\approx {\hat{\gamma }}+\big ( \nabla ^{2}_{\gamma } l^{\lambda }({\hat{\gamma }} ) \big )^{-1}\cdot \nabla _{\gamma } l_{(i,j)} ({\hat{\gamma }} ). \end{aligned}$$
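The one-step leave-one-out approximation above can be compared with an exact refit on simulated data. The following sketch is not the paper's panel count model: it assumes an unconstrained Poisson regression with a ridge penalty $ \tfrac{\lambda }{2}\Vert \gamma \Vert ^{2} $ as a stand-in for $ l^{\lambda } $, and it indexes the left-out observations by a single index `i` rather than a pair $ (i,j) $. All names and settings (`fit`, `loo_approx`, `loo_exact`, `lam`, the simulated design) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 200, 3, 1.0
X = np.column_stack([np.ones(n), 0.5 * rng.standard_normal((n, p - 1))])
y = rng.poisson(np.exp(X @ np.array([0.5, -0.3, 0.2])))

def fit(X, y, lam, iters=50):
    """Newton-Raphson for l^lam(gamma) = sum(y*eta - exp(eta)) - lam/2 * ||gamma||^2."""
    gamma = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ gamma)
        grad = X.T @ (y - mu) - lam * gamma
        hess = -(X.T * mu) @ X - lam * np.eye(X.shape[1])
        gamma = gamma - np.linalg.solve(hess, grad)
    return gamma

gamma_hat = fit(X, y, lam)
mu_hat = np.exp(X @ gamma_hat)
hess_hat = -(X.T * mu_hat) @ X - lam * np.eye(p)   # Hessian of l^lam at gamma_hat

def loo_approx(i):
    """gamma_hat + (Hessian of l^lam)^{-1} @ (gradient of l_i), both at gamma_hat."""
    grad_i = (y[i] - mu_hat[i]) * X[i]
    return gamma_hat + np.linalg.solve(hess_hat, grad_i)

def loo_exact(i):
    """Refit with observation i removed."""
    keep = np.arange(n) != i
    return fit(X[keep], y[keep], lam)

idx = range(10)
err_approx = np.mean([np.linalg.norm(loo_approx(i) - loo_exact(i)) for i in idx])
shift = np.mean([np.linalg.norm(gamma_hat - loo_exact(i)) for i in idx])
```

In this toy setting `err_approx` should be small relative to `shift`, the actual change in the estimate caused by deleting an observation. The approximation needs one linear solve per left-out observation instead of a full refit.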

Then we have

$$\begin{aligned} l_{(i,j)} \big (b^{\lambda }_{(-(i,j))}\big )&\approx l_{(i,j)}\big [ b^{\lambda }+\big ( \nabla ^{2}_{\gamma } l^{\lambda }(b^{\lambda } ) \big )^{-1}\cdot \nabla _{\gamma } l_{(i,j)} (b^{\lambda } ) \big ] \\&\approx l_{(i,j)}( b^{\lambda })+\bigl ( \nabla _{\gamma } l_{(i,j)} (b^{\lambda } ) \bigr )^{T} \cdot \bigl [ \big ( \nabla ^{2}_{\gamma } l^{\lambda }(b^{\lambda } ) \big )^{-1}\cdot \nabla _{\gamma } l_{(i,j)} (b^{\lambda } ) \bigr ] \\&= l_{(i,j)}( b^{\lambda })+\mathrm{tr}\bigl [ \bigl ( \nabla ^{2}_{\gamma } l^{\lambda }(b^{\lambda } ) \bigr )^{-1}\cdot \nabla _{\gamma } l_{(i,j)} (b^{\lambda } ) \cdot \bigl ( \nabla _{\gamma } l_{(i,j)} (b^{\lambda } )\bigr )^{T} \bigr ]. \end{aligned}$$

Therefore,

$$\begin{aligned} \mathrm{CVL}(\lambda ) \approx l(b^{\lambda })+ \mathrm{tr}\bigg \{ \bigl (\nabla ^{2}_{\gamma } l^{\lambda }(b^{\lambda } ) \bigr )^{-1}\cdot \bigg [ \sum \limits _{i=1}^n\sum \limits _{j=1}^{ K_{i}} \Big ( \nabla _{\gamma } l_{(i,j)} (b^{\lambda } ) \cdot \bigl ( \nabla _{\gamma } l_{(i,j)} (b^{\lambda } ) \bigr )^{T} \Big ) \bigg ] \bigg \}. \end{aligned}$$
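The passage from a sum of quadratic forms to a single trace rests on the identity $ a^{T} M a = \mathrm{tr}(M a a^{T}) $. A small numerical check is sketched below; the random `scores` and `hess_pen` are placeholders standing in for $ \nabla _{\gamma } l_{(i,j)} (b^{\lambda } ) $ and $ \nabla ^{2}_{\gamma } l^{\lambda }(b^{\lambda } ) $, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs, q = 50, 4
scores = rng.standard_normal((n_obs, q))   # row k is a placeholder score vector
A = rng.standard_normal((q, q))
hess_pen = -(A @ A.T + np.eye(q))          # placeholder negative definite Hessian
hess_inv = np.linalg.inv(hess_pen)

# Term by term: one quadratic form per observation.
quad_sum = sum(s @ hess_inv @ s for s in scores)

# Trace form: accumulate the outer products first, then apply the inverse once.
outer_sum = scores.T @ scores              # equals the sum over k of s_k s_k^T
trace_form = np.trace(hess_inv @ outer_sum)
```

Both quantities agree up to floating-point error. The trace form only needs the accumulated outer-product matrix, so the inverse Hessian is applied once rather than once per observation.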

Note that

$$\begin{aligned}&E \bigg [ \mathrm{tr}\bigg \{ \bigl (\nabla ^{2}_{\gamma } l^{\lambda }(b^{\lambda } ) \bigr )^{-1}\cdot \bigg [ \sum \limits _{i=1}^n\sum \limits _{j=1}^{ K_{i}} \Big ( \nabla _{\gamma } l_{(i,j)} (b^{\lambda } ) \cdot \bigl ( \nabla _{\gamma } l_{(i,j)} (b^{\lambda } ) \bigr )^{T} \Big ) \bigg ] \bigg \} \bigg ] \\&\quad = \mathrm{tr}\Bigl [ \bigl (\nabla ^{2}_{\gamma } l^{\lambda }(b^{\lambda } ) \bigr ) ^{-1} \cdot \bigl (- \nabla ^{2}_{\gamma } l(b^{\lambda } ) \bigr )\Bigr ]. \end{aligned}$$

Hence,

$$\begin{aligned} \mathrm{CVL}(\lambda ) \approx l(b^{\lambda })-\mathrm{tr}\Big [\big (\nabla ^{2}_\gamma l^{\lambda }(b^{\lambda })\big )^{-1}\cdot \nabla ^{2}_\gamma l(b^{\lambda })\Big ]. \end{aligned}$$
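A minimal sketch of how the final approximation might be evaluated over a grid of smoothing parameters is given below. It again assumes a toy ridge-penalized Poisson regression in place of the paper's penalized spline pseudolikelihood, drops the $ \log (y!) $ constant from $ l $, and uses hypothetical names (`fit`, `loglik`, `approx_cvl`, `lams`). The trace term can be read as an effective number of parameters, which is an interpretation added here rather than a statement from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = np.column_stack([np.ones(n), 0.5 * rng.standard_normal((n, p - 1))])
y = rng.poisson(np.exp(X @ np.array([0.5, -0.3, 0.2])))

def loglik(gamma):
    """Unpenalized Poisson log-likelihood l(gamma), without the log(y!) constant."""
    eta = X @ gamma
    return float(y @ eta - np.exp(eta).sum())

def fit(lam, iters=50):
    """Newton-Raphson maximizer b^lam of l(gamma) - lam/2 * ||gamma||^2."""
    gamma = np.zeros(p)
    for _ in range(iters):
        mu = np.exp(X @ gamma)
        grad = X.T @ (y - mu) - lam * gamma
        hess_pen = -(X.T * mu) @ X - lam * np.eye(p)
        gamma = gamma - np.linalg.solve(hess_pen, grad)
    return gamma

def approx_cvl(lam):
    """Return (l(b) - tr[(Hessian of l^lam)^{-1} Hessian of l], trace term) at b = b^lam."""
    b = fit(lam)
    mu = np.exp(X @ b)
    hess = -(X.T * mu) @ X                 # Hessian of l at b
    hess_pen = hess - lam * np.eye(p)      # Hessian of l^lam at b
    trace_term = float(np.trace(np.linalg.solve(hess_pen, hess)))
    return loglik(b) - trace_term, trace_term

lams = [0.01, 1.0, 100.0]
results = {lam: approx_cvl(lam) for lam in lams}
```

For this toy penalty the trace term lies strictly between 0 and `p`, so the score is the log-likelihood at $ b^{\lambda } $ reduced by a complexity correction. A smoothing parameter would presumably be chosen by comparing the scores in `results` across the grid; that selection rule is an assumption, since it is not stated in this section.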

About this article

Cite this article

Qin, F., Yu, Z. Penalized spline estimation for panel count data model with time-varying coefficients. Comput Stat 36, 2413–2434 (2021). https://doi.org/10.1007/s00180-021-01109-z

Received: 28 July 2020
Accepted: 29 April 2021
Published: 12 June 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s00180-021-01109-z
