Abstract
We consider a panel count data model with both time-varying and time-invariant coefficients. We estimate the baseline function and the time-varying coefficients using penalized splines based on the pseudolikelihood method. We evaluate the performance of three efficient Newton–Raphson-based algorithms and an adaptive barrier algorithm. We propose a novel cross-validated score to select the smoothing parameters and derive an easy-to-compute approximation to the score. Extensive simulations are conducted to compare the four algorithms, to compare the proposed penalized spline estimation with regression spline estimation and kernel estimation, and to assess the inference performance and robustness of the penalized spline estimation. Finally, we illustrate our method using a data set from a childhood wheezing study.
Acknowledgements
The research was supported in part by the National Natural Science Foundation of China (11671256, Yu), by grant 2016YFC0902403 (Yu) of the Chinese Ministry of Science and Technology, by the University of Michigan and Shanghai Jiao Tong University Collaboration Grant (2017, Yu), and by Neil Shen’s SJTU Medical Research Fund.
The derivation of the approximated CVL score
Observe that \( \max _{\gamma \in \varvec{\varTheta } } l^{\lambda }(\gamma ) \) can be reformulated as an optimization problem with inequality constraints:
By the Karush–Kuhn–Tucker (KKT) conditions, if \( b^{\lambda } \) is the solution to the optimization problem (8), then there exist \( u_{i}^{*}\ge 0, i=1,2,\ldots ,q_{n}-1 \), such that \( \nabla _{\gamma } l^{\lambda }(b^{\lambda })=\sum _{i=1}^{q_{n}-1} u_{i}^{*}\nabla _{\gamma } g_{i}(b^{\lambda }) \). Define \( f(\gamma ) = -l^{\lambda }(\gamma )+\sum _{i=1}^{q_{n}-1} u_{i}^{*} g_{i}(\gamma ) \) and \( {\hat{\gamma }} = \mathrm{argmin}_{\gamma } f(\gamma ) \). By the concavity of \( l^{\lambda }(\gamma ) \) and the sufficiency of the KKT conditions, we have
Similarly, there exist \( w_{k}^{*}\ge 0, k=1,2,\ldots ,q_{n}-1 \), such that \( \nabla _{\gamma } l^{\lambda }_{(-(i,j))}(b^{\lambda }_{(-(i,j))})=\sum _{k=1}^{q_{n}-1} w_{k}^{*}\nabla _{\gamma } g_{k}(b^{\lambda }_{(-(i,j))}) \). Define \( f_{(-(i,j))}(\gamma ) = -l^{\lambda }_{(-(i,j))}(\gamma )+\sum _{k=1}^{q_{n}-1} w_{k}^{*} g_{k}(\gamma ) \) and \( {\hat{\gamma }} _{(-(i,j))} = \mathrm{argmin}_{\gamma } f_{(-(i,j))}(\gamma ) \). Then we have
A first-order Taylor approximation at \( {\hat{\gamma }} \) yields
Define \( f_{(i,j)}(\gamma ) = f(\gamma )-f_{(-(i,j))}(\gamma ) \approx l_{(-(i,j))}(\gamma )-l(\gamma )=-l_{(i,j)}(\gamma ) \). Then we have
Computing the second derivatives can be costly in both time and storage. Omitting the term \( \nabla ^{2}_{\gamma } f_{(i,j)} ({\hat{\gamma }} ) \) in (9) leads to
Then we have
Therefore,
In view of the fact that
Hence,
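The effect of dropping the second-derivative term can be checked numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes an unconstrained ridge-penalized Poisson regression in place of the constrained pseudolikelihood \( l^{\lambda } \), treats each single observation as the left-out unit \( (i,j) \), and uses the one-step update \( {\hat{\gamma }} _{(-(i,j))} \approx {\hat{\gamma }} - [\nabla ^{2}_{\gamma } f({\hat{\gamma }} )]^{-1} \nabla _{\gamma } l_{(i,j)}({\hat{\gamma }} ) \), which is what the Taylor argument above gives once \( \nabla ^{2}_{\gamma } f_{(i,j)}({\hat{\gamma }} ) \) is omitted. The function names `fit` and `neg_hessian`, the simulated data, and the value of `lam` are illustrative assumptions.

```python
import numpy as np


def neg_hessian(X, gamma, lam):
    """Negative Hessian of the penalized Poisson log-likelihood (positive definite)."""
    mu = np.exp(X @ gamma)
    return X.T @ (X * mu[:, None]) + lam * np.eye(X.shape[1])


def fit(X, y, lam, n_iter=100, tol=1e-12):
    """Newton-Raphson maximizer of sum(y*eta - exp(eta)) - 0.5*lam*||gamma||^2."""
    gamma = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (y - np.exp(X @ gamma)) - lam * gamma
        step = np.linalg.solve(neg_hessian(X, gamma, lam), grad)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


# Simulated data (assumption): intercept plus one covariate, Poisson counts.
rng = np.random.default_rng(0)
n, lam = 100, 1.0
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.8])))

# Full-data fit, its negative Hessian, and per-observation score vectors.
gamma_hat = fit(X, y, lam)
H = neg_hessian(X, gamma_hat, lam)
scores = X * (y - np.exp(X @ gamma_hat))[:, None]  # row k: gradient of l_k at gamma_hat

# One-step leave-one-out estimates: gamma_hat - H^{-1} grad l_k(gamma_hat).
gamma_approx = gamma_hat - np.linalg.solve(H, scores.T).T

# Exact leave-one-out estimates obtained by refitting n times.
keep = ~np.eye(n, dtype=bool)
gamma_exact = np.array([fit(X[keep[k]], y[keep[k]], lam) for k in range(n)])

# Approximation error versus the size of the leave-one-out shift itself.
max_err = np.max(np.abs(gamma_exact - gamma_approx))
max_shift = np.max(np.abs(gamma_exact - gamma_hat))

# Cross-validated log-likelihood: observation k is scored at its own
# leave-one-out estimate (constant -log(y!) terms are dropped).
eta_exact = np.sum(X * gamma_exact, axis=1)
eta_approx = np.sum(X * gamma_approx, axis=1)
cvl_exact = np.sum(y * eta_exact - np.exp(eta_exact))
cvl_approx = np.sum(y * eta_approx - np.exp(eta_approx))

print("max one-step error:", max_err)
print("max leave-one-out shift:", max_shift)
print("CVL exact / approx:", cvl_exact, cvl_approx)
```

In this simplified setting the one-step estimates need a single linear solve, whereas the exact ones need `n` refits. Comparing `max_err` with `max_shift`, and `cvl_approx` with `cvl_exact`, indicates how much accuracy the shortcut gives up.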
Cite this article
Qin, F., Yu, Z. Penalized spline estimation for panel count data model with time-varying coefficients. Comput Stat 36, 2413–2434 (2021). https://doi.org/10.1007/s00180-021-01109-z