Abstract
The conditional logistic regression model is the standard tool for the analysis of epidemiological studies in which one or more cases (subjects showing the event of interest) are matched with one or more controls (subjects not showing the event). These situations arise, for example, in matched case-control and case-crossover studies. In sparse and high-dimensional settings, penalized methods such as the Lasso have emerged as an alternative to conventional estimation and variable selection procedures. We describe the R package clogitLasso, which brings together algorithms for estimating the parameters of conditional logistic models under sparsity-inducing penalties. Most individually matched designs are covered and, besides the Lasso, the elastic net, the adaptive Lasso and bootstrapped versions are available. Several criteria for choosing the regularization parameter are implemented, each accounting for the dependency structure of the data, and stability is assessed by resampling methods. We first review recent work related to clogitLasso, then illustrate its use in the exploratory analysis of a large pharmacoepidemiological study.
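As a concrete illustration of the ingredients listed above, the following minimal R sketch fits (a) a conventional conditional logistic model with survival::clogit and (b) a Lasso-penalized counterpart for the simplest 1:1 matched design. It does not reproduce the clogitLasso interface itself; part (b) instead uses the standard reduction of the 1:1 conditional likelihood to an intercept-free logistic regression on within-pair covariate differences, solved here with glmnet. All data, dimensions and variable names below are simulated assumptions.

# Minimal sketch, assuming simulated 1:1 matched data; not the clogitLasso API.
library(survival)
library(glmnet)

set.seed(42)
n <- 100; p <- 20                        # 100 matched pairs, 20 covariates
signal <- c(rep(0.5, 3), rep(0, p - 3))  # only the first 3 covariates matter
X_case    <- matrix(rnorm(n * p), n, p) + matrix(signal, n, p, byrow = TRUE)
X_control <- matrix(rnorm(n * p), n, p)

# (a) Conventional conditional logistic regression, stratified on the pair;
#     feasible only when few covariates are modeled relative to the pairs.
dat <- data.frame(case = rep(c(1, 0), each = n),
                  rbind(X_case, X_control),
                  pair = rep(seq_len(n), 2))
fit_cl <- clogit(case ~ X1 + X2 + X3 + strata(pair), data = dat)

# (b) Lasso-penalized fit. For 1:1 matching, the conditional likelihood
#     equals that of an intercept-free logistic regression on within-pair
#     covariate differences. Random sign flips give glmnet the two response
#     classes it requires without changing the likelihood.
s  <- sample(c(1, -1), n, replace = TRUE)
Xd <- s * (X_case - X_control)           # row i multiplied by s[i]
y  <- as.numeric(s == 1)
cvfit <- cv.glmnet(Xd, y, family = "binomial", intercept = FALSE)
coef(cvfit, s = "lambda.min")            # sparse coefficient vector

In this sketch, cross-validation selects the regularization parameter at the level of pair differences, so each matched set contributes a single observation and the within-pair dependency is respected. The clogitLasso package additionally covers 1:M designs, adaptive weights and bootstrap resampling, which this sketch omits.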
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Avalos, M., Grandvalet, Y., Pouyes, H., Orriols, L., Lagarde, E. (2014). High-Dimensional Sparse Matched Case-Control and Case-Crossover Data: A Review of Recent Works, Description of an R Tool and an Illustration of the Use in Epidemiological Studies. In: Formenti, E., Tagliaferri, R., Wit, E. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2013. Lecture Notes in Computer Science, vol. 8452. Springer, Cham. https://doi.org/10.1007/978-3-319-09042-9_8
Print ISBN: 978-3-319-09041-2
Online ISBN: 978-3-319-09042-9