
High–Dimensional Sparse Matched Case–Control and Case–Crossover Data: A Review of Recent Works, Description of an R Tool and an Illustration of the Use in Epidemiological Studies

Conference paper in: Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2013)

Abstract

The conditional logistic regression model is the standard tool for analyzing epidemiological studies in which one or more cases (showing the event of interest) are matched with one or more controls (not showing the event). Such designs arise, for example, in matched case–control and case–crossover studies. In sparse, high-dimensional settings, penalized methods such as the Lasso have emerged as alternatives to conventional estimation and variable selection procedures. We describe the R package clogitLasso, which brings together algorithms for estimating the parameters of conditional logistic models with sparsity-inducing penalties. Most individually matched designs are covered and, besides the Lasso, the Elastic Net, the adaptive Lasso and bootstrapped versions are available. Several criteria for choosing the regularization parameter are implemented, accounting for the dependence structure of the data, and stability of the selection is assessed by resampling. We first review recent work related to clogitLasso, and then illustrate its use in the exploratory analysis of a large pharmacoepidemiological study.
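This preview does not include the package's usage examples, so the sketch below only illustrates the workflow described above on simulated 1:1 matched data. The clogitLasso() call and its positional arguments (design matrix, case indicator, stratum identifier) are assumptions based on the abstract, whereas survival::clogit() is shown with its documented formula interface.

```r
## Minimal sketch (not from the paper): simulate 1:1 matched case-control data
## and fit both a conventional and a Lasso-penalized conditional logistic model.
## The clogitLasso() call is an assumption about the package interface; consult
## the package documentation for the exact argument names.

library(survival)      # clogit() for the conventional conditional logistic fit
library(clogitLasso)   # sparse (penalized) conditional logistic regression

set.seed(1)
n.sets <- 100                                  # number of matched sets (strata)
p      <- 20                                   # number of candidate exposures
X      <- matrix(rnorm(2 * n.sets * p), ncol = p)
y      <- rep(c(1L, 0L), times = n.sets)       # 1 = case, 0 = matched control
set.id <- rep(seq_len(n.sets), each = 2)       # stratum (matched set) identifier

## Conventional conditional logistic regression (feasible only when p is small)
fit.std <- clogit(y ~ X + strata(set.id))

## Lasso-penalized conditional logistic regression (assumed interface)
fit.lasso <- clogitLasso(X, y, set.id)
```

On real data, one would explore the regularization path over a grid of penalties, choose the penalty with one of the stratum-aware criteria mentioned in the abstract, and check the stability of the selected exposures by resampling.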



Author information


Correspondence to Marta Avalos.



Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Avalos, M., Grandvalet, Y., Pouyes, H., Orriols, L., Lagarde, E. (2014). High–Dimensional Sparse Matched Case–Control and Case–Crossover Data: A Review of Recent Works, Description of an R Tool and an Illustration of the Use in Epidemiological Studies. In: Formenti, E., Tagliaferri, R., Wit, E. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2013. Lecture Notes in Computer Science, vol 8452. Springer, Cham. https://doi.org/10.1007/978-3-319-09042-9_8


  • DOI: https://doi.org/10.1007/978-3-319-09042-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09041-2

  • Online ISBN: 978-3-319-09042-9

  • eBook Packages: Computer Science (R0)
