
De-noising boosting methods for variable selection and estimation subject to error-prone variables

Original Paper · Statistics and Computing

Abstract

Boosting is a powerful statistical learning method that combines multiple weak learners into a strong learner by applying a base procedure sequentially, with each iteration refining the fit of the previous one. Boosting has recently been adapted to variable selection, yet little work has addressed complex data features such as measurement error in covariates. In this paper, we employ boosting for variable selection, especially in the presence of measurement error. We develop two approximated correction approaches that handle different types of responses while eliminating the effects of measurement error. The proposed algorithms are easy to implement and yield precise estimators. In numerical studies under various settings, the proposed method outperforms other competitive approaches.
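The abstract does not detail the two correction approaches, but the general idea of de-noising a boosting update can be sketched. The following is a minimal illustration, not the paper's algorithm: it assumes a linear model whose covariates are observed only through an error-prone surrogate W = X + U with known error covariance Sigma_u, and it corrects the naive score before each componentwise boosting step. The function denoised_boost and all tuning defaults are hypothetical.

    import numpy as np

    def denoised_boost(W, y, Sigma_u, n_steps=200, nu=0.1):
        # Componentwise boosting on a bias-corrected score for the linear
        # model y = X @ beta + eps, where X is observed only through
        # W = X + U, with U ~ N(0, Sigma_u) independent of eps.
        # The naive score W'(y - W beta)/n is biased because
        # E[W'W] = X'X + n * Sigma_u; adding Sigma_u @ beta removes that
        # bias in expectation (a classical corrected-score device).
        n, p = W.shape
        beta = np.zeros(p)
        for _ in range(n_steps):
            score = W.T @ (y - W @ beta) / n + Sigma_u @ beta
            j = int(np.argmax(np.abs(score)))  # coordinate with steepest corrected score
            beta[j] += nu * score[j]           # small step on that coordinate only
        return beta

    # Toy usage: only the first two of ten covariates are active.
    rng = np.random.default_rng(0)
    n, p, sigma_u = 500, 10, 0.3
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:2] = (1.5, -1.0)
    y = X @ beta_true + 0.5 * rng.standard_normal(n)
    W = X + sigma_u * rng.standard_normal((n, p))  # error-prone surrogate
    print(denoised_boost(W, y, (sigma_u ** 2) * np.eye(p)))

Variable selection happens implicitly here: a coordinate is updated only when its corrected score is the largest in absolute value, so covariates that never dominate remain exactly zero, and stopping after n_steps iterations plays the role of regularization.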



Acknowledgements

The author would like to thank the Editor, Associate Editor, and one referee for their useful comments, which significantly improved the presentation of the initial manuscript. Chen's research was supported by the National Science and Technology Council under Grant 110-2118-M-004-006-MY2.

Author information


Contributions

Li-Pang Chen is the sole author of this manuscript.

Corresponding author

Correspondence to Li-Pang Chen.

Ethics declarations

Conflict of interest

The author declares no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, LP. De-noising boosting methods for variable selection and estimation subject to error-prone variables. Stat Comput 33, 38 (2023). https://doi.org/10.1007/s11222-023-10209-3

