
Inference for \(L_2\)-Boosting


Abstract

We propose a statistical inference framework for the component-wise functional gradient descent algorithm (CFGD), also known as \(L_2\)-Boosting, under the assumption of normally distributed model errors. CFGD is one of the most versatile tools for data analysis: it scales well to high-dimensional data sets, allows for a very flexible definition of additive regression models, and performs variable selection as part of the fitting process. Because of this variable selection, we build on recent proposals for post-selection inference. However, the iterative nature of component-wise boosting, which can repeatedly select the same component for updating, necessitates adaptations and extensions of existing approaches. We propose tests and confidence intervals for linear, grouped and penalized additive model components selected by \(L_2\)-Boosting. Our concepts also transfer to slow-learning algorithms more generally, and to other selection techniques that restrict the response space to sets more complex than polyhedra. We apply our framework to an additive model for sale prices of residential apartments and investigate the properties of our concepts in simulation studies.
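The component-wise nature of the algorithm is easy to make concrete. Below is a minimal sketch of \(L_2\)-Boosting with simple linear base learners; the function `l2_boost`, its parameter defaults and the NumPy implementation are illustrative assumptions for exposition only, not the authors' implementation.

```python
import numpy as np

def l2_boost(X, y, mstop=100, nu=0.1):
    """Minimal sketch of component-wise L2-Boosting (CFGD).

    In each iteration the negative L2 gradient, i.e. the current
    residual vector, is fitted by every univariate least-squares base
    learner, and only the best-fitting component is updated by a small
    step nu. Assumes the columns of X are centered with nonzero norm.
    """
    n, p = X.shape
    coef = np.zeros(p)             # aggregated coefficients
    offset = y.mean()              # initial fit f^(0)
    fit = np.full(n, offset)
    path = []                      # selection path; components may repeat
    for _ in range(mstop):
        u = y - fit                              # negative gradient of the L2 loss
        betas = X.T @ u / (X ** 2).sum(axis=0)   # univariate LS coefficients
        rss = ((u[:, None] - X * betas) ** 2).sum(axis=0)
        j = int(np.argmin(rss))                  # best-fitting component
        coef[j] += nu * betas[j]                 # weak (shrunken) update
        fit += nu * X[:, j] * betas[j]
        path.append(j)
    return offset, coef, path
```

The recorded selection path, in which the same component may appear several times, is what distinguishes boosting from one-shot selectors such as the lasso: post-selection tests and confidence intervals must condition on this repeated-selection event, which is why existing polyhedral approaches require the adaptations developed in the paper.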



Author information

Corresponding author

Correspondence to David Rügamer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 388 KB)


About this article


Cite this article

Rügamer, D., Greven, S. Inference for \(L_2\)-Boosting. Stat Comput 30, 279–289 (2020). https://doi.org/10.1007/s11222-019-09882-0

