Backward Stepwise Elimination: Approximation Guarantee, a Batched GPU Algorithm, and Empirical Investigation

  • Original Research
  • Published in: SN Computer Science

Abstract

Best subset selection is NP-hard and expensive to solve exactly for problems with a large number of features, so practitioners often employ heuristics that quickly produce approximate solutions without any accuracy guarantees. We investigate solving the best subset selection problem with backward stepwise elimination (BSE). Applying the concept of approximate supermodularity, we prove an approximation guarantee that bounds the performance of BSE. This guarantee provides conditions under which BSE can be expected to return a near-optimal solution, and indicates when another technique should be used instead. To improve the computational performance of the algorithm, we develop a parallel BSE for graphics processing units (GPUs) that runs up to 5x faster on average than an efficient CPU implementation on a collection of over 1.8 million problems, with the largest speedups observed on the largest problems. Finally, we demonstrate the benefit of BSE empirically by comparing it against several state-of-the-art feature selection approaches. For certain classes of problems, BSE generates solutions with lower relative test error than the lasso, the relaxed lasso, and forward stepwise selection. BSE thus deserves a place in the data modeling toolset alongside these more popular methods. All code and data used for the computations in this paper can be obtained from https://github.com/bsauk/BackwardStepwiseElimination.
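
To make the procedure under study concrete, here is a minimal sketch of backward stepwise elimination for least-squares subset selection: starting from the full feature set, repeatedly drop the feature whose removal increases the residual sum of squares the least. This naive version refits from scratch for every candidate removal and is illustrative only; the paper's implementation relies on more efficient linear-algebra updates, and all names below are ours, not the repository's.

```python
import numpy as np

def rss(X, y, support):
    """Residual sum of squares of the least-squares fit on a feature subset."""
    coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    resid = y - X[:, support] @ coef
    return float(resid @ resid)

def backward_stepwise_elimination(X, y, k):
    """Greedily eliminate features until only k remain.

    Naive O(p) refits per step; efficient implementations instead
    downdate a QR or Cholesky factorization after each elimination.
    """
    support = list(range(X.shape[1]))
    while len(support) > k:
        # Drop the feature whose removal hurts the fit the least.
        drop = min(support,
                   key=lambda j: rss(X, y, [i for i in support if i != j]))
        support.remove(drop)
    return support

# Example: recover a planted 3-feature support from noisy data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))
beta = np.zeros(12)
beta[[1, 4, 7]] = [3.0, -2.0, 1.5]
y = X @ beta + 0.1 * rng.standard_normal(200)
print(backward_stepwise_elimination(X, y, k=3))  # expect [1, 4, 7]
```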
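
The approximation guarantee builds on approximate supermodularity. As a sketch of one standard formalization (the paper's precise definition and constants may differ), write the subset selection objective as a set function and relax the supermodular inequality by an additive slack:

```latex
% Least-squares error as a set function of the selected support S:
f(S) \;=\; \min_{\beta}\; \lVert y - X_S \beta \rVert_2^2 .

% f is supermodular if, for all A \subseteq B and every feature x \notin B,
f(A \cup \{x\}) - f(A) \;\le\; f(B \cup \{x\}) - f(B) .

% Approximate supermodularity relaxes this inequality by a slack \epsilon \ge 0:
f(A \cup \{x\}) - f(A) \;\le\; f(B \cup \{x\}) - f(B) \;+\; \epsilon .
```

The least-squares objective is not supermodular in general, but it is close when the design matrix is well conditioned; guarantees of this flavor bound the gap between the greedy solution and the optimum in terms of the slack, so the conditioning of the data is the practical quantity to check before trusting BSE.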
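
On the computational side, the key observation behind the batched GPU algorithm is that the candidate refits within one elimination step (and across many independent problems) are small, mutually independent least-squares solves, so they can be dispatched to the GPU as a single batch. The sketch below illustrates the idea with PyTorch's batched torch.linalg.lstsq; this is an assumption of convenience for illustration, not the authors' implementation, which builds on batched dense linear-algebra routines.

```python
import torch

def batched_rss(X_batch, y_batch):
    """RSS of many independent least-squares problems in one batched solve.

    X_batch: (b, m, n) stack of design matrices; y_batch: (b, m, 1) responses.
    A single batched call replaces b separate small solves, which is where
    the GPU speedup comes from.
    """
    sol = torch.linalg.lstsq(X_batch, y_batch).solution  # (b, n, 1)
    resid = y_batch - X_batch @ sol                      # (b, m, 1)
    return (resid ** 2).sum(dim=(1, 2))                  # (b,)

def bse_step(X, y):
    """One elimination step: score all p leave-one-column-out candidates
    as a batch of p reduced problems and return the column to drop."""
    m, p = X.shape
    cands = torch.stack([torch.cat([X[:, :j], X[:, j + 1:]], dim=1)
                         for j in range(p)])             # (p, m, p-1)
    scores = batched_rss(cands, y.repeat(p, 1, 1))       # (p,)
    return int(scores.argmin())

device = "cuda" if torch.cuda.is_available() else "cpu"
X = torch.randn(256, 16, device=device)
beta = torch.zeros(16, device=device)
beta[:4] = torch.tensor([2.0, -1.0, 0.5, 3.0], device=device)
y = (X @ beta + 0.01 * torch.randn(256, device=device)).reshape(-1, 1)
print("eliminate column", bse_step(X, y))  # expect an index >= 4
```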

Availability of Data

All data used for the computations in this paper can be obtained from https://github.com/bsauk/BackwardStepwiseElimination.

Code Availability

All code used for the computations in this paper can be obtained from https://github.com/bsauk/BackwardStepwiseElimination. This repository includes all scripts necessary to reproduce the results of this article.

Funding

This work was conducted as part of the Institute for the Design of Advanced Energy Systems (IDAES) with funding from the Office of Fossil Energy, Cross-Cutting Research, U.S. Department of Energy. We also gratefully acknowledge the support of the NVIDIA Corporation with the donation of the NVIDIA Tesla K40 GPU used for this research.

Author information

Corresponding author

Correspondence to Nikolaos V. Sahinidis.

Ethics declarations

Conflict of Interest

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sauk, B., Sahinidis, N.V. Backward Stepwise Elimination: Approximation Guarantee, a Batched GPU Algorithm, and Empirical Investigation. SN COMPUT. SCI. 2, 396 (2021). https://doi.org/10.1007/s42979-021-00788-1
