Choosing the best set of variables in regression analysis using integer programming

Journal of Global Optimization

Abstract

This paper is concerned with an algorithm for selecting the best set of s variables out of k (> s) candidate variables in a multiple linear regression model. We employ absolute deviation as the measure of deviation and solve the resulting optimization problem using 0-1 integer programming methodologies. In addition, we propose a heuristic algorithm to obtain a close-to-optimal set of variables in terms of squared deviation. Computational results show that this method is practical and reliable for determining the best set of variables.
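
For orientation, one standard way to write a subset selection problem of this kind as a 0-1 mixed integer linear program is sketched below. It is an illustration under stated assumptions, not necessarily the formulation used in the paper, which is not shown in this preview. The symbols are assumptions introduced here: T is the number of observations, y_t the response, x_{jt} the value of candidate variable j in observation t, β_j the regression coefficients, z_j a binary indicator that variable j is selected, u_t and v_t the positive and negative parts of the residual, and M a sufficiently large bound on |β_j|.

```latex
\begin{align*}
\min_{\beta, u, v, z} \quad & \sum_{t=1}^{T} (u_t + v_t) \\
\text{s.t.} \quad & u_t - v_t = y_t - \beta_0 - \sum_{j=1}^{k} \beta_j x_{jt}, && t = 1, \dots, T, \\
& -M z_j \le \beta_j \le M z_j, && j = 1, \dots, k, \\
& \sum_{j=1}^{k} z_j = s, \\
& u_t \ge 0, \; v_t \ge 0, && t = 1, \dots, T, \\
& z_j \in \{0, 1\}, && j = 1, \dots, k.
\end{align*}
```

At an optimal solution at most one of u_t and v_t is positive, so u_t + v_t equals the absolute residual of observation t and the objective is the total absolute deviation. The constraint on z forces exactly s coefficients to be allowed to differ from zero. The choice of M is an assumption of this sketch: too small a value can cut off the best fit, while too large a value weakens the linear relaxation.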

Corresponding author

Correspondence to Rei Yamamoto.

Cite this article

Konno, H., Yamamoto, R. Choosing the best set of variables in regression analysis using integer programming. J Glob Optim 44, 273–282 (2009). https://doi.org/10.1007/s10898-008-9323-9

  • DOI: https://doi.org/10.1007/s10898-008-9323-9
