Abstract
This paper presents an algorithm for selecting the best set of s variables out of k (> s) candidate variables in a multiple linear regression model. We measure fitting error by absolute deviation and solve the resulting optimization problem with 0-1 integer programming methods. We also propose a heuristic algorithm that obtains a close-to-optimal set of variables in terms of squared deviation. Computational results show that the method is practical and reliable for determining the best set of variables.
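For orientation, one common way to write such a selection problem as a 0-1 mixed integer linear program is sketched below. It is a generic big-M formulation given as an illustration, not necessarily the exact model used in the paper. All symbols other than s and k are assumptions introduced here: T is the number of observations, y_t and x_{jt} are the observed response and the j-th candidate variable at observation t, u_t and v_t split the residual into its positive and negative parts, z_j indicates whether variable j is selected, and M is an assumed valid upper bound on every |β_j|.

```latex
\begin{align*}
\min_{\beta,\,u,\,v,\,z}\quad & \sum_{t=1}^{T} (u_t + v_t) \\
\text{s.t.}\quad & u_t - v_t = y_t - \beta_0 - \sum_{j=1}^{k} \beta_j x_{jt}, && t = 1,\dots,T, \\
& -M z_j \le \beta_j \le M z_j, && j = 1,\dots,k, \\
& \sum_{j=1}^{k} z_j = s, \\
& u_t \ge 0,\; v_t \ge 0, && t = 1,\dots,T, \\
& z_j \in \{0,1\}, && j = 1,\dots,k.
\end{align*}
```

At an optimum at most one of u_t and v_t is positive, so u_t + v_t equals the absolute residual and the objective is the total absolute deviation. The constraint on β_j forces the coefficient to zero whenever z_j = 0, which is why M has to be large enough not to cut off the true optimum; a loose M, on the other hand, tends to weaken the linear relaxation.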
Cite this article
Konno, H., Yamamoto, R. Choosing the best set of variables in regression analysis using integer programming. J Glob Optim 44, 273–282 (2009). https://doi.org/10.1007/s10898-008-9323-9
DOI: https://doi.org/10.1007/s10898-008-9323-9