Abstract
This paper develops a novel bagging ensemble method for variable selection in linear regression models, based on a ranked list of variables. Specifically, each variable is assigned a mixed importance measure that reflects both the order in which a stepwise search algorithm selects it into the final model and the improvement that results from its inclusion. Because small perturbations in the training data may change the order in which variables enter the final model, this process is repeated multiple times, each on a bootstrap sample. Finally, the importance measure of each variable is averaged across the bootstrap trials. Experiments on simulated data show that the method compares favorably with several other variable selection techniques.
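The procedure outlined in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the exact form of the mixed importance measure or the stepwise stopping rule, so the score used here (an entry-order weight multiplied by the relative reduction in residual sum of squares), the fixed number of forward steps, and the names `forward_stepwise` and `bagged_importance` are all assumptions made for the example.

```python
import numpy as np


def forward_stepwise(X, y, max_steps):
    """Greedy forward selection on centered data.

    Returns a list of (variable index, RSS reduction) in order of entry.
    The fixed step count is an assumed stopping rule, not the paper's.
    """
    p = X.shape[1]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    rss = float(yc @ yc)
    selected, path = [], []
    for _ in range(max_steps):
        best_j, best_rss = None, rss
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(Xc[:, cols], yc, rcond=None)
            resid = yc - Xc[:, cols] @ beta
            r = float(resid @ resid)
            if r < best_rss:
                best_j, best_rss = j, r
        if best_j is None:  # no candidate lowers the RSS any further
            break
        path.append((best_j, rss - best_rss))
        selected.append(best_j)
        rss = best_rss
    return path


def bagged_importance(X, y, n_boot=50, max_steps=4, seed=0):
    """Average a hypothetical mixed importance score over bootstrap samples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap: draw n rows with replacement
        Xb, yb = X[idx], y[idx]
        tss = float(((yb - yb.mean()) ** 2).sum())
        for k, (j, gain) in enumerate(forward_stepwise(Xb, yb, max_steps)):
            # Assumed score: earlier entry and larger RSS reduction both raise it.
            scores[j] += (1.0 - k / max_steps) * gain / tss
    # Variables never selected in a trial contribute zero for that trial.
    return scores / n_boot


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 8))
    y = 3.0 * X[:, 0] + 2.0 * X[:, 3] + rng.normal(size=100)
    scores = bagged_importance(X, y)
    print("ranking (most important first):", np.argsort(-scores))
```

Sorting the averaged scores gives the ranked list of variables; how the final subset is cut from that ranking is left open here because the abstract does not specify it.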
Acknowledgements
This research was supported by the National Basic Research Program of China (973 Program, No. 2013CB329406), the National Natural Science Foundation of China (Nos. 11201367, 91230101), and the Science Plan Foundation of the Education Bureau of Shaanxi Province of China (No. 14JK1672).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, CX., Zhang, JS., Wang, GW. (2015). A Novel Bagging Ensemble Approach for Variable Ranking and Selection for Linear Regression Models. In: Schwenker, F., Roli, F., Kittler, J. (eds) Multiple Classifier Systems. MCS 2015. Lecture Notes in Computer Science, vol 9132. Springer, Cham. https://doi.org/10.1007/978-3-319-20248-8_1
DOI: https://doi.org/10.1007/978-3-319-20248-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20247-1
Online ISBN: 978-3-319-20248-8
eBook Packages: Computer Science (R0)