Abstract
Sparse regression ensemble (SRE) combines the outputs of multiple learners through a sparse weight vector, so that only a subset of the learners contributes to the final prediction. This paper studies SRE based on the \(\ell _2\)–\(\ell _1\) problem and applies it to time series prediction. The \(\ell _2\)–\(\ell _1\) problem consists of an \(\ell _2\)-norm term and an \(\ell _1\)-norm regularization term. The former measures the total empirical risk of the ensemble, and the latter represents the ensemble complexity. The goal is therefore to minimize the total training error of the ensemble while controlling its complexity. Experiments on real-world data for regression and time series prediction are reported.
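As a rough illustration, assume the \(\ell _2\)–\(\ell _1\) problem takes the standard lasso form \(\min_{w} \tfrac{1}{2}\|y - Hw\|_2^2 + \lambda \|w\|_1\), where column \(j\) of \(H\) holds the training outputs of learner \(j\), \(y\) holds the training targets, and \(\lambda > 0\) trades training error against sparsity. The sketch below solves this assumed form with ISTA (iterative soft-thresholding), a generic proximal-gradient method. It is not necessarily the solver or the exact formulation used in the paper (for example, any nonnegativity constraint on the weights is not modeled), and the function names `sparse_ensemble_weights` and `ensemble_predict` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_ensemble_weights(H, y, lam, n_iter=500):
    """Approximately solve min_w 0.5 * ||y - H w||_2^2 + lam * ||w||_1 with ISTA.

    H : (n_samples, n_learners) array; column j holds learner j's outputs
        on the training data (hypothetical setup for illustration).
    y : (n_samples,) array of training targets.
    """
    # Step size 1/L, where L = ||H||_2^2 (squared largest singular value)
    # is the Lipschitz constant of the gradient of the l2 term.
    L = np.linalg.norm(H, 2) ** 2
    w = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = H.T @ (H @ w - y)                   # gradient of the empirical-risk term
        w = soft_threshold(w - grad / L, lam / L)  # proximal step for the l1 term
    return w


def ensemble_predict(H_new, w):
    """Combine learner outputs with the sparse weights; zero-weight learners drop out."""
    return H_new @ w


if __name__ == "__main__":
    # Toy data: one informative learner and four learners that output pure noise.
    rng = np.random.default_rng(0)
    n = 200
    y = np.sin(np.linspace(0, 8 * np.pi, n))
    H = np.column_stack([y + 0.1 * rng.standard_normal(n)]
                        + [rng.standard_normal(n) for _ in range(4)])
    w = sparse_ensemble_weights(H, y, lam=20.0)
    print("weights:", np.round(w, 3))
    print("training MSE:", np.mean((ensemble_predict(H, w) - y) ** 2))
```

Larger values of `lam` drive more weights exactly to zero and so prune more learners; under this scaling, once `lam` reaches \(\max_j |h_j^\top y|\) (with \(h_j\) the \(j\)-th column of \(H\)), every weight is zero.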
Acknowledgments
We would like to thank two anonymous reviewers and Editor A. Castiglione for their valuable comments and suggestions, which have significantly improved this paper. This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61373093, 61033013, and 61271301, by the Natural Science Foundation of Jiangsu Province of China under Grant Nos. BK2011284 and BK201222725, by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 13KJA520001, and by the Qing Lan Project.
Additional information
Communicated by A. Castiglione.
About this article
Cite this article
Zhang, L., Zhou, WD. Time series prediction using sparse regression ensemble based on \(\ell _2\)–\(\ell _1\) problem. Soft Comput 19, 781–792 (2015). https://doi.org/10.1007/s00500-014-1304-y
DOI: https://doi.org/10.1007/s00500-014-1304-y