Abstract
Many effective techniques for time series forecasting (TSF) have been proposed in recent years. These techniques depend on an adequate set of training samples, which is key to training a good predictor. An active learning (AL) framework based on support vector regression (SVR) is therefore designed for TSF, with the goal of selecting the most valuable samples and reducing the complexity of the training set. To evaluate sample quality comprehensively, multiple essential criteria, such as informativeness, representativeness and diversity, are considered in a procedure of two consecutive clustering-based stages. In addition, time series data are often imbalanced: a range of values may be severely under-represented yet extremely important to the user, so it is unreasonable to assign the same prediction cost to every sample. To address this imbalance problem, a multiple-criteria cost-sensitive active learning algorithm built on a weighted SVR architecture, abbreviated as MAW-SVR, is proposed specifically for imbalanced TSF. Under the cost-sensitive scheme, each sample is assigned a penalty weight that can be dynamically updated during the AL procedure. Experimental comparisons between MAW-SVR and six other AL algorithms on a total of thirty time series datasets verify the effectiveness of the proposed algorithm.
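The cost-sensitive idea can be illustrated with a minimal sketch. The abstract does not say how MAW-SVR computes or updates its penalty weights, so everything below is an assumption made for illustration: the helper names (`make_windows`, `rarity_weights`, `weighted_eps_loss`), the histogram-based inverse-frequency weighting, and the toy series are all hypothetical. The sketch only shows how a per-sample weight can scale the epsilon-insensitive error that SVR penalises, so that rare target values cost more when mispredicted.

```python
from collections import Counter


def make_windows(series, lag):
    """Turn a series into (inputs, target) pairs: `lag` past values -> next value."""
    return [(series[i:i + lag], series[i + lag]) for i in range(len(series) - lag)]


def rarity_weights(targets, n_bins=5):
    """Hypothetical weighting (not from the paper): inverse frequency of the
    histogram bin that each target falls into, normalised to a mean of 1."""
    lo, hi = min(targets), max(targets)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant series: every target lands in bin 0
    bins = [min(int((t - lo) / width), n_bins - 1) for t in targets]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_eps_loss(y_true, y_pred, weights, eps=0.1):
    """Weighted epsilon-insensitive loss: errors inside the eps tube cost
    nothing; errors outside it are scaled by the sample's penalty weight."""
    return sum(
        w * max(0.0, abs(t - p) - eps)
        for t, p, w in zip(y_true, y_pred, weights)
    )


# Toy series with a single rare spike (5.0) among values near 1.0.
series = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 5.0, 1.1, 0.9, 1.0]
samples = make_windows(series, lag=3)
targets = [target for _, target in samples]
weights = rarity_weights(targets)
```

In this toy run the sample whose target is the spike receives the largest weight, so a predictor that misses the spike is penalised more heavily than one that misses a common value. An AL loop could recompute such weights each time newly selected samples are added; that update rule is likewise an assumption here, not the paper's procedure.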
Acknowledgements
This work is supported by the National Key R&D Program of China (Grant Nos. 2018YFC2001600, 2018YFC2001602), and the National Natural Science Foundation of China under Grant no. 61473150.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, J., Dai, Q. A cost-sensitive active learning algorithm: toward imbalanced time series forecasting. Neural Comput & Applic 34, 6953–6972 (2022). https://doi.org/10.1007/s00521-021-06837-3
DOI: https://doi.org/10.1007/s00521-021-06837-3