
A cost-sensitive active learning algorithm: toward imbalanced time series forecasting

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Recently, many outstanding techniques for time series forecasting (TSF) have been proposed. These techniques depend on sufficient, high-quality data samples, which are key to training a good predictor. Thus, an active learning (AL) algorithmic framework based on support vector regression (SVR) is designed for TSF, with the goal of choosing the most valuable samples and reducing the complexity of the training set. To evaluate the quality of samples comprehensively, multiple essential criteria, such as informativeness, representativeness and diversity, are considered in a two-stage, clustering-based procedure. In addition, time series data are often imbalanced: a range of target values may be seriously under-represented yet extremely important to the user, so it is unreasonable to assign the same prediction cost to every sample. To address this imbalance problem, a multiple-criteria cost-sensitive active learning algorithm built on a weighted SVR architecture, abbreviated as MAW-SVR and designed ad hoc for imbalanced TSF, is proposed. By introducing the cost-sensitive scheme, each sample is endowed with a penalty weight, which is dynamically updated during the AL procedure. Experimental comparisons between MAW-SVR and six other AL algorithms on a total of thirty time series datasets verify the effectiveness of the proposed algorithm.
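The abstract combines two ideas that can be illustrated compactly: clustering-based selection of representative training samples, and cost-sensitive (weighted) SVR in which under-represented target values receive larger penalty weights. The following sketch is not the paper's MAW-SVR algorithm; it is a minimal, assumption-laden illustration of those two building blocks using scikit-learn, with an arbitrary toy series, window length, cluster count, and weighting rule.

```python
# Illustrative sketch only -- NOT the paper's MAW-SVR algorithm.
# Demonstrates: (1) picking the sample nearest each cluster centroid as a
# "representative" query, and (2) fitting a weighted SVR where rare target
# values get a larger penalty weight. All constants here are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Toy "time series" turned into (lag-window, next-value) regression pairs.
series = np.sin(np.linspace(0, 20, 300)) + 0.1 * rng.standard_normal(300)
window = 5
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

# (1) Representativeness: cluster the unlabeled pool and select the sample
# closest to each centroid.
k = 30
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
selected = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
    selected.append(members[np.argmin(dists)])
selected = np.array(selected)

# (2) Cost sensitivity: up-weight samples whose targets fall in the sparse
# tails of the target distribution (a crude stand-in for the paper's
# dynamically updated penalty weights).
y_sel = y[selected]
lo, hi = np.quantile(y_sel, [0.1, 0.9])
weights = np.where((y_sel < lo) | (y_sel > hi), 5.0, 1.0)

model = SVR(kernel="rbf", C=10.0).fit(X[selected], y_sel, sample_weight=weights)
pred = model.predict(X[:1])
```

In a full AL loop, the selection and the weights would be recomputed each round as newly queried samples are added; here both steps run once for brevity.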



Acknowledgements

This work is supported by the National Key R&D Program of China (Grant Nos. 2018YFC2001600, 2018YFC2001602), and the National Natural Science Foundation of China under Grant no. 61473150.

Author information


Corresponding author

Correspondence to Qun Dai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.



Cite this article

Zhang, J., Dai, Q. A cost-sensitive active learning algorithm: toward imbalanced time series forecasting. Neural Comput & Applic 34, 6953–6972 (2022). https://doi.org/10.1007/s00521-021-06837-3
