An effective cost-sensitive sparse online learning framework for imbalanced streaming data classification and its application to online anomaly detection

Chen, Zhong; Sheng, Victor; Edwards, Andrea; Zhang, Kun

doi:10.1007/s10115-022-01745-x

Regular Paper
Published: 16 September 2022

Volume 65, pages 59–87, (2023)

Knowledge and Information Systems

584 Accesses
3 Citations

Abstract

Class imbalance is one of the most challenging problems in streaming data mining because of its adverse impact on the predictive capability of online models. Most existing online learning approaches lack an effective mechanism for handling high-dimensional streaming data with skewed class distributions, which leads to degraded model performance and limited interpretability. In this paper, we develop a cost-sensitive regularized dual averaging (CSRDA) method to tackle this problem. The proposed method substantially extends the influential regularized dual averaging method by formulating a new convex optimization function in which four \(\ell_1\)-norm regularized cost-sensitive objective functions are each optimized directly. We then theoretically analyze CSRDA’s regret bounds and the bounds of the primal variables, demonstrating that CSRDA and its variants can achieve theoretical convergence in terms of balanced cost and sparsity when handling severely imbalanced, high-dimensional streaming data. To validate the proposed methods, we conduct extensive experiments on six benchmark streaming datasets with varied imbalance ratios and on three online anomaly detection tasks. The experimental results demonstrate that, compared with the baseline methods, CSRDA and its variants not only improve classification performance but also capture sparse features more effectively, and hence potentially offer better model interpretability.
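
The abstract does not spell out the four objective functions, so the following is only a minimal sketch of the general idea, not the authors' CSRDA implementation. It assumes a class-weighted hinge loss plugged into the standard closed-form update of \(\ell_1\)-regularized dual averaging with a squared-norm proximal term. The names `csrda_sketch`, `make_stream`, `cost_pos`, `cost_neg`, `lam` and `gamma`, as well as their default values, are hypothetical and chosen for illustration.

```python
import math
import random


def csrda_sketch(stream, dim, lam=0.05, gamma=1.0, cost_pos=0.9, cost_neg=0.1):
    """l1-regularized dual averaging with a class-weighted hinge loss.

    Assumption: the loss is cost * max(0, 1 - y * <w, x>), with a larger
    cost for the minority (positive) class. This is an illustrative choice.
    """
    w = [0.0] * dim      # primal weights
    g_bar = [0.0] * dim  # running average of all past subgradients
    for t, (x, y) in enumerate(stream, start=1):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        cost = cost_pos if y > 0 else cost_neg
        # Subgradient of the weighted hinge loss is -cost * y * x when the
        # margin is violated, and zero otherwise.
        scale = -cost * y if margin < 1.0 else 0.0
        for i in range(dim):
            g_bar[i] += (scale * x[i] - g_bar[i]) / t
        # Closed-form step: coordinates whose averaged subgradient is within
        # lam of zero are set exactly to zero, which produces sparsity.
        step = math.sqrt(t) / gamma
        for i in range(dim):
            g = g_bar[i]
            if abs(g) <= lam:
                w[i] = 0.0
            else:
                w[i] = -step * (g - lam * math.copysign(1.0, g))
    return w


def make_stream(n, dim, pos_rate=0.1, seed=0):
    """Hypothetical imbalanced stream: only the first feature is informative."""
    rng = random.Random(seed)
    for _ in range(n):
        y = 1 if rng.random() < pos_rate else -1
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += 2.0 * y
        yield x, y


if __name__ == "__main__":
    w = csrda_sketch(make_stream(2000, 20), dim=20)
    print("nonzero weights:", sum(1 for wi in w if wi != 0.0))
```

In this sketch the threshold `lam` controls sparsity, while the ratio of `cost_pos` to `cost_neg` controls how strongly errors on the minority class are penalized; both would need tuning on real data.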

Acknowledgements

This publication was made possible by funding from the DOD ARO Grant #W911NF-20-1-0249.

Author information

Authors and Affiliations

Department of Computer Science, Xavier University of Louisiana, New Orleans, LA, United States
Zhong Chen, Andrea Edwards & Kun Zhang
Department of Computer Science, Texas Tech University, Lubbock, TX, United States
Victor Sheng

Corresponding author

Correspondence to Kun Zhang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, Z., Sheng, V., Edwards, A. et al. An effective cost-sensitive sparse online learning framework for imbalanced streaming data classification and its application to online anomaly detection. Knowl Inf Syst 65, 59–87 (2023). https://doi.org/10.1007/s10115-022-01745-x

Received: 22 April 2021
Revised: 04 August 2022
Accepted: 06 August 2022
Published: 16 September 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10115-022-01745-x
