
Boosting imbalanced data learning with Wiener process oversampling

  • Research Article
  • Published in: Frontiers of Computer Science

Abstract

Learning from imbalanced data is a challenging task in a wide range of applications and attracts significant research effort from the machine learning and data mining communities. As a natural approach to this issue, oversampling balances the training samples by replicating existing samples or synthesizing new ones. In general, synthesis outperforms replication because it supplies additional information on the minority class. However, the additional information needs to follow the same normal distribution as the training set, which further constrains the new samples to the predefined range of the training set. In this paper, we present the Wiener process oversampling (WPO) technique, which brings a physical phenomenon, Brownian motion, into sample synthesis. WPO constructs a robust decision region by expanding the attribute ranges of the training set while keeping the same normal distribution. WPO achieves satisfactory performance at much lower computational complexity. In addition, by integrating WPO with ensemble learning, the resulting WPOBoost algorithm outperforms many prevalent imbalance-learning solutions.
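The paper's exact WPO procedure is not reproduced in this abstract, so the following is a minimal sketch under a straightforward reading of the idea: each synthetic minority sample starts from an existing minority sample and follows a discretized Wiener process, so that increments remain normally distributed while attribute values may drift beyond the original training range. The function name `wiener_oversample` and the parameters `n_steps` and `sigma` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def wiener_oversample(X_min, n_new, n_steps=10, sigma=0.1, seed=0):
    """Hypothetical sketch of Wiener process oversampling (WPO).

    Each synthetic sample starts at a randomly chosen minority sample
    and accumulates n_steps Gaussian increments N(0, sigma^2 * dt) per
    attribute, i.e., a discretized Wiener path of unit duration. The
    increments follow a normal distribution, but the endpoint can land
    outside the attribute ranges spanned by the original minority class.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    # Seed each synthetic point at a randomly drawn minority sample.
    base = X_min[rng.integers(0, len(X_min), size=n_new)]
    # Gaussian increments for every step, sample, and attribute.
    steps = rng.normal(0.0, sigma * np.sqrt(dt),
                       size=(n_steps, n_new, X_min.shape[1]))
    # The endpoint of each path is the base plus the summed increments.
    return base + steps.sum(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_maj = rng.normal(0.0, 1.0, size=(900, 4))   # majority class
    X_min = rng.normal(2.0, 1.0, size=(100, 4))   # minority class
    X_syn = wiener_oversample(X_min, n_new=800)   # fill the class deficit
    print(X_syn.shape)                            # -> (800, 4)
```

Summing the increments is equivalent to a single draw from N(0, sigma^2), so the step loop is kept only to make the Wiener-process view explicit. A WPOBoost-style ensemble would presumably rerun this oversampling before each boosting round, in the spirit of SMOTEBoost.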



Acknowledgements

This research was partially supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200), the National Natural Science Foundation of China (Grant Nos. M1552006, 61403369, 61272427, and 61363030), the Xinjiang Uygur Autonomous Region Science and Technology Project (201230123), the Beijing Key Lab of Intelligent Telecommunication Software and Multimedia (ITSM201502), and the Guangxi Key Laboratory of Trusted Software (kx201418).

Author information


Corresponding author

Correspondence to Wenjia Niu.

Additional information

Qian Li received her MS in Computer Software and Theory from Shandong University, China. She is now a PhD student at the Institute of Information Engineering, Chinese Academy of Sciences, China. Her main research interests include machine learning, data mining, and services computing.

Gang Li is currently a senior lecturer in the School of Information Technology at Deakin University, Australia. His research interests are in the areas of data mining, machine learning, and multimedia analysis. He has served on the program committees of over 40 international conferences in artificial intelligence, data mining and machine learning, and tourism and hospitality management.

Wenjia Niu is an associate professor at the Institute of Information Engineering, Chinese Academy of Sciences, China. His research interests include Web services, agents, sensor networks, and data mining. He has served as a regular reviewer for the Journal of Network and Computer Applications (JNCA), Knowledge and Information Systems (KAIS), and the Journal of Computer Science and Technology (JCST).

Yanan Cao is an associate professor at the Institute of Information Engineering, Chinese Academy of Sciences, China. She obtained her PhD from the Institute of Computing Technology in 2012. Her research interests include data mining methodologies, machine learning algorithms, and knowledge graphs.

Liang Chang received his PhD in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, China in 2008. He is currently a professor in the School of Computer Science and Engineering, Guilin University of Electronic Technology, China. His research interests include knowledge representation and reasoning, formal methods, trusted software and intelligent planning.

Jianlong Tan is a researcher at the Institute of Information Engineering, Chinese Academy of Sciences, China. He is also the chairman of the Intelligent Information Processing Research Center, Institute of Information Engineering, Chinese Academy of Sciences. His research interests are string matching algorithms, algorithm security, and information security.

Li Guo is a researcher in the Institute of Information Engineering, Chinese Academy of Sciences, China. Her research interests include data stream management systems and information security.



About this article


Cite this article

Li, Q., Li, G., Niu, W. et al. Boosting imbalanced data learning with Wiener process oversampling. Front. Comput. Sci. 11, 836–851 (2017). https://doi.org/10.1007/s11704-016-5250-y

