Abstract
AdaBoost is a well-known ensemble learning method that has been applied successfully in many fields. Existing studies show that AdaBoost is sensitive to noisy points, which degrades its classification performance. The main reason is that it increases the weights of all misclassified samples (especially noisy points) in the same way, so the influence of noisy points can hardly be weakened. In this paper, a clustering algorithm is used to dynamically identify noisy points during the iterations. More precisely, in every iteration we compute a misclassification degree for every cluster, which is used to decide whether a misclassified sample is treated as a noisy point in that iteration. Furthermore, we propose a flexible method to update the weights of the misclassified samples. Experimental results on 22 public datasets show that our method achieves better results than state-of-the-art methods including AdaBoost, AdaCost, LogitBoost, and SPLBoost. We also apply our method to transaction fraud detection, and experiments on our large real-world transaction dataset also demonstrate its good performance.
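The contrast described above can be illustrated with a small Python sketch. `adaboost_update` is the standard discrete AdaBoost reweighting step (labels in {-1, +1}), in which every misclassified sample is multiplied by the same factor exp(alpha). The abstract does not give the paper's formulas, so `cluster_misclassification_degree` and `flexible_update` are hypothetical: the degree is assumed to be the weighted fraction of misclassified samples in a cluster, a misclassified sample in a mostly correct cluster is assumed to be a suspected noisy point, and its weight growth is assumed to be damped to exp(alpha * degree). The threshold `noise_threshold` and the toy cluster labels (which could come from any clustering algorithm, e.g. k-means) are likewise illustrative assumptions, not the authors' method.

```python
import numpy as np


def weak_learner_alpha(w, miss):
    """Weight of a weak learner from its weighted error (discrete AdaBoost)."""
    err = np.sum(w[miss]) / np.sum(w)
    err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid log(0) and division by zero
    return 0.5 * np.log((1 - err) / err)


def adaboost_update(w, y, pred):
    """Standard AdaBoost reweighting with labels in {-1, +1}.

    Every misclassified sample is multiplied by the same factor exp(alpha),
    whether it is a hard example or a noisy point.
    """
    alpha = weak_learner_alpha(w, pred != y)
    new_w = w * np.exp(-alpha * y * pred)  # exp(+alpha) if wrong, exp(-alpha) if right
    return new_w / new_w.sum(), alpha


def cluster_misclassification_degree(w, miss, clusters):
    """HYPOTHETICAL: weighted fraction of misclassified samples in each
    sample's cluster, returned as one value per sample."""
    d = np.empty_like(w, dtype=float)
    for c in np.unique(clusters):
        in_c = clusters == c
        d[in_c] = np.sum(w[in_c & miss]) / np.sum(w[in_c])
    return d


def flexible_update(w, y, pred, clusters, noise_threshold=0.25):
    """HYPOTHETICAL flexible reweighting (not the paper's formula).

    A misclassified sample whose cluster is mostly classified correctly
    (degree < noise_threshold) is flagged as a suspected noisy point; its
    weight grows by exp(alpha * degree) instead of the full exp(alpha).
    """
    miss = pred != y
    alpha = weak_learner_alpha(w, miss)
    d = cluster_misclassification_degree(w, miss, clusters)
    noisy = miss & (d < noise_threshold)
    factor = np.where(miss, np.exp(alpha), np.exp(-alpha))
    factor[noisy] = np.exp(alpha * d[noisy])  # damped growth for suspected noise
    new_w = w * factor
    return new_w / new_w.sum(), alpha, noisy


# Toy example: two clusters, three misclassified samples (indices 4, 7, 8).
y = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, -1])
pred = np.array([1, 1, 1, 1, -1, -1, -1, 1, 1, -1])
clusters = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
w = np.full(10, 0.1)

w_std, _ = adaboost_update(w, y, pred)
w_flex, _, noisy = flexible_update(w, y, pred, clusters)
print(np.round(w_std, 3))
print(np.round(w_flex, 3))
print(noisy)
```

In this toy run, sample 4 is the only error in cluster 0 (degree 0.2), so it is flagged and its weight grows less than under standard AdaBoost, while samples 7 and 8 sit in cluster 1 (degree 0.4) and receive the usual exp(alpha) factor. Damping rather than freezing the weight of a suspected noisy point keeps some pressure on it in case it is a genuinely hard example; this design choice is also an assumption of the sketch.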
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China (Grant No. 2018YFB2100801) and the Fundamental Research Funds for the Central Universities of China (Grant No. 22120190198). The authors would like to thank the anonymous reviewers for their constructive comments.
Cite this article
Yang, C., Liu, G., Yan, C. et al. A clustering-based flexible weighting method in AdaBoost and its application to transaction fraud detection. Sci. China Inf. Sci. 64, 222101 (2021). https://doi.org/10.1007/s11432-019-2739-2