Abstract
Preventing and cracking down on fraud calls has received increasing attention from both industry and academia. Most current machine-learning approaches ignore the imbalanced data distribution between normal and fraudulent call users, and their outputs neglect the range within which the predicted probability of a suspected fraudulent call may fluctuate. To overcome these limitations, we first construct user behavioral feature vectors using a random forest method. Second, we propose a novel hierarchical sampling method to address the class imbalance problem. Third, we propose a novel fraud call recognition method based on HPO-LGBM (Bayesian hyperparameter optimization based on random forest and the Light Gradient Boosting Machine). Finally, we further evaluate the method’s performance with a DRI (dynamic recognition interval) model. Experimental results on public datasets show that the proposed HPO-LGBM achieves an F1 score of 92.90%, an AUC of 91.90%, a G-means of 92.92%, and an MCC of 92.37% in fraud call recognition. In addition, the proposed HPO-LGBM model can also give a dynamic recognition interval for its output, and it is more robust than the other models compared (LR, RF, MLP, GBDT, XGBoost, LGBM).
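The abstract reports F1, AUC, G-means and MCC, which are standard metrics for imbalanced binary classification. As a minimal sketch (not the authors' evaluation code), the snippet below computes all four from true labels and predicted fraud probabilities using only the Python standard library. The function names, the 0.5 decision threshold and the toy data are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence

# Illustrative helpers (hypothetical names); label 1 = fraudulent call, 0 = normal call.


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple:
    """Return (TP, FP, TN, FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp: int, fp: int, tn: int, fn: int) -> float:
    # Geometric mean of recall on the fraud class (TPR) and on the normal class (TNR).
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    # Matthews correlation coefficient; taken as 0 when any marginal count is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # Probability that a random fraudulent call outscores a random normal one (ties count 1/2).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> dict:
    # The 0.5 threshold is an assumption, not a value reported in the paper.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "F1": f1_score(tp, fp, fn),
        "AUC": roc_auc(y_true, scores),
        "G-means": g_mean(tp, fp, tn, fn),
        "MCC": mcc(tp, fp, tn, fn),
    }


if __name__ == "__main__":
    # Toy, made-up labels and scores; not data from the paper.
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    s = [0.1, 0.2, 0.3, 0.4, 0.6, 0.2, 0.7, 0.4]
    for name, value in evaluate(y, s).items():
        print(f"{name}: {value:.4f}")
```

G-means and MCC are useful on skewed data such as fraud calls because, unlike plain accuracy, both stay at 0 for a model that labels every call as normal: its recall on the fraud class is 0, which zeroes the geometric mean, and MCC falls back to 0 when a marginal count vanishes.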
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, X., Zhi, X., Mei, Q., Wang, P., Su, H., Wang, J. (2024). HPO-LGBM-DRI: Dynamic Recognition Interval Estimation for Imbalanced Fraud Call via HPO-LGBM. In: Meng, X., Zhang, X., Guo, D., Hu, D., Zheng, B., Zhang, C. (eds) Spatial Data and Intelligence. SpatialDI 2024. Lecture Notes in Computer Science, vol 14619. Springer, Singapore. https://doi.org/10.1007/978-981-97-2966-1_24
DOI: https://doi.org/10.1007/978-981-97-2966-1_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2965-4
Online ISBN: 978-981-97-2966-1
eBook Packages: Computer Science, Computer Science (R0)