HPO-LGBM-DRI: Dynamic Recognition Interval Estimation for Imbalanced Fraud Call via HPO-LGBM

Liu, Xiliang; Zhi, Xiaoying; Mei, Qiang; Wang, Peng; Su, Haoru; Wang, Jiayi

doi:10.1007/978-981-97-2966-1_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14619)

Included in the following conference series:

International Conference on Spatial Data and Intelligence

Abstract

The prevention and suppression of fraud calls have drawn growing attention from industry and academia. Most existing machine-learning studies ignore the imbalanced data distribution between normal and fraudulent call users, and their outputs neglect the range over which the predicted probability of a suspected fraudulent call fluctuates. To overcome these limitations, we first construct user behavioral feature vectors with a random forest method. Second, we propose a novel hierarchical sampling method to address the class imbalance problem. Third, we propose a novel fraud call recognition method based on HPO-LGBM (Bayesian hyperparameter optimization based on random forest and the Light Gradient Boosting Machine). Finally, we further evaluate the method's performance with a DRI (dynamic recognition interval) model. Experimental results on public datasets show that the proposed HPO-LGBM achieves an F1 score of 92.90%, an AUC of 91.90%, a G-mean of 92.92%, and an MCC of 92.37% in fraud call recognition. In addition, the proposed HPO-LGBM model can provide the dynamic recognition interval of its output and is more robust than the other models compared (namely LR, RF, MLP, GBDT, XGBoost, and LGBM).
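The F1, G-mean, and MCC figures reported above are all functions of the binary confusion matrix, and they are preferred over plain accuracy when fraud is the rare class. The following is a minimal sketch of their standard textbook definitions, not the authors' evaluation code; the function name `imbalance_metrics` and all call counts are hypothetical and chosen only for illustration. AUC is omitted because it requires ranked prediction scores rather than confusion counts.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Return F1, G-mean, and MCC from binary confusion counts (fraud = positive)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity on fraud calls
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # recall on normal calls
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # G-mean is the geometric mean of the two per-class recalls.
    g_mean = math.sqrt(recall * specificity)
    # MCC is defined as 0 here when any marginal total is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"f1": f1, "g_mean": g_mean, "mcc": mcc}


# Hypothetical counts: 1,000 calls, of which 50 are fraudulent.
scores = imbalance_metrics(tp=40, fp=10, tn=940, fn=10)

# A degenerate classifier that labels every call as normal is 95% accurate
# on the same data, yet all three metrics drop to zero.
all_normal = imbalance_metrics(tp=0, fp=0, tn=950, fn=50)
```

The second call shows why these metrics suit imbalanced data: a model that never flags fraud still looks accurate, but none of F1, G-mean, or MCC rewards it.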



Author information

Authors and Affiliations

Information Technology, Beijing University of Technology, Beijing, 100124, China
Xiliang Liu, Xiaoying Zhi, Haoru Su & Jiayi Wang
Navigation Institute, Jimei University, Xiamen, 361000, China
Qiang Mei
Key Laboratory of the Ministry of Education, Hainan Normal University, Hainan, 570203, China
Peng Wang

Corresponding author

Correspondence to Xiaoying Zhi.

Editor information

Editors and Affiliations

Renmin University of China, Beijing, China
Xiaofeng Meng
Nanjing Normal University, Nanjing, China
Xueying Zhang
Beijing University of Chemical Technology, Beijing, China
Danhuai Guo
Nanjing Normal University, Nanjing, China
Di Hu
Huazhong University of Science and Technology, Wuhan, China
Bolong Zheng
Hefei University of Technology, Hefei, China
Chunju Zhang

About this paper

Cite this paper

Liu, X., Zhi, X., Mei, Q., Wang, P., Su, H., Wang, J. (2024). HPO-LGBM-DRI: Dynamic Recognition Interval Estimation for Imbalanced Fraud Call via HPO-LGBM. In: Meng, X., Zhang, X., Guo, D., Hu, D., Zheng, B., Zhang, C. (eds) Spatial Data and Intelligence. SpatialDI 2024. Lecture Notes in Computer Science, vol 14619. Springer, Singapore. https://doi.org/10.1007/978-981-97-2966-1_24

DOI: https://doi.org/10.1007/978-981-97-2966-1_24
Published: 30 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2965-4
Online ISBN: 978-981-97-2966-1
eBook Packages: Computer Science, Computer Science (R0)
