
HPO-LGBM-DRI: Dynamic Recognition Interval Estimation for Imbalanced Fraud Call via HPO-LGBM

  • Conference paper
Spatial Data and Intelligence (SpatialDI 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14619)


Abstract

The prevention of, and crackdown on, fraud calls have attracted increasing attention from both industry and academia. Most current machine-learning studies ignore the imbalanced data distribution between normal and fraudulent call users, and their outputs neglect the range over which the predicted probability of a suspected fraudulent call may fluctuate. To overcome these limitations, we first construct user behavioral feature vectors using a random forest method. Secondly, we propose a novel hierarchical sampling method to address the class imbalance problem. Thirdly, we propose a novel fraud call recognition method based on HPO-LGBM (Bayesian hyperparameter optimization based on random forest, combined with the Light Gradient Boosting Machine). Finally, we further evaluate the method's performance with a DRI (dynamic recognition interval) model. Experimental results on public datasets show that the proposed HPO-LGBM achieves an F1 score of 92.90%, an AUC of 91.90%, a G-mean of 92.92%, and an MCC of 92.37% in fraud call recognition. In addition, the proposed HPO-LGBM model can give a dynamic recognition interval for its output and is more robust than the compared models (LR, RF, MLP, GBDT, XGBoost, and LGBM).
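The abstract reports four evaluation metrics: F1, AUC, G-mean, and MCC. As a reading aid, the following is a minimal sketch of how these metrics are commonly computed for a binary split, assuming label 1 marks a fraudulent caller and 0 a normal one, and assuming a hypothetical score threshold of 0.5 to turn predicted probabilities into labels. It uses only the Python standard library, the toy data is invented, and it is not the authors' evaluation code.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 as the fraud (positive) class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp: int, fp: int, tn: int, fn: int) -> float:
    # Geometric mean of sensitivity (fraud-class recall) and specificity (normal-class recall)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    # Matthews correlation coefficient; defined as 0 when any marginal count is zero
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # Probability that a random fraud sample outscores a random normal one (ties count 1/2).
    # This pairwise form is O(n_pos * n_neg), fine for a sketch but not for large datasets.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration (not data from the paper)
    y_true = [1, 1, 0, 0, 0, 0, 0, 1]
    scores = [0.9, 0.4, 0.35, 0.1, 0.2, 0.6, 0.05, 0.8]
    threshold = 0.5  # hypothetical decision threshold
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    print(f"F1={f1_score(tp, fp, fn):.3f} G-mean={g_mean(tp, fp, tn, fn):.3f} "
          f"MCC={mcc(tp, fp, tn, fn):.3f} AUC={roc_auc(y_true, scores):.3f}")
```

G-mean and MCC are the imbalance-aware choices here: unlike plain accuracy, both fall to zero for a classifier that labels every caller as normal, which is exactly the failure mode that a heavily skewed fraud/normal ratio invites. AUC, by contrast, is computed from the raw scores and does not depend on the chosen threshold.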


Notes

  1. https://github.com/khznxn/TF-Dataset.
  2. https://www.kaggle.com/janiobachmann/bank-marketing-dataset.
  3. http://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients.
  4. http://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival.
  5. http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29.
  6. https://storage.googleapis.com/kagglesdsdata/datasets/14370/19291/pima-indians-diabetes.csv.


Author information


Correspondence to Xiaoying Zhi.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Liu, X., Zhi, X., Mei, Q., Wang, P., Su, H., Wang, J. (2024). HPO-LGBM-DRI: Dynamic Recognition Interval Estimation for Imbalanced Fraud Call via HPO-LGBM. In: Meng, X., Zhang, X., Guo, D., Hu, D., Zheng, B., Zhang, C. (eds) Spatial Data and Intelligence. SpatialDI 2024. Lecture Notes in Computer Science, vol 14619. Springer, Singapore. https://doi.org/10.1007/978-981-97-2966-1_24


  • DOI: https://doi.org/10.1007/978-981-97-2966-1_24


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2965-4

  • Online ISBN: 978-981-97-2966-1

  • eBook Packages: Computer Science, Computer Science (R0)
