Scalable Machine Learning Techniques for Highly Imbalanced Credit Card Fraud Detection: A Comparative Study

Conference paper
PRICAI 2018: Trends in Artificial Intelligence (PRICAI 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11013)

Abstract

In real-world credit card fraud detection, fraudulent transactions make up only a small minority of all transactions, which creates a class imbalance problem. As transaction volumes grow to a massive scale, the imbalanced data becomes immense, raising the challenge of how well Machine Learning (ML) techniques can scale up to learn efficiently to detect fraud from the massive incoming data, and to respond faster with high prediction accuracy and reduced misclassification costs. This paper reports experiments that compared several popular ML techniques and investigated their suitability as a “scalable algorithm” when working with highly imbalanced, massive or “Big” datasets. The experiments were conducted on two highly imbalanced datasets using Random Forest, Balanced Bagging Ensemble, and Gaussian Naïve Bayes. We observed that many detection algorithms performed well on a medium-sized dataset but struggled to maintain similar predictive performance when the dataset was massive.
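To make the imbalance problem and the balanced-bagging idea concrete, here is a minimal pure-Python sketch. It is not the authors' code and not the imbalanced-learn `BalancedBaggingClassifier` implementation. It assumes a hypothetical two-feature synthetic dataset with about 2% "fraud" rows, a hand-written Gaussian Naïve Bayes as the base learner, and a simplified bagging scheme in which each member trains on a bootstrap of the fraud rows plus an equal-size bootstrap of the legitimate rows (random undersampling per bag). It also shows why accuracy alone is misleading here: a model that never flags fraud still scores about 98% accuracy with zero recall.

```python
import math
import random

# Hypothetical synthetic data: class 0 = legitimate, class 1 = fraud (rare).
def make_data(n_legit, n_fraud, seed):
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_legit)]
    X += [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(n_fraud)]
    y = [0] * n_legit + [1] * n_fraud
    return X, y

def fit_gnb(X, y):
    """Gaussian Naive Bayes: per-class log prior, feature means and variances."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # Variance floor guards against zero variance in small bootstrap samples.
        variances = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-9)
                     for col, m in zip(cols, means)]
        model[c] = (math.log(len(rows) / len(y)), means, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    def score(c):
        log_prior, means, variances = model[c]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                               for v, m, var in zip(x, means, variances))
    return max(model, key=score)

def fit_balanced_bagging(X, y, n_estimators=11, seed=0):
    """Each member trains on a bootstrap of the fraud rows plus an equal-size
    bootstrap of the legitimate rows, i.e. random undersampling per bag."""
    rng = random.Random(seed)
    fraud = [i for i, label in enumerate(y) if label == 1]
    legit = [i for i, label in enumerate(y) if label == 0]
    models = []
    for _ in range(n_estimators):
        idx = [rng.choice(fraud) for _ in fraud] + [rng.choice(legit) for _ in fraud]
        models.append(fit_gnb([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, x):
    """Majority vote over the ensemble (n_estimators is odd, so no ties)."""
    return int(sum(predict_gnb(m, x) for m in models) * 2 > len(models))

def precision_recall_accuracy(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, accuracy

X_train, y_train = make_data(2000, 40, seed=1)
X_test, y_test = make_data(2000, 40, seed=2)

baseline = precision_recall_accuracy(y_test, [0] * len(y_test))
plain = fit_gnb(X_train, y_train)
gnb = precision_recall_accuracy(y_test, [predict_gnb(plain, x) for x in X_test])
ensemble = fit_balanced_bagging(X_train, y_train)
bagged = precision_recall_accuracy(y_test, [predict_bagging(ensemble, x) for x in X_test])

for name, (p, r, a) in [("always-legit", baseline), ("GaussianNB", gnb),
                        ("balanced bagging", bagged)]:
    print(f"{name:>16}: precision={p:.2f} recall={r:.2f} accuracy={a:.3f}")
```

The design choice to balance each bag means every base learner sees equal class priors, so its decision boundary sits roughly midway between the classes. The plain Gaussian Naïve Bayes, trained on the raw 2% prior, places its boundary further toward the fraud class and typically trades recall for precision, which is why precision and recall, rather than accuracy, are reported for each model.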

Author information

Correspondence to Rafiq Ahmed Mohammed or Kok-Wai Wong.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Mohammed, R.A., Wong, KW., Shiratuddin, M.F., Wang, X. (2018). Scalable Machine Learning Techniques for Highly Imbalanced Credit Card Fraud Detection: A Comparative Study. In: Geng, X., Kang, BH. (eds) PRICAI 2018: Trends in Artificial Intelligence. PRICAI 2018. Lecture Notes in Computer Science, vol 11013. Springer, Cham. https://doi.org/10.1007/978-3-319-97310-4_27

  • DOI: https://doi.org/10.1007/978-3-319-97310-4_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97309-8

  • Online ISBN: 978-3-319-97310-4

  • eBook Packages: Computer Science, Computer Science (R0)
