Scalable Machine Learning Techniques for Highly Imbalanced Credit Card Fraud Detection: A Comparative Study

Mohammed, Rafiq Ahmed; Wong, Kok-Wai; Shiratuddin, Mohd Fairuz; Wang, Xuequn

doi:10.1007/978-3-319-97310-4_27

Rafiq Ahmed Mohammed¹⁵,
Kok-Wai Wong¹⁵,
Mohd Fairuz Shiratuddin¹⁵ &
…
Xuequn Wang¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11013))

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

4210 Accesses
24 Citations

Abstract

In the real world of credit card fraud detection, due to a minority of fraud related transactions, has created a class imbalance problem. With the increase of transactions at massive scale, the imbalanced data is immense and has created a challenging issue on how well Machine Learning (ML) techniques can scale up to efficiently learn to detect fraud from the massive incoming data and to respond faster with high prediction accuracy and reduced misclassification costs. This paper is based on experiments that compared several popular ML techniques and investigated their suitability as a “scalable algorithm” when working with highly imbalanced massive or “Big” datasets. The experiments were conducted on two highly imbalanced datasets using Random Forest, Balanced Bagging Ensemble, and Gaussian Naïve Bayes. We observed that many detection algorithms performed well with medium-sized dataset but struggled to maintain similar predictions when it is massive.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)
Article Google Scholar
Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. 6(5), 429–449 (2002)
MATH Google Scholar
Juszczak, P., et al.: Off-the-peg and bespoke classifiers for fraud detection. Comput. Stat. Data Anal. 52(9), 4521–4532 (2008)
Article MathSciNet Google Scholar
Dal Pozzolo, A., Caelen, O., Bontempi, G.: When is undersampling effective in unbalanced classification tasks? In: Appice, A., Rodrigues, P.P., Santos Costa, V., Soares, C., Gama, J., Jorge, A. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9284, pp. 200–215. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23528-8_13
Chapter Google Scholar
Ali, A., Shamsuddin, S.M., Ralescu, A.L.: Classification with class imbalance problem: a review. Int. J. Adv. Soft Comput. Appl. 7(3), 176–204 (2015)
Google Scholar
Zareapoor, M., Yang, J.: A novel strategy for mining highly imbalanced data in credit card transactions. Intell. Autom. Soft Comput. 1–7 (2017). https://doi.org/10.1080/10798587.2017.1321228, ISSN 1079-8587
Zareapoor, M., Shamsolmoali, P.: Application of credit card fraud detection: based on bagging ensemble classifier. Procedia Comput. Sci. 48, 679–685 (2015)
Article Google Scholar
Carneiro, N., Figueira, G., Costa, M.: A data mining based system for credit-card fraud detection in e-tail. Decis. Support Syst. 95, 91–101 (2017)
Article Google Scholar
PYMNTS Homepage. AI Puts Fraudulent Credit Card Testers To The Test, 21 February 2018. https://www.pymnts.com/fraud-prevention/2018/brighterion-credit-card-fraud-prevention/. Accessed 24 Mar 2018
West, J., Bhattacharya, M.: Intelligent financial fraud detection: a comprehensive review. Comput. Secur. 57, 47–66 (2016)
Article Google Scholar
Dal Pozzolo, A., et al.: Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst. Appl. 41(10), 4915–4928 (2014)
Article Google Scholar
Lu, Y., Cheung, Y.-m., Tang, Y.Y.: Hybrid sampling with bagging for class imbalance learning. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J.Z., Wang, R. (eds.) PAKDD 2016. LNCS (LNAI), vol. 9651, pp. 14–26. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31753-3_2
Chapter Google Scholar
West, J., Bhattacharya, M.: Some experimental issues in financial fraud mining. Procedia Comput. Sci. 80, 1734–1744 (2016)
Article Google Scholar
Awoyemi, J.O., Adetunmbi, A.O., Oluwadare, S.A.: Credit card fraud detection using machine learning techniques: a comparative analysis. In: 2017 International Conference on Computing Networking and Informatics (ICCNI). IEEE (2017)
Google Scholar
Liu, B., et al.: Scalable sentiment classification for big data analysis using Naive Bayes Classifier. In: 2013 IEEE International Conference on Big Data. IEEE (2013)
Google Scholar
Bolton, R.J., Hand, D.J.: Statistical fraud detection: a review. Stat. Sci. 17, 235–249 (2002)
Article MathSciNet Google Scholar
Dai, Y., et al.: Online credit card fraud detection: a hybrid framework with big data technologies. In: Trustcom/BigDataSE/I SPA, 2016 IEEE. IEEE (2016)
Google Scholar
Ryman-Tubb, N.: Understanding payment card fraud through knowledge extraction from neural networks using large-scale datasets. University of Surrey (2016)
Google Scholar
Japkowicz, N.: Class imbalances: are we focusing on the right issue. In: Workshop on Learning from Imbalanced Data Sets II (2003)
Google Scholar
Yap, B.W., Rani, K.A., Rahman, H.A.A., Fong, S., Khairudin, Z., Abdullah, N.N.: An application of oversampling, undersampling, bagging and boosting in handling imbalanced datasets. In: Herawan, T., Deris, M.M., Abawajy, J. (eds.) Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013). LNEE, vol. 285, pp. 13–22. Springer, Singapore (2014). https://doi.org/10.1007/978-981-4585-18-7_2
Chapter Google Scholar
Ma, L., Fan, S.: CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests. BMC Bioinf. 18(1), 169 (2017)
Article Google Scholar
Han, J., Liu, Y., Sun, X.: A scalable random forest algorithm based on mapreduce. In: 2013 4th IEEE International Conference on Software Engineering and Service Science (ICSESS). IEEE (2013)
Google Scholar
European Credit Card dataset. U.M.L. Group, Editor, ULB Machine Learning Group (2013). https://www.kaggle.com/mlg-ulb/creditcardfraud
ccFraud dataset, April 2013. https://packages.revolutionanalytics.com/datasets/
Akbani, R., Kwek, S., Japkowicz, N.: Applying support vector machines to imbalanced datasets. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 39–50. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30115-8_7
Chapter Google Scholar
Chawla, N.V., et al.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Article Google Scholar
Galar, M., et al.: A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybernet. Part C (Appl. Rev.) 42(4), 463–484 (2012)
Article Google Scholar
Provost, F.: Machine learning from imbalanced data sets 101. In: Proceedings of the AAAI 2000 Workshop on Imbalanced Data Sets (2000)
Google Scholar
Fisher, W.D.: Machine Learning for the Automatic Detection of Anomalous Events. ProQuest Dissertations Publishing (2017)
Google Scholar
Géron, A.: Hands-on Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media Inc., Sebastopol (2017)
Google Scholar
Carcillo, F., et al.: An assessment of streaming active learning strategies for real-life credit card fraud detection. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE (2017)
Google Scholar
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011)
MathSciNet MATH Google Scholar
Lemaitre, G., Nogueira, F., Oliveira, D., Aridas, C.: BalancedBaggingClassifier (2016). http://contrib.scikit-learn.org/imbalanced-learn/stable/generated/imblearn.ensemble.BalancedBaggingClassifier.html. Accessed 17 Mar 2018
Saito, T., Rehmsmeier, M.: The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3), e0118432 (2015)
Article Google Scholar

Download references

Author information

Authors and Affiliations

School of Engineering and Information Technology, Murdoch University, Perth, Australia
Rafiq Ahmed Mohammed, Kok-Wai Wong, Mohd Fairuz Shiratuddin & Xuequn Wang

Authors

Rafiq Ahmed Mohammed
View author publications
You can also search for this author in PubMed Google Scholar
Kok-Wai Wong
View author publications
You can also search for this author in PubMed Google Scholar
Mohd Fairuz Shiratuddin
View author publications
You can also search for this author in PubMed Google Scholar
Xuequn Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Rafiq Ahmed Mohammed or Kok-Wai Wong .

Editor information

Editors and Affiliations

Southeast University, Nanjing, China
Xin Geng
University of Tasmania, Hobart, Tasmania, Australia
Byeong-Ho Kang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mohammed, R.A., Wong, KW., Shiratuddin, M.F., Wang, X. (2018). Scalable Machine Learning Techniques for Highly Imbalanced Credit Card Fraud Detection: A Comparative Study. In: Geng, X., Kang, BH. (eds) PRICAI 2018: Trends in Artificial Intelligence. PRICAI 2018. Lecture Notes in Computer Science(), vol 11013. Springer, Cham. https://doi.org/10.1007/978-3-319-97310-4_27

Download citation

DOI: https://doi.org/10.1007/978-3-319-97310-4_27
Published: 27 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97309-8
Online ISBN: 978-3-319-97310-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics