Abstract
Credit card churn prediction, insurance fraud detection, and loan default prediction are critical analytical customer relationship management (ACRM) problems. Because these events occur infrequently, the corresponding datasets are highly unbalanced, and machine learning classifiers trained on them tend to produce high false positive rates. We propose two methods for data balancing. In the first, we oversample the minority class with a novel GAN called chaoticGAN, which uses chaotic noise as input to the generator. As baselines, we also independently employ the traditional GAN (Goodfellow et al. in Adv Neural Inf Process Syst, 2014. https://doi.org/10.1145/3422622), Wasserstein GAN (Arjovsky et al. in Wasserstein GAN, 2017. https://arxiv.org/abs/1701.07875), and CTGAN (Xu et al. in Modeling Tabular Data using Conditional GAN, 2019. https://arxiv.org/pdf/1907.00503). On the data balanced by each GAN, we train a host of machine learning classifiers, namely Random Forest, Decision Tree, Support Vector Machine (SVM), Logistic Regression (LR), multi-layer perceptron (MLP), and Light Gradient Boosting Machine (LGBM), to demonstrate the efficacy of our approaches. In the second, we combine the synthetic minority class data generated by the GAN and its variants with majority class data undersampled by a one-class support vector machine (OCSVM) (Tax and Duin in Mach Learn 54:45–66, 2004), and build the classifiers on the resulting modified dataset. Our proposed approaches outperform earlier studies on the same datasets in terms of the area under the ROC curve (AUC). Further, under tenfold cross-validation, the proposed chaoticGAN and its hybrid are statistically similar to the state-of-the-art CTGAN on all datasets and significantly better than the other methods with respect to AUC.
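To make the idea of chaotic generator input concrete, the following is a minimal sketch of how a batch of chaotic noise vectors could be produced in place of the usual Gaussian or uniform noise. It rests on assumptions not stated in the abstract: the logistic map is used as a stand-in for whichever chaotic map the authors chose, and the function name `logistic_map_noise`, the parameter `r = 3.99`, the burn-in length, and the rescaling to [-1, 1] are all hypothetical choices made for illustration.

```python
import random


def logistic_map_noise(n_samples, dim, r=3.99, burn_in=100, seed=0):
    """Return n_samples noise vectors of length dim from a logistic-map orbit.

    Assumption: the logistic map x <- r * x * (1 - x) stands in for the
    chaotic map used by chaoticGAN; the abstract does not name the map.
    """
    rng = random.Random(seed)
    # Random starting point strictly inside (0, 1).
    x = rng.uniform(0.1, 0.9)
    # Discard the first iterates so the orbit forgets its starting point.
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    noise = []
    for _ in range(n_samples):
        row = []
        for _ in range(dim):
            x = r * x * (1.0 - x)
            # Rescale from (0, 1) to (-1, 1), a common range for GAN inputs.
            row.append(2.0 * x - 1.0)
        noise.append(row)
    return noise


# Example: a batch of 4 noise vectors of dimension 8 for a generator.
batch = logistic_map_noise(n_samples=4, dim=8, seed=42)
```

Each row of `batch` would then be fed to the generator exactly where a conventional GAN would use a random noise vector; the rest of the training loop is unchanged in this sketch.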



Data availability
The credit card churn prediction and auto insurance fraud detection datasets analysed during the current study cannot be shared because the authors do not have permission to do so. The loan default prediction dataset, which is publicly available, can be obtained from the corresponding author on reasonable request.
References
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial networks. Adv Neural Inf Process Syst. https://doi.org/10.1145/3422622
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN. https://arxiv.org/abs/1701.07875
Xu L, Skoularidou M, Cuesta-Infante A, Veeramachaneni K (2019) Modeling tabular data using conditional GAN. https://arxiv.org/pdf/1907.00503
Tax DM, Duin RP (2004) Support vector data description. Mach Learn 54:45–66. https://doi.org/10.1023/B:MACH.0000008084.60811.49
Kumar V, Reinartz W (2018) Customer relationship management: concept, strategy, and tools. Springer-Verlag GmbH, Germany
Gangwar AK, Ravi V (2019) Generative adversarial network for oversampling data in credit card fraud detection. In: ICISS, Hyderabad, India pp 123–134
Sisodia DS, Reddy NK (2017) Performance evaluation of class balancing techniques for credit card fraud detection. In: 2017 IEEE international conference on power, control, signals and instrumentation engineering (ICPCSI), pp 2747–2752
Randhawa K, Chu Kiong L, Seera M, Lim C, Nandi A (2018) Credit card fraud detection using AdaBoost and majority voting. IEEE Access. https://doi.org/10.1109/ACCESS.2018.2806420
Dos Santos Tanaka FHK, Aranha C (2019) Data augmentation using GAN. https://arxiv.org/abs/1904.09135
Mottini A, Lheritier A, Acuna-Agost R (2018) Airline passenger name record generation using generative adversarial networks. https://arxiv.org/abs/1807.06657
Fiore U, Santis AD, Perla F, Zanetti P, Palmieri F (2019) Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Inf Sci 479:448–455
Vega-Marquez B, Rubio-Escudero C, Riquelme J, Nepomuceno-Chamorro C (2020) Creation of synthetic data with conditional generative adversarial networks. In: SOCO 2019. AISC. Springer, Cham pp 231–240
Che T, Li Y, Zhang R, Hjelm RD, Li W, Song Y, Bengio Y (2017) Maximum-likelihood augmented discrete generative adversarial networks. https://arxiv.org/abs/1702.07983
Kusner MJ, Hernández-Lobato JM (2016) GANs for sequences of discrete elements with the Gumbel-softmax distribution. https://arxiv.org/abs/1611.04051
Ping H, Stoyanovich J, Howe B (2017) Data synthesizer: privacy-preserving synthetic datasets. In: Proceedings of the 29th international conference on scientific and statistical database management. ACM, p 42
Esteban C, Hyland SL, Rätsch G (2017) Real-valued (medical) time series generation with recurrent conditional GANs. https://arxiv.org/abs/1706.02633
Camino R, Hammerschmidt C, State R (2018) Generating multi-categorical samples with generative adversarial networks. https://arxiv.org/abs/1807.01202
Choi E, Biswal S, Malin B, Duke J, Stewart WF, Sun J (2017) Generating multi-label discrete patient records using generative adversarial networks. https://arxiv.org/abs/1703.06490
Patel S, Kakadiya A, Mehta M, Derasari R, Patel R, Gandhi R (2018) Correlated discrete data generation using adversarial training. https://arxiv.org/abs/1804.00925
Park N, Mohammadi M, Gorde K, Jajodia S, Park H, Kim Y (2018) Data synthesis based on generative adversarial networks. Proc VLDB Endow 11(10):1071–1083
Xu L, Veeramachaneni K (2018) Synthesizing tabular data using generative adversarial networks. https://arxiv.org/pdf/1811.11264
Smith KA, Gupta JN (2000) Neural networks in business: techniques and applications for the operations researcher. Comput Oper Res 27(11–12):1023–1044
Ferreira JB, Vellasco M, Pacheco MA, Barbosa CH (2004) Data mining techniques on the evaluation of wireless churn. In: Proceedings of the European symposium on artificial neural networks (ESANN'2004), Bruges, Belgium. d-side publications, ISBN 2-930307-04-8, pp 483–488
Kumar DA, Ravi V (2008) Predicting credit card customer churn in banks using data mining. Int J Data Anal Tech Strat 1(1):4–28
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Larivière B, Van den Poel D (2004) Investigating the role of product features in preventing customer churn, by using survival analysis and choice modelling: the case of financial services. Expert Syst Appl 27(2):277–285
Ali ÖG, Arıtürk U (2014) Dynamic churn prediction framework with more effective use of rare event data: the case of private banking. Expert Syst Appl 41(17):7880–7903
Verbeke W, Martens D, Mues C, Baesens B (2011) Building comprehensible customer churn prediction models with advanced rule induction techniques. Expert Syst Appl 38(3):2354–2364
Tsai CF, Lu YH (2009) Customer churn prediction by hybrid neural networks. Expert Syst Appl 36(10):12547–12553
Sundarkumar GG, Ravi V (2015) A novel hybrid under-sampling method for mining unbalanced datasets in banking and insurance. Eng Appl Artif Intell 37:368–377
Sundarkumar GG, Ravi V, Siddeshwar V (2015) One-class support vector machine based under-sampling: application to churn prediction and insurance fraud detection. In: 2015 IEEE international conference on computational intelligence and computing research
Farquad MAH, Ravi V, Bapi Raju S (2011) Analytical CRM in banking and finance using SVM: a modified active learning-based rule extraction approach. Int J Electron Cust Relatsh Manag 6(1):48–73
Phua C, Alahakoon D, Lee V (2004) Minority report in fraud detection: classification of skewed data. SIGKDD Explor (special issue on imbalanced data sets) 6(1):50–59
Šubelj L, Furlan S, Bajec M (2011) An expert system for detecting automobile insurance fraud using network analysis. Expert Syst Appl 38(1):1039–1042
Lorenz EN (1963) Deterministic nonperiodic flow. J Atmos Sci 20(2):130–141
Dhanya CT, Nagesh Kumar D (2010) Nonlinear ensemble prediction of chaotic daily rainfall. Adv Water Resour 33(3):327–347
Packard NH, Crutchfield JP, Farmer JD, Shaw RS (1980) Geometry from a time series. Phys Rev Lett 45:712
Qasim OS, Thanoon A, Algamal ZY (2020) Feature selection based on chaotic binary black hole algorithm for data classification. Chemom Intell Lab Syst 204:104104
Ahmed AE, Mohamed AA, Aboul EH (2019) Chaotic multi-verse optimizer-based feature selection. Neural Comput Appl 31(4):991–1006
Hu J, Heidari AA, Zhang L, Xue X, Gui W, Chen H, Pan Z (2021) Chaotic diffusion‐limited aggregation enhanced grey wolf optimizer: Insights, analysis, binarization, and feature selection. Int J Intell Syst 1–64
Schölkopf B, Williamson RC, Smola A, Shawe-Taylor J, Platt J (1999) Support vector method for novelty detection. Adv Neural Inf Process Syst 12
Jais I, Ismail A, Nisa SQ (2019) Adam optimization algorithm for wide and deep neural network. Knowl Eng Data Sci 2:41. https://doi.org/10.17977/um018v2i12019p41-46
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray D, Steiner B, Tucker P, Vasudevan V, Warden P, Zhang X (2016) TensorFlow: a system for large-scale machine learning
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12(85):2825–2830
Vasu M, Ravi V (2011) A hybrid under-sampling approach for mining unbalanced datasets: application to Banking and insurance. Int J Data Min Model Manag 3(1):75–105
Mudholkar GS, Hutson AD (1996) The exponentiated Weibull family: some properties and a flood data application. Commun Stat Theory Methods 25:3059–3083
KStest. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html
Arik SO, Pfister T (2020) TabNet: attentive interpretable tabular learning. https://arxiv.org/abs/1908.07442
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article. Further, the authors comply with the ethical standards of the Journal.
About this article
Cite this article
Kate, P., Ravi, V. & Gangwar, A. FinGAN: Chaotic generative adversarial network for analytical customer relationship management in banking and insurance. Neural Comput & Applic 35, 6015–6028 (2023). https://doi.org/10.1007/s00521-022-07968-x
DOI: https://doi.org/10.1007/s00521-022-07968-x