DOI: 10.1145/3340531.3412111
short-paper

Alike and Unlike: Resolving Class Imbalance Problem in Financial Credit Risk Assessment

Published: 19 October 2020

Abstract

Financial credit risk assessment evaluates the credit admission or potential business failure of customers so that early action can be taken before an actual financial crisis. It aims to predict the probability that a customer belongs to a high-risk group, which is usually formulated as a binary classification problem. However, due to the lack of high-risk samples, prevailing models suffer from a severe class-imbalance problem. Oversampling the high-risk users can alleviate this problem, but it also amplifies the effect of noisy examples. In this paper, we propose a novel adversarial data augmentation method to solve the class imbalance problem in financial credit risk assessment. We train a generator for synthetic sample generation together with a discriminator that identifies real or fake instances. In addition, an auxiliary risk discriminator is trained cooperatively with the generator to assess the credit risk. Experimental results on three real-world datasets demonstrate the effectiveness of the proposed
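
The oversampling concern raised in the abstract can be illustrated with a minimal sketch. It duplicates minority-class rows until the two classes are the same size and counts how many copies of a single noisy row result. The toy data, the `is_noisy` flag, and the `oversample_by_duplication` helper are hypothetical and are not taken from the paper.

```python
from itertools import cycle, islice


def oversample_by_duplication(minority, target_size):
    """Repeat the minority rows in order until there are target_size of them."""
    return list(islice(cycle(minority), target_size))


# Hypothetical toy data: 95 low-risk rows and 5 high-risk rows,
# one of which is a noisy (for example, mislabeled) example.
low_risk = [{"label": 0, "is_noisy": False} for _ in range(95)]
high_risk = [{"label": 1, "is_noisy": False} for _ in range(4)]
high_risk.append({"label": 1, "is_noisy": True})

# Balance the classes by plain duplication of the high-risk rows.
balanced_high_risk = oversample_by_duplication(high_risk, len(low_risk))

noisy_before = sum(row["is_noisy"] for row in high_risk)
noisy_after = sum(row["is_noisy"] for row in balanced_high_risk)

print(f"high-risk rows: {len(high_risk)} -> {len(balanced_high_risk)}")
print(f"noisy high-risk rows: {noisy_before} -> {noisy_after}")
```

In this toy setting the single noisy row ends up with 19 copies, so a classifier trained on the balanced set sees the noise as often as it sees each clean high-risk row.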

Supplementary Material

MP4 File (3340531.3412111.mp4)
This work proposes a novel adversarial data augmentation method to solve the class imbalance problem in financial credit risk assessment. Specifically, the generator is trained adversarially against the discriminator to generate synthetic samples that are like the real high-risk samples. An auxiliary discriminator is designed to assess the risk so that the synthetic samples are unlike the low-risk samples. Experiments on real-world datasets provided by Alibaba Group demonstrate the effectiveness of the proposed framework.
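
The "alike" and "unlike" goals described above can be sketched as a two-term generator objective. This text does not give the paper's actual loss functions, so the sketch uses standard binary cross-entropy as a stand-in. The function names, the `risk_weight` parameter, and the example probabilities are all assumptions made for illustration.

```python
import math


def binary_cross_entropy(prob, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def generator_loss(p_real, p_high_risk, risk_weight=1.0):
    """Hypothetical generator objective for one synthetic sample.

    p_real:      real/fake discriminator's probability that the sample is real
    p_high_risk: auxiliary risk discriminator's probability that it is high-risk
    risk_weight: assumed trade-off weight between the two terms
    """
    # "Alike": the synthetic sample should pass as a real high-risk sample.
    alike = binary_cross_entropy(p_real, 1.0)
    # "Unlike": the sample should be assessed as high-risk, not low-risk.
    unlike = binary_cross_entropy(p_high_risk, 1.0)
    return alike + risk_weight * unlike


# A sample that fools both discriminators costs the generator less
# than a sample that fools neither.
convincing = generator_loss(p_real=0.9, p_high_risk=0.9)
unconvincing = generator_loss(p_real=0.2, p_high_risk=0.3)
print(f"convincing sample loss:   {convincing:.4f}")
print(f"unconvincing sample loss: {unconvincing:.4f}")
```

Under these assumptions, lowering the first term pulls synthetic samples toward the real high-risk distribution, while lowering the second pushes them away from the low-risk region.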

    Published In

    CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
    October 2020
    3619 pages
    ISBN: 9781450368599
    DOI: 10.1145/3340531

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. class imbalance problem
    2. data augmentation
    3. generative model

    Qualifiers

    • Short-paper

    Funding Sources

    • National Natural Science Foundation of China
    • National Key Research and Development Program of China
    • Alibaba Innovative Research Program

    Conference

    CIKM '20

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 27
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 28 Feb 2025

    Citations

    Cited By

    • (2025) FedGen: Personalized federated learning with data generation for enhanced model customization and class imbalance. Future Generation Computer Systems 164 (107595). DOI: 10.1016/j.future.2024.107595. Online publication date: Mar-2025
    • (2024) Do not ignore heterogeneity and heterophily: Multi-network collaborative telecom fraud detection. Expert Systems with Applications 257 (124974). DOI: 10.1016/j.eswa.2024.124974. Online publication date: Dec-2024
    • (2024) CausalFD: causal invariance-based fraud detection against camouflaged preference. International Journal of Machine Learning and Cybernetics 15:11 (5053-5070). DOI: 10.1007/s13042-024-02209-0. Online publication date: 27-May-2024
    • (2024) MICA: Multi-channel Representation Refinement Contrastive Learning for Graph Fraud Detection. Web and Big Data (31-46). DOI: 10.1007/978-981-97-2421-5_3. Online publication date: 12-May-2024
    • (2023) Enabling Graph Neural Networks for Semi-Supervised Risk Prediction in Online Credit Loan Services. ACM Transactions on Intelligent Systems and Technology. DOI: 10.1145/3623401. Online publication date: 21-Sep-2023
    • (2023) Generative Graph Augmentation for Minority Class in Fraud Detection. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (4200-4204). DOI: 10.1145/3583780.3615255. Online publication date: 21-Oct-2023
    • (2023) Removing Camouflage and Revealing Collusion: Leveraging Gang-crime Pattern in Fraudster Detection. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (5104-5115). DOI: 10.1145/3580305.3599895. Online publication date: 6-Aug-2023
    • (2023) QTIAH-GNN: Quantity and Topology Imbalance-aware Heterogeneous Graph Neural Network for Bankruptcy Prediction. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (1572-1582). DOI: 10.1145/3580305.3599479. Online publication date: 6-Aug-2023
    • (2023) Hierarchical Dense Pattern Detection in Tensors. ACM Transactions on Knowledge Discovery from Data 17:6 (1-29). DOI: 10.1145/3577022. Online publication date: 28-Feb-2023
    • (2023) Location-Adaptive Generative Graph Augmentation for Fraud Detection. 2023 IEEE 5th International Conference on Cognitive Machine Intelligence (CogMI) (24-30). DOI: 10.1109/CogMI58952.2023.00014. Online publication date: 1-Nov-2023
