short-paper

Alike and Unlike: Resolving Class Imbalance Problem in Financial Credit Risk Assessment

Authors:

Qing He

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management

Pages 2125 - 2128

https://doi.org/10.1145/3340531.3412111

Published: 19 October 2020

Abstract

Financial credit risk assessment evaluates customers' credit admission or potential business failure so that early action can be taken before an actual financial crisis occurs. It aims to predict the probability that a customer belongs to a high-risk group, and it is usually formulated as a binary classification problem. However, because high-risk samples are scarce, prevailing models suffer from a severe class imbalance problem. Oversampling high-risk users can alleviate this problem, but it also amplifies the effect of noisy examples. In this paper, we propose a novel adversarial data augmentation method to solve the class imbalance problem in financial credit risk assessment. We train a generator to produce synthetic samples, together with a discriminator that distinguishes real from fake instances. In addition, an auxiliary risk discriminator is trained cooperatively with the generator to assess credit risk. Experimental results on three real-world datasets demonstrate the effectiveness of the proposed framework.
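The abstract notes that oversampling high-risk users also amplifies noisy examples. The sketch below illustrates that effect with hypothetical toy data (95 low-risk and 5 high-risk records, one of which we assume is a mislabeled, noisy record). It uses naive duplication-based oversampling, which is neither the paper's method nor SMOTE; the function name `oversample_by_duplication` and the data are invented for illustration.

```python
from itertools import cycle, islice


def oversample_by_duplication(samples, labels, minority_label=1):
    """Naive oversampling (illustrative, not the paper's method): repeat the
    minority samples round-robin until the minority class is as large as the
    majority class."""
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    n_majority = sum(1 for y in labels if y != minority_label)
    needed = max(n_majority - len(minority), 0)
    extra = list(islice(cycle(minority), needed))
    return list(samples) + extra, list(labels) + [minority_label] * len(extra)


def noisy_share(samples, noisy_id="noisy_high"):
    """Fraction of the training set occupied by copies of the noisy record."""
    return samples.count(noisy_id) / len(samples)


# Hypothetical toy data: 95 low-risk (0) and 5 high-risk (1) customers.
# We assume the last high-risk record is noise, e.g. a mislabeled low-risk customer.
samples = [f"low_{i}" for i in range(95)] + [f"high_{i}" for i in range(4)] + ["noisy_high"]
labels = [0] * 95 + [1] * 5

before = noisy_share(samples)
bal_samples, bal_labels = oversample_by_duplication(samples, labels)
after = noisy_share(bal_samples)

print(f"noisy copies: {samples.count('noisy_high')} -> {bal_samples.count('noisy_high')}")
# noisy copies: 1 -> 19
print(f"share of training set: {before:.1%} -> {after:.1%}")
# share of training set: 1.0% -> 10.0%
```

Balancing the classes this way gives the single noisy record ten times more weight in the training set. A generator that learns to produce new high-risk-like samples, rather than copying existing ones, is one way to avoid replicating such records verbatim.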

Supplementary Material

MP4 File (3340531.3412111.mp4)

This work proposes a novel adversarial data augmentation method to solve the class imbalance problem in financial credit risk assessment. Specifically, the generator is trained adversarially against the discriminator to generate synthetic samples that resemble ("alike") real high-risk samples. An auxiliary discriminator assesses credit risk so that the synthetic samples differ from ("unlike") the low-risk samples. Experiments on real-world datasets provided by Alibaba Group demonstrate the effectiveness of the proposed framework.
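The summary describes two pressures on the generator ("alike" real high-risk samples, "unlike" low-risk samples) but does not give the loss functions. As a hedged sketch, we assume a standard non-saturating GAN loss for the "alike" term and a binary cross-entropy term from the auxiliary risk discriminator for the "unlike" term, combined with a hypothetical weight `lam`. The discriminator D, risk discriminator R, and generator G are treated as black boxes whose probability outputs are passed in as plain lists. This illustrates the assumed objective only and is not the authors' published formulation.

```python
import math

EPS = 1e-12  # guards against log(0)


def bce(probs, target):
    """Mean binary cross-entropy of predicted probabilities against one target label."""
    return -sum(
        target * math.log(p + EPS) + (1 - target) * math.log(1 - p + EPS)
        for p in probs
    ) / len(probs)


def discriminator_loss(d_real_high, d_fake):
    """D learns to score real high-risk samples as 1 and generated samples as 0."""
    return bce(d_real_high, 1) + bce(d_fake, 0)


def risk_discriminator_loss(r_high, r_low):
    """ASSUMPTION: R is a plain high-risk (1) vs low-risk (0) classifier on real data;
    whether synthetic samples also enter R's training is not stated in the abstract."""
    return bce(r_high, 1) + bce(r_low, 0)


def generator_loss(d_fake, r_fake, lam=1.0):
    """ASSUMED generator objective.
    'alike'  : non-saturating GAN term, push D to score fakes as real high-risk.
    'unlike' : push R to score fakes as high-risk, i.e. away from low-risk samples.
    lam is a hypothetical weight balancing the two terms."""
    alike = bce(d_fake, 1)
    unlike = bce(r_fake, 1)
    return alike + lam * unlike


# Hypothetical discriminator outputs on a batch of three synthetic samples,
# early in training (fakes easy to spot) and later (fakes more convincing).
early = generator_loss(d_fake=[0.1, 0.2, 0.1], r_fake=[0.3, 0.4, 0.2])
late = generator_loss(d_fake=[0.6, 0.5, 0.7], r_fake=[0.9, 0.8, 0.9])
print(f"generator loss: {early:.3f} -> {late:.3f}")
```

In a real model, D, R, and G would be neural networks and these losses would be minimized alternately with a gradient-based optimizer; the plain-list version only shows how the three objectives relate to one another.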




Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks



Information

Published In


CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management

October 2020

3619 pages

ISBN:9781450368599

DOI:10.1145/3340531

General Chairs:
Mathieu d'Aquin (DSI, Insight, NUI Galway, Ireland)
Stefan Dietze (GESIS, Cologne, Germany; Heinrich-Heine-University Düsseldorf, Germany; L3S Research Center, Germany)

Program Chairs:
Claudia Hauff (TU Delft, The Netherlands)
Edward Curry (DSI, Insight, NUI Galway, Ireland)
Philippe Cudré-Mauroux (eXascale, University of Fribourg, Switzerland)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Short-paper

Funding Sources

National Natural Science Foundation of China
National Key Research and Development Program of China
Alibaba Innovative Research Program

Conference

CIKM '20

Sponsor:

CIKM '20: The 29th ACM International Conference on Information and Knowledge Management

October 19 - 23, 2020

Virtual Event, Ireland

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Bibliometrics

Article Metrics

Total Citations: 18
Total Downloads: 350
Downloads (last 12 months): 27
Downloads (last 6 weeks): 2

Reflects downloads up to 28 Feb 2025.

Cited By

Zhao P, Guo S, Li Y, Yang S, Ren X (2025). FedGen: Personalized federated learning with data generation for enhanced model customization and class imbalance. Future Generation Computer Systems 164, 107595. Online publication date: Mar-2025.
https://doi.org/10.1016/j.future.2024.107595

Ren L, Zang Y, Hu R, Li D, Wu J, Huan Z, Hu J (2024). Do not ignore heterogeneity and heterophily: Multi-network collaborative telecom fraud detection. Expert Systems with Applications 257, 124974. Online publication date: Dec-2024.
https://doi.org/10.1016/j.eswa.2024.124974

Song Y, Wei Y, Yuan H, Sun Q, Fu X, Wang L, Li X (2024). CausalFD: causal invariance-based fraud detection against camouflaged preference. International Journal of Machine Learning and Cybernetics 15(11), 5053-5070. Online publication date: 27-May-2024.
https://doi.org/10.1007/s13042-024-02209-0

Wang G, Tang D, Shatsila A, Zhang X (2024). MICA: Multi-channel Representation Refinement Contrastive Learning for Graph Fraud Detection. Web and Big Data, 31-46. Online publication date: 12-May-2024.
https://doi.org/10.1007/978-981-97-2421-5_3

Tang H, Wang C, Zheng J, Jiang C (2023). Enabling Graph Neural Networks for Semi-Supervised Risk Prediction in Online Credit Loan Services. ACM Transactions on Intelligent Systems and Technology. Online publication date: 21-Sep-2023.
https://dl.acm.org/doi/10.1145/3623401

Meng L, Mostafa H, Nassar M, Zhang X, Zhang J (2023). Generative Graph Augmentation for Minority Class in Fraud Detection. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (eds. Frommholz I, Hopfgartner F, Lee M, Oakes M, Lalmas M, Zhang M, Santos R), 4200-4204. Online publication date: 21-Oct-2023.
https://dl.acm.org/doi/10.1145/3583780.3615255

Wang L, Zhao H, Feng C, Liu W, Huang C, Santoni M, Cristofaro M, Jafrancesco P, Bian J (2023). Removing Camouflage and Revealing Collusion: Leveraging Gang-crime Pattern in Fraudster Detection. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (eds. Singh A, Sun Y, Akoglu L, Gunopulos D, Yan X, Kumar R, Ozcan F, Ye J), 5104-5115. Online publication date: 6-Aug-2023.
https://dl.acm.org/doi/10.1145/3580305.3599895

Liu Y, Gao Z, Liu X, Luo P, Yang Y, Xiong H (2023). QTIAH-GNN: Quantity and Topology Imbalance-aware Heterogeneous Graph Neural Network for Bankruptcy Prediction. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (eds. Singh A, Sun Y, Akoglu L, Gunopulos D, Yan X, Kumar R, Ozcan F, Ye J), 1572-1582. Online publication date: 6-Aug-2023.
https://dl.acm.org/doi/10.1145/3580305.3599479

Feng W, Liu S, Cheng X (2023). Hierarchical Dense Pattern Detection in Tensors. ACM Transactions on Knowledge Discovery from Data 17(6), 1-29. Online publication date: 28-Feb-2023.
https://dl.acm.org/doi/10.1145/3577022

Meng L, Zhang X, Zhang J, Yu P (2023). Location-Adaptive Generative Graph Augmentation for Fraud Detection. 2023 IEEE 5th International Conference on Cognitive Machine Intelligence (CogMI), 24-30. Online publication date: 1-Nov-2023.
https://doi.org/10.1109/CogMI58952.2023.00014
