research-article

Causal-Aware Generative Imputation for Automated Underwriting

Authors:

Tri Dung Duong,

Guandong Xu

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pages 3916 - 3924

https://doi.org/10.1145/3459637.3481900

Published: 30 October 2021

Abstract

Underwriting is an important process in insurance, concerned with accepting individuals into an insurance policy at a tolerable claim risk. It is a tedious process that relies on underwriters' domain knowledge and experience, and is therefore labor intensive and prone to error. Machine learning models have recently been applied to automate the underwriting process, easing the burden on underwriters and improving underwriting accuracy. However, the observational data used for underwriting modelling is high dimensional, sparse and incomplete, owing to the dynamically evolving nature (e.g., upgrades) of business information systems. Simply applying traditional supervised learning methods, e.g., logistic regression or gradient boosting, to such highly incomplete data usually leads to unsatisfactory underwriting results, so practical data imputation is required to improve training quality. In this paper, rather than choosing off-the-shelf solutions to the complex missing-data problem, we propose an innovative Generative Adversarial Nets (GAN) framework that captures the missing pattern from a causal perspective. Specifically, we design a structural causal model to learn the causal relations underlying the missing pattern of the data. Then, we devise a Causality-aware Generative network (CaGen) that uses the learned causal relationship prior to generate missing values and corrects the imputed values via adversarial learning. We also show that CaGen significantly improves underwriting prediction in real-world insurance applications.
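The abstract does not give CaGen's equations, so the following is only a minimal sketch of the bookkeeping that GAN-based imputers in general rely on: a mask records which entries were observed, generator output is used only where data is missing, and a reconstruction error is measured on observed entries alone. The function names (`missing_mask`, `combine`, `observed_mse`) and the `generated` values are hypothetical stand-ins for illustration, not the authors' implementation, and no causal prior or discriminator is modelled here.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def missing_mask(data: Matrix) -> List[List[int]]:
    """Return 1 where a value is observed and 0 where it is missing (None)."""
    return [[0 if value is None else 1 for value in row] for row in data]


def combine(data: Matrix, mask: List[List[int]],
            generated: List[List[float]]) -> List[List[float]]:
    """Keep observed entries; use generator output only where data is missing."""
    return [
        [x if m == 1 else g for x, m, g in zip(row, mask_row, gen_row)]
        for row, mask_row, gen_row in zip(data, mask, generated)
    ]


def observed_mse(data: Matrix, mask: List[List[int]],
                 generated: List[List[float]]) -> float:
    """Mean squared error between data and generator output on observed entries."""
    errors = [
        (x - g) ** 2
        for row, mask_row, gen_row in zip(data, mask, generated)
        for x, m, g in zip(row, mask_row, gen_row)
        if m == 1
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Toy 2x2 table with one missing value per row (None marks a missing entry).
data = [[1.0, None], [None, 4.0]]
# Hypothetical generator output; a trained network would produce these values.
generated = [[0.5, 2.0], [3.0, 3.0]]

mask = missing_mask(data)
imputed = combine(data, mask, generated)      # [[1.0, 2.0], [3.0, 4.0]]
loss = observed_mse(data, mask, generated)    # 0.625
```

In a full adversarial setup, a discriminator would additionally try to tell imputed entries from observed ones; that part is omitted because the abstract does not specify it.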

Index Terms

Computing methodologies → Machine learning → Learning paradigms

Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

October 2021

4966 pages

ISBN:9781450384469

DOI:10.1145/3459637

General Chairs:
Gianluca Demartini, The University of Queensland, Australia
Guido Zuccon, The University of Queensland, Australia

Program Chairs:
J. Shane Culpepper, RMIT University, Australia
Zi Huang, The University of Queensland, Australia
Hanghang Tong, University of Illinois at Urbana-Champaign, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Australian Research Council

Conference

CIKM '21

CIKM '21: The 30th ACM International Conference on Information and Knowledge Management

November 1 - 5, 2021

Queensland, Virtual Event, Australia

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
