Building prediction models and discovering important factors of health insurance fraud using machine learning methods

Nalluri, Venkateswarlu; Chang, Jing-Rong; Chen, Long-Sheng; Chen, Jia-Chuan

doi:10.1007/s12652-023-04633-6

Original Research
Published: 19 May 2023

Volume 14, pages 9607–9619, (2023)

Journal of Ambient Intelligence and Humanized Computing

Venkateswarlu Nalluri¹, Jing-Rong Chang¹, Long-Sheng Chen¹ (ORCID: orcid.org/0000-0002-2967-9956) & Jia-Chuan Chen¹

Abstract

Health insurance fraud accounts for 3–10% of total medical expenditures every year. If fraudulent activity is allowed to grow, it will cause irreversible damage to the medical system. However, medical data are too large and complex to process with traditional statistical methods, so machine learning algorithms have become an important solution. Whether a learning method remains stable and gives appropriate answers when faced with different data is an open question. Many related studies have focused on medical insurance fraud and its assessment, but few have attempted to discover the important factors of medical fraud and to identify the best-performing machine learning method. Therefore, this study used two unpublished datasets, which may reveal novel knowledge, and four machine learning methods, namely Support Vector Machines (SVM), Decision Trees (DT), Random Forest (RF), and Multilayer Perceptron (MLP), to find the method that detects medical fraud most effectively. From the DT results, we also extracted 19 crucial characteristics of medical insurance fraud and grouped them into four categories: medical service providers, the insurance claim amounts applied for, the Healthcare Common Procedure Coding System (HCPCS), and beneficiaries. The experimental results could provide valuable suggestions for insurance management on establishing an automatic audit mechanism to eliminate medical fraud.
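To make the idea of reading important factors off a decision tree more concrete, the following minimal sketch ranks features by the largest impurity decrease a single threshold split can achieve. It is an illustration only, not the authors' pipeline: the toy data generator, the feature names `claim_amount` and `beneficiary_age`, and the choice of Gini impurity as the split criterion are all assumptions, since the abstract does not state them.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_gain(rows, labels, feature):
    """Largest impurity decrease from one threshold split on `feature`."""
    parent = gini(labels)
    n = len(rows)
    best = 0.0
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue  # a split must send records to both sides
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


def make_toy_claims(n=400, seed=0):
    """Hypothetical claims: fraud shifts the claim amount, not the age."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        fraud = 1 if rng.random() < 0.2 else 0
        rows.append({
            "claim_amount": rng.gauss(5000 if fraud else 1500, 800),
            "beneficiary_age": rng.uniform(20, 90),
        })
        labels.append(fraud)
    return rows, labels


rows, labels = make_toy_claims()

# Rank features by how much a single split on them reduces impurity.
ranking = sorted(
    ((best_split_gain(rows, labels, f), f) for f in rows[0]),
    reverse=True,
)
for gain, feature in ranking:
    print(f"{feature}: impurity decrease {gain:.3f}")
```

A full decision tree repeats this split search recursively and accumulates the impurity decreases per feature. The sketch covers only the root split, which is enough to show why a feature that separates fraudulent from legitimate claims ranks above one that does not.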

Data Availability

Data available on request.

Acknowledgements

This work was supported in part by the National Science and Technology Council, Taiwan (Grant No. MOST 111-2410-H-324-006).

Author information

Authors and Affiliations

Department of Information Management, Chaoyang University of Technology, Taichung, 413310, Taiwan (R.O.C.)
Venkateswarlu Nalluri, Jing-Rong Chang, Long-Sheng Chen & Jia-Chuan Chen

Corresponding author

Correspondence to Long-Sheng Chen.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nalluri, V., Chang, JR., Chen, LS. et al. Building prediction models and discovering important factors of health insurance fraud using machine learning methods. J Ambient Intell Human Comput 14, 9607–9619 (2023). https://doi.org/10.1007/s12652-023-04633-6

Received: 11 July 2022
Accepted: 02 May 2023
Published: 19 May 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s12652-023-04633-6
