Building prediction models and discovering important factors of health insurance fraud using machine learning methods

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Health insurance fraud accounts for 3–10% of total medical expenditures every year. If fraudulent activity is allowed to grow unchecked, it will cause irreversible damage to the medical system. Medical data, however, are too large and complex to process with traditional statistical methods, so machine learning algorithms have become an important alternative. A central question is whether a learning method remains stable and gives appropriate answers when faced with different data. Many related studies have focused on medical insurance fraud and its assessment, but few have attempted to discover the important factors of medical fraud and to identify the best-performing machine learning method. This study therefore used two unpublished datasets, which may yield novel knowledge, and four machine learning methods, namely Support Vector Machines (SVM), Decision Trees (DT), Random Forest (RF), and Multilayer Perceptron (MLP), to find the method that most effectively detects medical fraud. From the DT results, we also extracted 19 crucial characteristics of medical insurance fraud and grouped them into four categories: medical service providers, applied insurance claim amounts, Healthcare Common Procedure Coding System (HCPCS), and beneficiaries. The experimental results could provide valuable suggestions for insurance management in establishing an automatic audit mechanism to eliminate medical fraud.
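To illustrate how a decision tree can point to important factors, the following minimal sketch ranks features by the largest Gini impurity decrease that a single threshold split achieves. This is only an illustration under stated assumptions: the abstract does not say which DT algorithm or importance measure the study used, the Gini criterion is swapped in as a common choice, and the toy records and feature names (`claim_amount`, `n_procedures`, `beneficiary_age`) are hypothetical rather than taken from the study's datasets.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest impurity decrease from one threshold split on a numeric feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Candidate thresholds: rows with value <= t go left, the rest go right.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best


# Hypothetical toy claims (NOT from the paper); fraud = 1 marks a fraudulent claim.
claims = [
    {"claim_amount": 120, "n_procedures": 2, "beneficiary_age": 67, "fraud": 0},
    {"claim_amount": 300, "n_procedures": 1, "beneficiary_age": 72, "fraud": 0},
    {"claim_amount": 250, "n_procedures": 3, "beneficiary_age": 80, "fraud": 0},
    {"claim_amount": 400, "n_procedures": 2, "beneficiary_age": 69, "fraud": 0},
    {"claim_amount": 5000, "n_procedures": 9, "beneficiary_age": 70, "fraud": 1},
    {"claim_amount": 4200, "n_procedures": 3, "beneficiary_age": 75, "fraud": 1},
    {"claim_amount": 150, "n_procedures": 1, "beneficiary_age": 78, "fraud": 0},
    {"claim_amount": 6100, "n_procedures": 8, "beneficiary_age": 68, "fraud": 1},
]

labels = [row["fraud"] for row in claims]
features = ["claim_amount", "n_procedures", "beneficiary_age"]

# Score each feature by its best single-split gain, then sort from most to least useful.
ranked = sorted(
    ((name, best_split_gain([row[name] for row in claims], labels)) for name in features),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, gain in ranked:
    print(f"{name}: {gain:.4f}")
```

In this toy data the claim amount separates the two classes perfectly, so it ranks first. A full decision tree repeats this split search recursively and accumulates the impurity decreases per feature, which is one common way to rank factors.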

Data Availability

Data available on request.

Acknowledgements

This work was supported in part by National Science and Technology Council, Taiwan (Grant No. MOST 111-2410-H-324-006).

Author information

Corresponding author

Correspondence to Long-Sheng Chen.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

About this article

Cite this article

Nalluri, V., Chang, JR., Chen, LS. et al. Building prediction models and discovering important factors of health insurance fraud using machine learning methods. J Ambient Intell Human Comput 14, 9607–9619 (2023). https://doi.org/10.1007/s12652-023-04633-6

  • DOI: https://doi.org/10.1007/s12652-023-04633-6
