Identifying fraud in medical insurance based on blockchain and deep learning

https://doi.org/10.1016/j.future.2021.12.006

Highlights

  • Presents a novel framework for effectively identifying medical insurance fraud.

  • Proposes an explainable method to evaluate the reasonability of a disease diagnosis.

  • Designs a protocol for medical data storage and access based on a consortium blockchain.

  • Applies the blockchain to ensure that the data are secure, immutable, and traceable.

  • Evaluates the proposed approach on two real datasets.

Abstract

With the rapid growth of medical costs, controlling medical expenses has become an important task of the Health Insurance Department. Traditional medical insurance settlement is paid on a per-service basis, which leads to many unreasonable expenses. To cope with this problem, the single-disease payment mechanism has been widely adopted in recent years. However, single-disease payment also carries a risk of fraud. In this work, we propose a framework to identify medical insurance fraud based on consortium blockchain and deep learning. It recognizes suspicious medical records automatically, which helps ensure the valid implementation of single-disease payment and lightens the workload of medical insurance auditors. An explainable model, BERT-LE, is designed to evaluate the reasonability of the ICD disease code used for Medicare reimbursement by predicting the probability of a disease from the chief complaint of a patient. We also put forward a storage and management process for medical records based on consortium blockchain to ensure the security, immutability, traceability, and auditability of the data. Experiments on two real datasets from two 3A hospitals demonstrate that the proposed solution can identify fraud effectively and greatly improve the efficiency of medical insurance reviews.

Introduction

With the rapid development of medical informatization, hospital information systems have accumulated a large amount of data, bringing the medical industry into the era of big data. Medical big data has brought tremendous value to the medical field and attracted considerable attention from both academia and industry [1]. The control of medical expenses is an important research branch of medical big data.

In the traditional medical insurance system, medical expenses are paid on the basis of medical service items, which leads to excessive medical treatment and rising medical expenses. To cope with the problems of the item-based charging mechanism, single-disease payment based on Diagnosis-Related Groups (DRGs) has been extensively studied and applied [2]. In the single-disease payment mechanism, a fixed payment standard is determined for each disease [3]. The social medical insurance agency pays patients’ hospitalization fees to hospitals according to the prescribed standard for each disease. To illustrate the difference between the single-disease and item-based payment mechanisms, consider an example. Suppose that a patient is diagnosed with pneumonia. Under the traditional medical insurance settlement mechanism, if the expenses covered by medical insurance are 100,000 yuan and the reimbursement ratio is 90%, the reimbursement will be 90,000 yuan. Under the DRG payment model, assuming that the DRG weight of pneumonia is 6.2 and the unit payment price is 14,000 yuan, the medical insurance reimbursement will be 6.2 × 14,000 = 86,800 yuan. Under single-disease payment, the income of a medical institution depends only on the medical case and its diagnosis and has nothing to do with the actual cost of treating the patient. If the treatment cost of a disease exceeds the standard of its group, the hospital must pay the extra cost, so medical insurance expenditure can be controlled more effectively. The mechanism can also standardize the utilization of medical resources; that is, the consumption of medical institutions is directly proportional to the number of inpatients, the complexity of diseases, and the intensity of services.
In short, the single-disease payment model stipulates a fixed medical insurance payment standard for each disease to avoid excessive medical behavior, and thereby decreases medical expenses. This mechanism guarantees the quality of medical services and is easy to operate.
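
The pneumonia example above can be reproduced in a few lines. This is only a minimal sketch of the arithmetic stated in the text; the function names are hypothetical and chosen for illustration, and real DRG settlement rules involve more factors than a single weight and unit price.

```python
def item_based_reimbursement(covered_expenses: float, ratio: float) -> float:
    """Item-based settlement: a fixed share of the covered expenses."""
    return covered_expenses * ratio


def drg_reimbursement(drg_weight: float, unit_price: float) -> float:
    """DRG settlement: group weight times unit price, independent of actual cost."""
    return drg_weight * unit_price


# Figures taken from the pneumonia example in the text (amounts in yuan).
item_based = item_based_reimbursement(100_000, 0.90)
drg_based = drg_reimbursement(6.2, 14_000)

print(f"Item-based reimbursement: {item_based:,.0f} yuan")  # 90,000 yuan
print(f"DRG-based reimbursement:  {drg_based:,.0f} yuan")   # 86,800 yuan
print(f"Difference:               {item_based - drg_based:,.0f} yuan")  # 3,200 yuan
```

Note that the DRG amount does not change if the hospital spends more on the case, which is why the hospital bears any cost above the group standard.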

However, the single-disease payment model may give rise to medical insurance fraud. For instance, when assigning the diagnostic code for an inpatient, a health care provider may replace the actual low-cost disease code with a high-cost disease code to gain more revenue from the medical insurance agency. Because the number of inpatients is large, manually reviewing each inpatient’s medical records is extremely costly. In Jiangsu Province, China, for example, there are about 15 million inpatients each year, so it is impossible to audit every discharge diagnosis manually. Efficiently detecting possible fraud has therefore become an urgent problem in the single-disease payment mechanism. In light of this problem, we propose a framework to identify medical insurance fraud based on blockchain and deep learning. In the framework, we transform the fraud identification problem into a text classification problem and design an explainable model, BERT-LE, to estimate the reasonability of a disease diagnosis by predicting the probability of the diagnostic code from the inpatient’s chief complaint. Only the diagnostic codes identified as abnormal need to be audited manually, which can significantly improve audit efficiency and reduce the workload of insurance auditors. In addition, to preserve the evidence of medical insurance fraud, we put forward a protocol for medical data storage and management based on consortium blockchain. The medical records are digitally signed and stored in the blockchain, which ensures the security and tamper resistance of the data and at the same time enables the traceability and non-repudiation of fraudulent behavior. We also set up a blockchain-based credit rating mechanism for doctors and hospitals to support effective medical insurance supervision.

To validate our solution, we carry out experiments on two real datasets that contain 620,000 inpatients’ medical records from two large hospitals. The results show that our model works well for medical insurance anti-fraud and has good explainability. The proposed model can also help health departments evaluate the quality of medical records and help doctors assign accurate diagnostic codes.

The contributions of this paper are summarized as follows:

  1. We propose a framework that transforms the medical insurance anti-fraud problem into a text classification problem. The framework can effectively identify medical insurance fraud and reduce the workload of auditors.

  2. We design a label embedding method to explain the logic of classification decisions, which can help users better understand and trust our model.

  3. We put forward a protocol for medical data storage and access based on consortium blockchain, which can prevent illegal tampering with medical records and make medical records traceable and auditable.

  4. We conduct extensive experiments on two real datasets from large hospitals, and the results show that our model is effective for medical insurance anti-fraud and has good explainability.

The rest of this paper is organized as follows. In Section 2, the preliminaries and problem statement are presented. In Section 3, an explainable deep learning-based method to identify fraud of medical insurance is described. In Section 4, the blockchain-based medical records management process is elaborated. In Section 5, the proposed framework is evaluated and discussed. In Section 6, previous studies are reviewed. Finally, Section 7 concludes the paper and provides some future work discussions.

A preliminary version of this work has been reported in a conference short paper [4].

Section snippets

Notations

The key medical term definitions in the proposed framework are provided as follows.

Definition 1

Chief complaint [5]. In medical domain, a chief complaint is a patient’s description of his/her symptoms or (and) conditions, and duration of problems, etc., written by the physician into the patient’s medical record. The chief complaint is the first item in the hospital medical record. It must reflect the characteristics of the first diagnosis of disease, and its description must be concise, refined and

Identify fraud of medical insurance

To identify abnormal medical records, the reasonability of the disease diagnosis needs to be evaluated based on patients’ chief complaints. For this purpose, we transform the identification of fraudulent medical records into a text classification task, which is to predict the probability of each ICD-10 code according to a chief complaint. Then the predicted probabilities of all ICD-10 codes will be sorted in descending order. If the ICD-10 code assigned to the medical record is in the top-k set
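
The ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the probabilities are invented for the example (they do not come from BERT-LE), the value of k is arbitrary, and the decision rule that a code outside the top-k set is flagged for manual audit is an assumption, since the snippet above is cut off before stating the rule.

```python
from typing import Dict, List


def top_k_codes(probabilities: Dict[str, float], k: int) -> List[str]:
    """Sort ICD-10 codes by predicted probability (descending) and keep the first k."""
    ranked = sorted(probabilities, key=lambda code: probabilities[code], reverse=True)
    return ranked[:k]


def is_suspicious(assigned_code: str, probabilities: Dict[str, float], k: int = 3) -> bool:
    """Assumed rule: flag the record when the assigned code is not in the top-k set."""
    return assigned_code not in top_k_codes(probabilities, k)


# Hypothetical classifier output for one chief complaint (made-up numbers).
predicted = {
    "J18.9": 0.62,  # pneumonia, unspecified
    "J20.9": 0.21,  # acute bronchitis, unspecified
    "J44.1": 0.09,  # COPD with acute exacerbation
    "I50.9": 0.05,  # heart failure, unspecified
    "K35.8": 0.03,  # acute appendicitis, other and unspecified
}

top3 = top_k_codes(predicted, k=3)
flag_pneumonia = is_suspicious("J18.9", predicted, k=3)
flag_heart_failure = is_suspicious("I50.9", predicted, k=3)

print(top3)                # ['J18.9', 'J20.9', 'J44.1']
print(flag_pneumonia)      # False
print(flag_heart_failure)  # True
```

A smaller k flags more records for audit, while a larger k flags fewer, so k trades auditor workload against the chance of missing a miscoded record.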

Blockchain-based medical records storage and management

Medical records such as chief complaints and ICD codes are important evidence of medical insurance anti-fraud. In order to provide effective protection for medical data anti-tamper, anti-repudiation, security, and integrity, we design a framework of medical data storage and management based on blockchain. In the proposed framework, our major contributions are in the following three aspects:

  1. In view of the characteristics of medical data, we design a data storage method combining on-chain and
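
To make the tamper-evidence idea concrete, the following sketch chains the digests of medical records so that any later change to a record is detectable. It is a simplified illustration under stated assumptions, not the authors' protocol and not a consortium blockchain: there is no consensus, no network, and no smart contract; an HMAC with a shared key stands in for a real digital signature; and all names (`append_block`, `verify_chain`, the record fields) are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64


def record_digest(record: Dict[str, str]) -> str:
    """SHA-256 digest of a record, serialized with sorted keys for determinism."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def sign(digest: str, key: bytes) -> str:
    """Stand-in for a digital signature (assumption: HMAC with a shared key)."""
    return hmac.new(key, digest.encode("utf-8"), hashlib.sha256).hexdigest()


def block_hash(prev_hash: str, digest: str, signature: str) -> str:
    """Hash linking this block to its predecessor."""
    return hashlib.sha256((prev_hash + digest + signature).encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, str]], record: Dict[str, str], key: bytes) -> None:
    """Append a block holding the record's digest and signature to the chain."""
    prev_hash = chain[-1]["block_hash"] if chain else GENESIS_HASH
    digest = record_digest(record)
    signature = sign(digest, key)
    chain.append({
        "prev_hash": prev_hash,
        "record_digest": digest,
        "signature": signature,
        "block_hash": block_hash(prev_hash, digest, signature),
    })


def verify_chain(chain: List[Dict[str, str]], records: List[Dict[str, str]], key: bytes) -> bool:
    """Check that every record still matches its block and that the links are intact."""
    if len(chain) != len(records):
        return False
    prev_hash = GENESIS_HASH
    for block, record in zip(chain, records):
        digest = record_digest(record)
        if block["prev_hash"] != prev_hash or block["record_digest"] != digest:
            return False
        if not hmac.compare_digest(block["signature"], sign(digest, key)):
            return False
        if block["block_hash"] != block_hash(prev_hash, digest, block["signature"]):
            return False
        prev_hash = block["block_hash"]
    return True


KEY = b"demo-key-for-illustration-only"
records = [
    {"record_id": "r1", "chief_complaint": "cough and fever for 3 days", "icd10": "J18.9"},
    {"record_id": "r2", "chief_complaint": "chest tightness for 1 week", "icd10": "I50.9"},
]
chain: List[Dict[str, str]] = []
for rec in records:
    append_block(chain, rec, KEY)

intact = verify_chain(chain, records, KEY)

# Changing a diagnostic code after the fact breaks verification.
tampered = [dict(rec) for rec in records]
tampered[0]["icd10"] = "J44.1"
tamper_detected = not verify_chain(chain, tampered, KEY)

print(intact, tamper_detected)  # True True
```

Keeping only digests in the chain is a choice made for this example; it shows why a stored hash is enough to prove that a record was altered, without implying how the proposed framework divides data between storage layers.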

Datasets and preprocessing

We tested the proposed approach on two datasets from two large 3A hospitals (hospital-A and hospital-B) in Jiangsu Province of China. The chief complaints of patients were extracted from admission records, and the ICD-10 codes were extracted from the front sheet of medical records. Compared with the diagnostic codes recorded by doctors at the time of admission, the ICD-10 codes in the front sheet of medical records are more accurate, because the professional coders have revised them

Text classification based on deep learning

Deep learning architectures and algorithms have brought about tremendous advances in the fields of computer vision and traditional pattern recognition [22], [23]. Following this trend, new deep learning methods are increasingly being used in NLP research. In the past decade, machine learning approaches for solving NLP problems were generally based on shallow models (such as SVM [24]), which were trained on very high-dimensional and one-hot encoding data. In recent years, neural networks based

Conclusion

In this paper, we propose a framework to identify medical insurance fraud based on the explainable BERT-LE model and consortium blockchain. The model jointly learns the representations of labels and characters to predict the probability of a disease from a patient’s chief complaint, and evaluates the reasonability of the ICD-10 code written in the medical record. We also put forward a storage and management process of medical records based on consortium blockchain technology to ensure that data are

CRediT authorship contribution statement

Guoming Zhang: Conceived and designed the analysis, Collected the data, Contributed data or analysis tools, Performed the analysis, Wrote the paper. Xuyun Zhang: Conceived and designed the analysis, Contributed data or analysis tools, Performed the analysis. Muhammad Bilal: Conceived and designed the analysis, Reviewed and edited the paper. Wanchun Dou: Conceived and designed the analysis, Reviewed and edited the paper. Xiaolong Xu: Conceived and designed the analysis, Contributed data or

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported in part by the National Key R&D Program of China under Grant 2019YFE0190500, the National Natural Science Foundation of China under Grant No. 61672276, Jiangsu Key R&D Program of China under Grant No. BE2019104, the National Key R&D Program of China under Grant No. 2017YFB1400600, and the Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University. Joel J.P.C. Rodrigues is funded by FCT/MCTES through national funds and when

Guoming Zhang works at Health Statistics and Information Center of Jiangsu Province. He is now studying for a Ph.D. in Computer Science and Technology at Nanjing University. He has a B.E. in Computer Science and Technology from Shandong University (2006), and an M.E. in Computer Application Technology from Beijing University of Technology (2009). He is a member of Medical and Health Big Data Professional Committee of Jiangsu Province. His research focuses on cloud computing and big data.

References (48)

  • Wagner M.M. et al., Chief complaints and ICD codes, Handb. Biosurveillance (2006)
  • Liu H. et al., Multi-label text classification via joint learning from label embedding and label correlation, Neurocomputing (2021)
  • Esteva A. et al., A guide to deep learning in healthcare, Nature Med. (2019)
  • Mathauer I. et al., Hospital payment systems based on diagnosis-related groups: experiences in low- and middle-income countries, Bull. World Health Organ. (2013)
  • Gao F. et al., Discussion on the implementation of single disease payment, Chin. Med. Rec. Engl. Ed. (2013)
  • Zhang G. et al., An anti-fraud framework for medical insurance based on deep learning
  • World Health Organization, International Statistical Classification of Diseases and Related Health Problems 10th Revision (2011)
  • Devlin J. et al., BERT: Pre-training of deep bidirectional transformers for language understanding
  • Wang G. et al., Joint embedding of words and labels for text classification
  • Du C. et al., Explicit interaction model towards text classification
  • Vaswani A. et al., Attention is all you need
  • Kim Y., Convolutional neural networks for sentence classification
  • Gao W. et al., A survey of blockchain: Techniques, applications, and challenges
  • Kosba A. et al., Hawk: The blockchain model of cryptography and privacy-preserving smart contracts
  • Zhong B. et al., Hyperledger fabric-based consortium blockchain for construction quality information management, Front. Eng. Manag. (2020)
  • Sukhwani H. et al., Performance modeling of PBFT consensus process for permissioned blockchain network (hyperledger fabric)
  • Mühlberger R. et al., Foundational oracle patterns: Connecting blockchain to the off-chain world
  • Johnson R. et al., Deep pyramid convolutional neural networks for text categorization
  • Graves A., Supervised sequence labelling with recurrent neural networks (2012)
  • Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, E. Hovy, Hierarchical attention networks for document classification, in:...
  • Joulin A. et al., FastText.zip: Compressing text classification models (2016)
  • Cui Y. et al., Pre-training with whole word masking for Chinese BERT (2019)
  • LeCun Y. et al., Deep learning, Nature (2015)
  • He K. et al., Deep residual learning for image recognition

    Xuyun Zhang (Member, IEEE) received the B.Sc. and M.Eng. degrees in computer science and technology from Nanjing University, Nanjing, China, in 2008 and 2011, respectively, and the Ph.D. degree in computer and information science from the University of Technology Sydney, Ultimo, NSW, Australia, in 2014. He is currently a Senior Lecturer with the Department of Computing, Macquarie University, Sydney, Australia. He also has working experience with the University of Auckland and NICTA (now Data61, CSIRO). He has so far authored or coauthored more than 100 refereed academic papers in many high-quality and influential conferences and journals (IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Computers, IEEE Transactions on Software Engineering, IEEE Transactions on Industrial Informatics (IEEE TII), IEEE Journal on Selected Areas in Communications, and ICDE) in the areas of his research interests, which include scalable and secure machine learning, big data mining and analytics, cloud/edge/service computing and IoT, big data privacy, cybersecurity, etc.

    Muhammad Bilal received the B.Sc. degree in computer systems engineering from the University of Engineering and Technology, Peshawar, Pakistan, in 2008, the M.S. degree in computer engineering from the Chosun University, Gwangju, South Korea, in 2012, and the Ph.D. degree in information and communication network engineering from the School of Electronics and Telecommunications Research Institute (ETRI), Korea University of Science and Technology, in 2017. He was a Postdoctoral Research Fellow at Smart Quantum Communication Center, Korea University, Seoul, South Korea, in 2017/2018.

    Currently, he is an Assistant Professor with the Division of Computer and Electronic Systems Engineering, Hankuk University of Foreign Studies, Yongin, South Korea. His research interests include design and analysis of network protocols, network architecture, network security, IoT, named data networking, Blockchain, cryptology, and future Internet. Dr. Bilal has served as a reviewer of various international journals, and also served as a Technical Program Committee Member on many international conferences including IEEE VTC, IEEE ICC, Infocom and IEEE CCNC. He is an editor of IEEE Future Directions Ethics and Policy in Technology Newsletter and IEEE Internet Policy Newsletter.

    Wanchun Dou (Member, IEEE) received the Ph.D. degree in mechanical and electronic engineering from the Nanjing University of Science and Technology, China, in 2001. He is currently a Lecturer with the Nanjing University of Science and Technology. He is also a Full Professor with the State Key Laboratory for Novel Software Technology, Nanjing University. From April 2005 to June 2005 and from November 2008 to February 2009, he visited the Departments of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, respectively, as a Visiting Scholar. He has published more than 100 research papers in international journals and international conferences. His research interests include workflow, cloud computing, and service computing.

    Xiaolong Xu (Member, IEEE) received the Ph.D. degree from Nanjing University, China, in 2016. He worked as a Research Scholar with Michigan State University, USA, from April 2017 to May 2018. He is currently a Professor with the School of Computer and Software, Nanjing University of Information Science and Technology. He has published more than 80 peer-reviewed papers in international journals and conferences, including the IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, the IEEE TRANSACTIONS ON CLOUD COMPUTING, the IEEE TRANSACTIONS ON BIG DATA, the IEEE INTERNET OF THINGS, the IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, the IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, the Journal of Network and Computer Applications, Society of Petroleum Engineers, WWWj, IEEE ICWS, and ICSOC. His research interests include fog computing, edge computing, the Internet of Things, cloud computing, and big data.

    Joel J. P. C. Rodrigues [Fellow, IEEE & AAIA] is with Senac Faculty of Ceará, Brazil, as head of research, development, and innovation, and is a senior researcher at the Instituto de Telecomunicações, Portugal. Prof. Rodrigues is a Highly Cited Researcher, the leader of the Next Generation Networks and Applications (NetGNA) research group (CNPq), an IEEE Distinguished Lecturer, Member Representative of the IEEE Communications Society on the IEEE Biometrics Council, and the President of the scientific council at ParkUrbis – Covilhã Science and Technology Park. He was Director for Conference Development - IEEE ComSoc Board of Governors, Technical Activities Committee Chair of the IEEE ComSoc Latin America Region Board, a Past-Chair of the IEEE ComSoc Technical Committee (TC) on eHealth and the TC on Communications Software, a Steering Committee member of the IEEE Life Sciences Technical Community and Publications co-Chair. He is the editor-in-chief of the International Journal of E-Health and Medical Communications and an editorial board member of several highly reputed journals (mainly from IEEE). He has been general chair and TPC Chair of many international conferences, including IEEE ICC, IEEE GLOBECOM, IEEE HEALTHCOM, and IEEE LatinCom. He has authored or coauthored about 1000 papers in refereed international journals and conferences, 3 books, 2 patents, and 1 ITU-T Recommendation. He has been awarded several Outstanding Leadership and Outstanding Service Awards by the IEEE Communications Society and several best paper awards. Prof. Rodrigues is a member of the Internet Society, a senior member of ACM, and a Fellow of AAIA and IEEE.
