research-article

Negative Insurance Claim Generation Using Distance Pooling on Positive Diagnosis-Procedure Bipartite Graphs

Published: 23 May 2022

Abstract

In the health and medical insurance domain, negative samples are fraudulent or erroneous insurance claims, which may contain diagnosis-procedure relations that are inconsistent with a medical coding system. Only a few datasets are publicly available for health insurance research, and none of them reports any negative claims. Yet negative claims are essential both for developing new machine learning approaches and for testing and validating the automated artificial intelligence systems deployed by insurance providers. In this study, we introduce a procedure that generates synthetic negative claims from bipartite graph representations of positive claims. Our empirical results are promising and suggest that the procedure can improve the development and evaluation of machine learning approaches in healthcare, where negative samples are required but not available. Moreover, the proposed scheme can be applied to other domains where bipartite graph representations are meaningful and negative samples are lacking.
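
The abstract does not spell out the generation procedure, so the following is only a minimal sketch of the general idea, not the authors' distance pooling method. It assumes that each positive claim is a pair of diagnosis and procedure code lists, links codes that co-occur in a claim, and treats diagnosis-procedure pairs that are far apart in the resulting bipartite graph (or not connected at all) as candidate negative relations. The function names, the `min_hops` threshold, and the placeholder codes `D1`, `P1`, and so on are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import product


def build_bipartite_graph(claims):
    """Link every diagnosis to every procedure that shares a positive claim."""
    adjacency = defaultdict(set)
    for diagnoses, procedures in claims:
        for dx, px in product(diagnoses, procedures):
            # Tag each node with its side so equal code strings cannot collide.
            adjacency[("dx", dx)].add(("px", px))
            adjacency[("px", px)].add(("dx", dx))
    return adjacency


def hop_distances(adjacency, source):
    """Breadth-first search: hop count from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def candidate_negative_pairs(claims, min_hops=3):
    """Return (diagnosis, procedure) pairs at least min_hops apart.

    Assumption for illustration only: a large hop distance, or no path at
    all, is used as a stand-in signal for an inconsistent relation.
    """
    adjacency = build_bipartite_graph(claims)
    diagnoses = sorted(n for n in adjacency if n[0] == "dx")
    procedures = sorted(n for n in adjacency if n[0] == "px")
    negatives = []
    for dx in diagnoses:
        distances = hop_distances(adjacency, dx)
        for px in procedures:
            hops = distances.get(px)  # None means px is unreachable from dx
            if hops is None or hops >= min_hops:
                negatives.append((dx[1], px[1]))
    return negatives


# Placeholder codes, not real ICD or CPT codes.
claims = [
    (["D1"], ["P1"]),
    (["D2"], ["P1", "P2"]),
    (["D3"], ["P3"]),
]
pairs = candidate_negative_pairs(claims)
```

In this toy input, `D1` and `P2` never share a claim but are connected through `P1` and `D2`, so they sit three hops apart and are returned as a candidate pair, while `D3` and `P3` form a separate component and are paired only with codes outside it. A practical generator would still need a rule for choosing among candidates and assembling them into full claims, which this sketch leaves out.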

Published in

Journal of Data and Information Quality, Volume 14, Issue 3
September 2022
155 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3533272

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 23 May 2022
• Online AM: 18 April 2022
• Accepted: 1 January 2022
• Revised: 1 September 2021
• Received: 1 May 2020
