research-article

Negative Insurance Claim Generation Using Distance Pooling on Positive Diagnosis-Procedure Bipartite Graphs

Published: 23 May 2022

Abstract

Negative samples in the health and medical insurance domain are fraudulent or erroneous insurance claims that may include diagnosis-procedure relations that are inconsistent with a medical coding system. Unfortunately, only a few datasets are publicly available for research in the health insurance domain, and none of them reports any negative claims. However, negative claims are essential not only for developing new machine learning approaches but also for testing and validating automated artificial intelligence systems deployed by insurance providers. In this study, we introduce a synthetic negative claim generation procedure based on bipartite graph representations of positive claims. Our empirical results demonstrate promising outcomes that can improve the development and evaluation of machine learning approaches in healthcare, where negative samples are required but not available. Moreover, the proposed scheme can be applied to other domains where bipartite graph representations are meaningful and negative samples are lacking.

Cited By

  • (2023) Identification of Fraudulent Healthcare Claims Using Fuzzy Bipartite Knowledge Graphs. IEEE Transactions on Services Computing 16:6 (3931-3945). DOI: 10.1109/TSC.2023.3296782. Online publication date: Nov-2023

      Published In

      Journal of Data and Information Quality  Volume 14, Issue 3
      September 2022
      155 pages
      ISSN:1936-1955
      EISSN:1936-1963
      DOI:10.1145/3533272

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 23 May 2022
      Online AM: 18 April 2022
      Accepted: 01 January 2022
      Revised: 01 September 2021
      Received: 01 May 2020
      Published in JDIQ Volume 14, Issue 3

      Author Tags

      1. Negative health insurance claims
      2. distance pooling
      3. diagnosis-procedure bipartite graphs

      Qualifiers

      • Research-article
      • Refereed

      Article Metrics

      • Downloads (Last 12 months): 55
      • Downloads (Last 6 weeks): 3
      Reflects downloads up to 25 Feb 2025