research-article

Negative Insurance Claim Generation Using Distance Pooling on Positive Diagnosis-Procedure Bipartite Graphs

Authors:

Md Enamul Haque,

Mehmet Engin Tozal

ACM Journal of Data and Information Quality (JDIQ), Volume 14, Issue 3

Article No.: 17, Pages 1 - 26

https://doi.org/10.1145/3531347

Published: 23 May 2022

Abstract

Negative samples in the health and medical insurance domain are fraudulent or erroneous insurance claims, which may include diagnosis-procedure relations that are inconsistent with a medical coding system. Only a few datasets are publicly available for research in the health insurance domain, and none of them reports any negative claims. Negative claims are nonetheless essential, both for developing new machine learning approaches and for testing and validating the automated artificial intelligence systems deployed by insurance providers. In this study, we introduce a synthetic negative claim generation procedure based on bipartite graph representations of positive claims. Our empirical results are promising and suggest that the procedure can improve the development and evaluation of machine learning approaches in healthcare where negative samples are required but not available. Moreover, the proposed scheme can be applied to other domains where bipartite graph representations are meaningful and negative samples are lacking.
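The abstract does not spell out the generation procedure, so the following is only a minimal sketch of the general idea it names: represent positive claims as a diagnosis-procedure bipartite graph, then propose diagnosis-procedure pairs that the positive data never links directly. Everything in it is an assumption made for illustration, including the toy codes (`D1`, `P1`, ...), the function names, the use of unweighted hop counts, and the `min_hops` threshold. It is not the distance pooling method of the article.

```python
from collections import defaultdict, deque


def build_bipartite(claims):
    """Build an undirected diagnosis-procedure graph as an adjacency map.

    Each positive claim is a (diagnosis, procedure) pair. Nodes are tagged
    with "dx" or "px" so the two sides of the graph never collide.
    """
    adj = defaultdict(set)
    for dx, px in claims:
        adj[("dx", dx)].add(("px", px))
        adj[("px", px)].add(("dx", dx))
    return adj


def hop_distances(adj, source):
    """Return breadth-first hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def negative_candidates(claims, min_hops=3):
    """List (diagnosis, procedure, hops) pairs that are far apart or unreachable.

    Assumption for this sketch: a pair counts as a negative candidate when the
    procedure is at least min_hops away from the diagnosis, or not reachable
    at all (hops is then None).
    """
    adj = build_bipartite(claims)
    procedures = sorted(n for n in adj if n[0] == "px")
    candidates = []
    for dx_node in sorted(n for n in adj if n[0] == "dx"):
        dist = hop_distances(adj, dx_node)
        for px_node in procedures:
            hops = dist.get(px_node)  # None when unreachable
            if hops is None or hops >= min_hops:
                candidates.append((dx_node[1], px_node[1], hops))
    return candidates


# Hypothetical positive claims with placeholder codes.
toy_claims = [("D1", "P1"), ("D1", "P2"), ("D2", "P2"), ("D3", "P3")]
toy_negatives = negative_candidates(toy_claims)
```

In a bipartite graph every diagnosis-to-procedure path has odd length, so a threshold of 3 simply excludes the pairs observed in positive claims. A larger threshold would restrict the candidates to pairs that are less plausibly related.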


Cited By

Haque M. and Tozal M. 2023. Identification of Fraudulent Healthcare Claims Using Fuzzy Bipartite Knowledge Graphs. IEEE Transactions on Services Computing 16, 6 (2023), 3931–3945. Online publication date: Nov-2023. https://doi.org/10.1109/TSC.2023.3296782

Index Terms

Negative Insurance Claim Generation Using Distance Pooling on Positive Diagnosis-Procedure Bipartite Graphs
1. Applied computing
2. Computing methodologies
  1. Machine learning


Published In

Journal of Data and Information Quality Volume 14, Issue 3

September 2022

155 pages

ISSN:1936-1955

EISSN:1936-1963

DOI:10.1145/3533272

Editor:
Tiziana Catarci
Sapienza University of Rome, Rome, Italy

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 May 2022

Online AM: 18 April 2022

Accepted: 01 January 2022

Revised: 01 September 2021

Received: 01 May 2020

Published in JDIQ Volume 14, Issue 3
