Abstract
Negative samples in the health and medical insurance domain are fraudulent or erroneous insurance claims. Such claims may include diagnosis-procedure relations that are inconsistent with a medical coding system. Unfortunately, only a few datasets are publicly available for health insurance research, and none of them includes negative claims. Negative claims are nevertheless essential for developing new machine learning approaches. They are also needed to test and validate the automated artificial intelligence systems that insurance providers deploy. In this study, we introduce a procedure that generates synthetic negative claims from bipartite graph representations of positive claims. Our empirical results are promising. They suggest that the procedure can improve the development and evaluation of machine learning approaches in healthcare, where negative samples are required but unavailable. Moreover, the proposed scheme can be applied to other domains where bipartite graph representations are meaningful and negative samples are lacking.
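The abstract does not spell out how the bipartite graphs are turned into negative claims. The following is a hypothetical sketch of one way to do it, loosely following the idea of "distance pooling" on diagnosis-procedure graphs.

The sketch works in three steps:

- Diagnoses and procedures become the two node sets. An edge links them whenever they co-occur in a positive claim.
- Hop distances come from breadth-first search, which suffices because the graph is unweighted.
- For each candidate procedure, the distances to the claim's diagnoses are pooled with `min`. A claim is then corrupted by swapping in the procedure whose pooled distance is largest and at least `min_distance`.

Several details are assumptions for illustration, not the authors' method: all function names, the min-pooling rule, the `min_distance` threshold, the rule for choosing which procedure to drop, and the example codes.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Node = Tuple[str, str]  # ("D", diagnosis code) or ("P", procedure code)
Claim = Tuple[FrozenSet[str], FrozenSet[str]]  # (diagnoses, procedures)
Graph = Dict[Node, Set[Node]]


def build_bipartite_graph(claims: List[Claim]) -> Graph:
    """Link every diagnosis to every procedure it co-occurs with in a positive claim."""
    graph: Graph = {}
    for diagnoses, procedures in claims:
        for dx in diagnoses:
            for px in procedures:
                graph.setdefault(("D", dx), set()).add(("P", px))
                graph.setdefault(("P", px), set()).add(("D", dx))
    return graph


def hop_distances(graph: Graph, source: Node) -> Dict[Node, int]:
    """Breadth-first search: shortest hop count from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def pooled_distance(graph: Graph, diagnoses: FrozenSet[str], procedure: str) -> float:
    """Hypothetical min-pooling: distance from the procedure to the nearest claim diagnosis."""
    dist = hop_distances(graph, ("P", procedure))
    return min((dist.get(("D", dx), float("inf")) for dx in diagnoses), default=float("inf"))


def make_negative_claim(graph: Graph, claim: Claim, min_distance: float = 3) -> Optional[Claim]:
    """Replace one procedure with the most distant candidate (pooled distance >= min_distance)."""
    diagnoses, procedures = claim
    if not procedures:
        return None
    candidates = sorted(code for kind, code in graph if kind == "P" and code not in procedures)
    scored = [(pooled_distance(graph, diagnoses, px), px) for px in candidates]
    eligible = [(d, px) for d, px in scored if d >= min_distance]
    if not eligible:
        return None
    _, far_px = max(eligible)  # largest distance; ties go to the lexicographically largest code
    dropped = min(procedures)  # hypothetical rule: drop the lexicographically first procedure
    return diagnoses, (procedures - {dropped}) | {far_px}


if __name__ == "__main__":
    # Illustrative diagnosis/procedure codes, not taken from the paper.
    claims: List[Claim] = [
        (frozenset({"E11"}), frozenset({"82947"})),
        (frozenset({"E11", "I10"}), frozenset({"82947", "93000"})),
        (frozenset({"S52"}), frozenset({"25600"})),
    ]
    graph = build_bipartite_graph(claims)
    print(make_negative_claim(graph, claims[0]))
```

Here, procedure `93000` is one hop from diagnosis `E11`, so it counts as consistent with the first claim. Procedure `25600` lies in a separate component and is unreachable, so the sketch swaps it in for `82947`. Min-pooling is a deliberately conservative choice: a procedure qualifies only if it is far from *every* diagnosis on the claim. Mean or max pooling would produce more, but less clearly inconsistent, negatives.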
Index Terms
- Negative Insurance Claim Generation Using Distance Pooling on Positive Diagnosis-Procedure Bipartite Graphs