Big data analytics of predicting annual US Medicare billing claims with health services | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper investigated the use of large public use files (PUFs) of US Medicare claims for big data analytics, with the aim of predicting claim amounts in US dollars (USD) and detecting large spending anomalies across the hundreds of health services documented in the data set. Two main research questions were posed to better understand the content and use of the US Medicare PUFs. The first concerned understanding the dataset and identifying the parameters that could predict the total submitted billing claims for one year (i.e., the 2017 fiscal year, in USD). The second was whether anomalies in health service costs could be detected. The null hypothesis was that there are no significant variables in the general linear model (GLM) of the regression analysis. The alternative hypothesis was that the type and frequency of health services, total HCPCS (Healthcare Common Procedure Coding System), population (total beneficiaries), age, provider specialty, chronic disease, states, and regions could be significant in the classification and regression models.

The 2017 Medicare Claims dataset, publicly provided by the Centers for Medicare & Medicaid Services (CMS), was 291 MB and consisted of more than 30 columns and approximately 1,048,576 rows. The methodology applied data mining techniques and general linear regression, fitting models and comparing their residuals. Multivariate outlier detection, including k-means clustering and principal component analysis (PCA), was then carried out on the residuals.

The results showed an R² (coefficient of determination) of 52% between health services and submitted Medicare amounts (USD), with thousands of outliers. The total services variable was highly significant in predicting the total amount of submitted claims (maximum of 1,025,413,240). HCPCS was not significant. There was also a strong correlation between Medicare costs and larger populations in states with larger cities, especially California, Florida, New York, and Texas.
However, regions, States, cities, zip codes, and other divisions of the US states a...
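The abstract describes a two-stage pipeline: a general linear regression of submitted claim amounts on health-service variables, followed by multivariate outlier detection on the residuals using k-means clustering and PCA. Since the abstract gives no implementation details, the following is only a minimal NumPy sketch under stated assumptions, not the authors' code. Ordinary least squares stands in for the GLM (a Gaussian GLM with identity link). The clustering space (leading principal components of the standardized predictors plus the standardized residual), the distance cutoff z_cut, the small-cluster rule min_cluster_frac, and every function and variable name are hypothetical choices. The data are synthetic and are not CMS figures.

```python
import numpy as np


def fit_linear_model(X, y):
    """OLS fit with an intercept (equivalent to a Gaussian GLM, identity link).

    Returns (coefficients, residuals, r_squared); coefficients[0] is the intercept.
    """
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, residuals, r_squared


def standardize(A):
    """Center each column and scale it to unit variance."""
    return (A - A.mean(axis=0)) / A.std(axis=0)


def pca_scores(Z, n_components):
    """Project centered data onto its leading principal components (via SVD)."""
    Zc = Z - Z.mean(axis=0)
    _, _, vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ vt[:n_components].T


def kmeans(P, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = P[rng.choice(len(P), size=k, replace=False)]

    def nearest(c):
        # Index of the closest centroid for every row of P.
        return np.linalg.norm(P[:, None, :] - c[None, :, :], axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = nearest(centroids)
        # Keep the old centroid if a cluster ends up empty.
        new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return nearest(centroids), centroids


def flag_outliers(X, residuals, k=3, n_components=2, z_cut=3.0, min_cluster_frac=0.02):
    """Hypothetical rule: flag rows far from their k-means centroid or in a tiny cluster.

    Clustering space: leading PCs of the standardized predictors plus the
    standardized regression residual, so large residuals always contribute.
    """
    P = np.column_stack([pca_scores(standardize(X), n_components),
                         standardize(residuals[:, None])])
    labels, centroids = kmeans(P, k)
    dist = np.linalg.norm(P - centroids[labels], axis=1)
    far = dist > dist.mean() + z_cut * dist.std()
    sizes = np.bincount(labels, minlength=k)
    tiny = sizes[labels] < min_cluster_frac * len(P)
    return far | tiny


# Synthetic, illustrative data only (not CMS figures).
rng = np.random.default_rng(42)
n = 1000
total_services = rng.poisson(200, n).astype(float)
beneficiaries = rng.poisson(80, n).astype(float)
avg_age = rng.normal(72, 5, n)  # included as a predictor with no true effect
X = np.column_stack([total_services, beneficiaries, avg_age])
y = 50.0 * total_services + 10.0 * beneficiaries + rng.normal(0, 500, n)
y[:5] += 5_000  # five injected spending anomalies

coef, residuals, r_squared = fit_linear_model(X, y)
outliers = flag_outliers(X, residuals)
print(f"R^2 = {r_squared:.2f}; flagged rows: {np.flatnonzero(outliers)}")
```

Flagging both distant rows and members of very small clusters is a deliberate design choice in this sketch: k-means can assign a handful of extreme rows their own centroid, in which case their distance to that centroid is small even though the rows are anomalous.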
Date of Conference: 17-20 December 2022
Date Added to IEEE Xplore: 26 January 2023
Conference Location: Osaka, Japan