Article

The effects of varying class distribution on learner behavior for medicare fraud detection with imbalanced big data

HEALTH INFORMATION SCIENCE AND SYSTEMS (2018)

Journal

HEALTH INFORMATION SCIENCE AND SYSTEMS

Volume 6, Issue -, Pages -

Publisher

SPRINGER

DOI: 10.1007/s13755-018-0051-3

Keywords

Medicare fraud; Class imbalance; Random undersampling; Big data

Category

Medical Informatics

Funding

NSF [CNS-1427536]

Abstract

Healthcare in the United States is a critical aspect of most people's lives, particularly for the aging demographic. This growing elderly population continues to demand more cost-effective healthcare programs. Medicare is a vital program serving the needs of the elderly in the United States. The growing number of Medicare beneficiaries, along with the enormous volume of money in the healthcare industry, increases both the appeal and the risk of fraud. In this paper, we focus on the detection of Medicare Part B provider fraud, which involves fraudulent activities, such as patient abuse or neglect and billing for services not rendered, perpetrated by providers and other entities who have been excluded from participating in Federal healthcare programs. We discuss Part B data processing and describe a unique process for assigning fraud labels to the data based on known fraudulent providers. The resulting labeled big dataset is highly imbalanced, with a very limited number of fraud instances. To combat this class imbalance, we generate seven class distributions and assess the behavior and fraud detection performance of six different machine learning methods. Our results show that RF100 with a 90:10 class distribution is the best-performing learner, with an AUC of 0.87302. Moreover, learner behavior with the balanced 50:50 class distribution is similar to that with more imbalanced distributions, which retain more of the original data. Based on the performance and significance testing results, we posit that, for most learners, retaining more of the majority class information leads to better Medicare Part B fraud detection performance than using the balanced datasets.
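The abstract does not spell out how the seven class distributions are produced; random undersampling is listed among the keywords. As a rough illustration only, the Python sketch below shows one way random undersampling can reach a target non-fraud:fraud ratio such as 90:10 or 50:50: every fraud instance is kept and non-fraud instances are drawn without replacement. The function name `random_undersample`, the 0/1 label encoding (1 = fraud), keeping all minority instances, and the fixed seed are assumptions for this sketch, not the authors' implementation.

```python
import random
from typing import List, Sequence


def random_undersample(
    labels: Sequence[int],
    majority_pct: float,
    seed: int = 0,
) -> List[int]:
    """Return row indices of a random undersample with the requested class ratio.

    Hypothetical helper: assumes label 1 = fraud (minority), 0 = non-fraud (majority).
    All minority rows are kept; majority rows are sampled without replacement so the
    result is roughly majority_pct : (100 - majority_pct) non-fraud : fraud.
    """
    if not 0 < majority_pct < 100:
        raise ValueError("majority_pct must be strictly between 0 and 100")
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Majority rows needed so that n_major / n_minor = majority_pct / (100 - majority_pct).
    n_major = round(len(minority) * majority_pct / (100 - majority_pct))
    if n_major > len(majority):
        raise ValueError("not enough majority instances for this ratio")
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    kept_major = rng.sample(majority, n_major)
    return sorted(minority + kept_major)


if __name__ == "__main__":
    # Toy data: 10 fraud rows among 10,000 (0.1% fraud), not the paper's dataset.
    labels = [1] * 10 + [0] * 9990
    for pct in (90, 50):
        idx = random_undersample(labels, pct)
        fraud = sum(labels[i] for i in idx)
        print(pct, len(idx), fraud)  # prints "90 100 10", then "50 20 10"
```

Keeping every fraud row and discarding only non-fraud rows matches the trade-off the abstract describes: a 90:10 sample retains far more majority-class information than a 50:50 sample drawn from the same data.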
