From: Data reduction techniques for highly imbalanced medicare Big Data
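| Dataset | Instance count | Fraudulent | Ratio fraudulent | Number of features |
|---------|----------------|------------|------------------|--------------------|
| Part D  | 5,344,106      | 3,700      | 0.0693%          | 80                 |
| Part B  | 8,669,497      | 3,954      | 0.0456%          | 82                 |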
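
The "Ratio fraudulent" column is the fraudulent count divided by the instance count, expressed as a percentage. The following is a minimal sketch of that arithmetic, using only the counts listed in the table. The names `datasets`, `fraud_ratio_percent`, and `ratios` are illustrative and do not come from the source. Depending on how the published figures were rounded, a recomputed value may differ from the table in the last digit.

```python
# Instance and fraudulent counts copied from the table; names are illustrative.
datasets = {
    "Part D": {"instances": 5_344_106, "fraudulent": 3_700},
    "Part B": {"instances": 8_669_497, "fraudulent": 3_954},
}


def fraud_ratio_percent(instances: int, fraudulent: int) -> float:
    """Return the fraudulent share of a dataset as a percentage."""
    return 100.0 * fraudulent / instances


# Compute the minority-class percentage for each dataset.
ratios = {
    name: fraud_ratio_percent(counts["instances"], counts["fraudulent"])
    for name, counts in datasets.items()
}

for name, pct in ratios.items():
    print(f"{name}: {pct:.4f}% fraudulent")
```

Both ratios come out well under 0.1%, which is the severe class imbalance that the article title refers to.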