Data reduction techniques for highly imbalanced medicare Big Data

Journal of Big Data

Table 13 Mean AUPRC values by classifier and number of features for ten iterations of five-fold cross-validation, Part B scenario two

Classifier \ Number of features	10	15	20	25	30	80
CatBoost	0.6581	0.6792	0.7069	0.7009	0.7016	0.6817
ET	0.0400	0.0462	0.0443	0.0524	0.0424	0.0433
LightGBM	0.3939	0.3830	0.4261	0.4589	0.4293	0.4146
Logistic regression	0.0093	0.0326	0.0338	0.0065	0.0064	0.0103
Random forest	0.4356	0.3990	0.3736	0.3800	0.3395	0.2462
XGBoost	0.6611	0.6715	0.6995	0.6956	0.6955	0.6886
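
As a reading aid, the short sketch below finds, for each classifier in Table 13, the feature count with the highest mean AUPRC and its difference from the 80-feature column. The values are transcribed from the table. The names `FEATURE_COUNTS`, `MEAN_AUPRC`, `best_feature_count`, and `summary` are hypothetical and chosen only for this illustration. The sketch does not reproduce the cross-validation procedure or any statistical testing.

```python
# Mean AUPRC values transcribed from Table 13 (Part B scenario two).
# All names here are illustrative and do not come from the paper.
FEATURE_COUNTS = [10, 15, 20, 25, 30, 80]

MEAN_AUPRC = {
    "CatBoost": [0.6581, 0.6792, 0.7069, 0.7009, 0.7016, 0.6817],
    "ET": [0.0400, 0.0462, 0.0443, 0.0524, 0.0424, 0.0433],
    "LightGBM": [0.3939, 0.3830, 0.4261, 0.4589, 0.4293, 0.4146],
    "Logistic regression": [0.0093, 0.0326, 0.0338, 0.0065, 0.0064, 0.0103],
    "Random forest": [0.4356, 0.3990, 0.3736, 0.3800, 0.3395, 0.2462],
    "XGBoost": [0.6611, 0.6715, 0.6995, 0.6956, 0.6955, 0.6886],
}


def best_feature_count(scores):
    """Return (feature_count, score) for the highest mean AUPRC in one row."""
    idx = max(range(len(scores)), key=lambda i: scores[i])
    return FEATURE_COUNTS[idx], scores[idx]


# Map each classifier to its best-scoring column.
summary = {name: best_feature_count(row) for name, row in MEAN_AUPRC.items()}

for name, (count, score) in summary.items():
    # Difference between the best column and the 80-feature column.
    delta = score - MEAN_AUPRC[name][-1]
    print(f"{name}: best at {count} features "
          f"(AUPRC {score:.4f}, {delta:+.4f} vs. 80 features)")
```

Read this way, CatBoost, XGBoost, and logistic regression have their highest mean AUPRC at 20 features, ET and LightGBM at 25, and random forest at 10.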