Data reduction techniques for highly imbalanced Medicare Big Data

Journal of Big Data

Table 11 Mean AUPRC values by classifier and number of features, over ten iterations of five-fold cross-validation, for Part D scenario four

Classifier / Number of features	10	15	20	25	30	82
CatBoost	0.7546	0.7992	0.7914	0.7965	0.7926	0.7798
ET	0.4765	0.4228	0.3857	0.3967	0.3639	0.2401
LightGBM	0.7073	0.7268	0.6971	0.6974	0.6857	0.6783
Logistic regression	0.2609	0.2785	0.2358	0.2461	0.2613	0.2700
Random forest	0.4209	0.4639	0.3311	0.3553	0.3392	0.2199
XGBoost	0.7471	0.7743	0.7550	0.7524	0.7476	0.7351
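
As a reading aid, the following minimal sketch finds, for each classifier, the feature count with the highest mean AUPRC in Table 11. The values are copied from the table above; the names `FEATURE_COUNTS`, `MEAN_AUPRC`, and `best_feature_count` are illustrative and do not come from the paper.

```python
# Mean AUPRC values copied from Table 11 (Part D, scenario four).
# Column order follows the table header: number of features.
FEATURE_COUNTS = [10, 15, 20, 25, 30, 82]

MEAN_AUPRC = {
    "CatBoost": [0.7546, 0.7992, 0.7914, 0.7965, 0.7926, 0.7798],
    "ET": [0.4765, 0.4228, 0.3857, 0.3967, 0.3639, 0.2401],
    "LightGBM": [0.7073, 0.7268, 0.6971, 0.6974, 0.6857, 0.6783],
    "Logistic regression": [0.2609, 0.2785, 0.2358, 0.2461, 0.2613, 0.2700],
    "Random forest": [0.4209, 0.4639, 0.3311, 0.3553, 0.3392, 0.2199],
    "XGBoost": [0.7471, 0.7743, 0.7550, 0.7524, 0.7476, 0.7351],
}


def best_feature_count(scores):
    """Return (feature_count, score) for the highest mean AUPRC in one row."""
    return max(zip(FEATURE_COUNTS, scores), key=lambda pair: pair[1])


# Hypothetical helper result: best column per classifier.
best = {name: best_feature_count(row) for name, row in MEAN_AUPRC.items()}

for name, (count, score) in best.items():
    print(f"{name}: best at {count} features (mean AUPRC {score:.4f})")
```

Read this way, every classifier in the table reaches its highest mean AUPRC with 10 or 15 features rather than with all 82.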