Table 4. Median differences in internal AUROC (interquartile range in parentheses) across all prediction problems, by database and classifier, when the sampling strategy with the highest AUROC during cross-validation (CV) is chosen

From: Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data

| Database | Number of prediction tasks | Lasso | Random forest | XGBoost | All classifiers |
| --- | --- | --- | --- | --- | --- |
| CCAE | 14 | −0.0025 (0.0073) | 0.0001 (0.0106) | 0 (0.0067) | −0.0004 (0.0076) |
| MDCD | 17 | −0.0004 (0.0044) | 0 (0.0062) | 0 (0.0071) | 0 (0.0068) |
| MDCR | 19 | 0.0000 (0.0052) | 0.0037 (0.0143) | 0 (0.0057) | 0 (0.0075) |
| IQVIA Germany | 8 | −0.0011 (0.0048) | 0.0012 (0.0095) | −0.0045 (0.0204) | −0.0010 (0.0098) |
| All databases | 58 | −0.0004 (0.0053) | 0.0008 (0.0099) | 0 (0.0074) | 0 (0.0081) |