From: Severely imbalanced Big Data challenges: investigating data sampling approaches
| Learner | | Gradient-Boosted Trees | | | | | | Logistic Regression | | | | | | Random Forest | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | RUS | ROS | SMOTE | SMOTEb1 | SMOTEb2 | ADASYN | RUS | ROS | SMOTE | SMOTEb1 | SMOTEb2 | ADASYN | RUS | ROS | SMOTE | SMOTEb1 | SMOTEb2 | ADASYN |
| AUC | Ratio | a | b | b | b | b | b | a | b | b | b | b | c | a | a | a | b | ab | c |
| | (50:50) | a | – | – | – | – | b | a | – | – | – | – | cd | a | a | a | – | b | c |
| | (65:35) | a | – | – | – | – | a | a | – | – | – | – | d | a | b | b | – | b | c |
| | (75:25) | a | – | – | – | – | ab | ab | – | – | – | – | d | b | c | c | – | b | c |
| | (90:10) | a | – | – | – | – | ab | b | – | – | – | – | c | c | d | d | – | a | b |
| | (99:1) | b | – | – | – | – | b | c | – | – | – | – | a | c | e | e | – | b | b |
| | (All:all) | c | – | – | – | – | ab | c | – | – | – | – | b | d | f | f | – | b | a |
| GM | Ratio | a | b | b | c | c | c | a | b | bc | bc | c | d | a | a | a | b | b | b |
| | (50:50) | a | a | a | – | – | a | a | – | – | – | – | a | a | a | a | NA | NA | NA |
| | (65:35) | a | a | a | – | – | a | a | – | – | – | – | b | b | a | a | NA | NA | NA |
| | (75:25) | a | ab | ab | – | – | ab | ab | – | – | – | – | c | b | b | b | NA | NA | NA |
| | (90:10) | ab | ab | ab | – | – | ab | b | – | – | – | – | c | c | b | c | NA | NA | NA |
| | (99:1) | b | bc | bc | – | – | bc | c | – | – | – | – | c | d | c | d | NA | NA | NA |
| | (All:all) | c | c | c | – | – | c | c | – | – | – | – | c | d | c | d | NA | NA | NA |