From: Severely imbalanced Big Data challenges: investigating data sampling approaches
Learners: GBT = Gradient-Boosted Trees, LR = Logistic Regression, RF = Random Forest.

Metric | Row | GBT RUS | GBT ROS | GBT SMOTE | GBT SMOTEb1 | GBT SMOTEb2 | GBT ADASYN | LR RUS | LR ROS | LR SMOTE | LR SMOTEb1 | LR SMOTEb2 | LR ADASYN | RF RUS | RF ROS | RF SMOTE | RF SMOTEb1 | RF SMOTEb2 | RF ADASYN
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
AUC | Ratio | a | b | b | d | d | c | a | b | a | b | c | a | a | b | b | c | c | d
AUC | (50:50) | d | a | bc | c | c | c | b | a | abc | b | b | – | c | a | b | bc | b | bc
AUC | (65:35) | bc | b | bc | c | c | c | a | b | a | b | b | – | bc | bc | b | bc | b | bc
AUC | (75:25) | ab | c | bc | c | c | c | a | c | ab | b | b | – | b | c | b | bc | b | c
AUC | (90:10) | a | d | c | bc | bc | c | a | d | abc | b | b | – | a | c | b | c | b | bc
AUC | (99:1) | b | c | b | b | b | b | a | e | c | b | b | – | b | b | b | b | b | b
AUC | (All:all) | c | a | a | a | a | a | a | a | bc | a | a | – | c | a | a | a | a | a
GM | Ratio | a | b | c | c | c | d | a | ab | a | b | b | ab | a | d | b | cd | d | c
GM | (50:50) | a | a | a | a | a | a | a | a | a | a | a | a | a | a | a | a | a | a
GM | (65:35) | b | b | b | b | b | b | a | b | b | b | b | b | b | b | b | b | b | b
GM | (75:25) | c | c | c | c | c | c | b | c | c | c | c | c | c | c | c | b | b | c
GM | (90:10) | d | d | d | d | d | d | c | d | d | d | d | d | d | c | d | c | c | d
GM | (99:1) | e | e | e | e | e | d | d | e | e | e | e | e | e | c | e | d | d | e
GM | (All:all) | f | e | e | e | f | d | e | f | f | f | f | f | f | c | f | e | d | e