From: Severely imbalanced Big Data challenges: investigating data sampling approaches
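Ratio (negative:positive) | No sampling: negatives | No sampling: positives | Undersampling: negatives | Undersampling: positives | Undersampling: negatives (% of original) | Oversampling: negatives | Oversampling: positives | Oversampling: positives (% of original) |
---|---|---|---|---|---|---|---|---|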
(All:all) | 1,575,234 | 4255 | – | – | – | – | – | – |
(99:1) | – | – | 421,245 | 4255 | 26.74 | 1,575,234 | 15,911 | 373.95 |
(90:10) | – | – | 38,295 | 4255 | 2.43 | 1,575,234 | 175,026 | 4113.42 |
(75:25) | – | – | 12,765 | 4255 | 0.81 | 1,575,234 | 525,078 | 12,340.26 |
(65:35) | – | – | 7902 | 4255 | 0.50 | 1,575,234 | 848,203 | 19,934.26 |
(50:50) | – | – | 4255 | 4255 | 0.27 | 1,575,234 | 1,575,234 | 37,020.78 |
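
The sampled class sizes in the table appear to follow directly from the target ratio and the size of the class that is left untouched. The sketch below reproduces that arithmetic under two assumptions that are not stated in the table itself: the resized class is rounded to the nearest integer, and the percentage columns express the sampled class as a percentage of its original size. The function names are hypothetical and chosen only for illustration. Percentages recomputed this way can differ from the published values in the last digit.

```python
# Class sizes taken from the "No sampling" row of the table.
N_NEG = 1_575_234
N_POS = 4_255


def undersampled_negatives(neg_part, pos_part, n_pos=N_POS):
    """Negatives to keep so that negatives:positives is neg_part:pos_part.

    Assumption: the result is rounded to the nearest integer.
    """
    return round(n_pos * neg_part / pos_part)


def oversampled_positives(neg_part, pos_part, n_neg=N_NEG):
    """Positives needed so that negatives:positives is neg_part:pos_part.

    Assumption: the result is rounded to the nearest integer.
    """
    return round(n_neg * pos_part / neg_part)


def percent_of_original(sampled, original):
    """Sampled class size as a percentage of its original size (2 decimals)."""
    return round(100 * sampled / original, 2)


if __name__ == "__main__":
    ratios = [(99, 1), (90, 10), (75, 25), (65, 35), (50, 50)]
    for neg_part, pos_part in ratios:
        kept_neg = undersampled_negatives(neg_part, pos_part)
        new_pos = oversampled_positives(neg_part, pos_part)
        print(
            f"({neg_part}:{pos_part})",
            f"undersampling negatives={kept_neg}",
            f"({percent_of_original(kept_neg, N_NEG)}% of original)",
            f"oversampling positives={new_pos}",
            f"({percent_of_original(new_pos, N_POS)}% of original)",
        )
```

Undersampling holds the 4255 positives fixed and shrinks the negative class, while oversampling holds the 1,575,234 negatives fixed and grows the positive class, which is why one of the two class columns is constant in each half of the table.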