From: Severely imbalanced Big Data challenges: investigating data sampling approaches
Ratio | No sampling: Negatives | No sampling: Positives | Undersampling: Negatives | Undersampling: Positives | Undersampling: Negatives % | Oversampling: Negatives | Oversampling: Positives | Oversampling: Positives % |
---|---|---|---|---|---|---|---|---|
(All:all) | 607,414 | 379 | – | – | – | – | – | – |
(99:1) | – | – | 37,521 | 379 | 6.18 | 607,414 | 6,135 | 1,618.86 |
(90:10) | – | – | 3,411 | 379 | 0.56 | 607,414 | 67,490 | 17,807.51 |
(75:25) | – | – | 1,137 | 379 | 0.19 | 607,414 | 202,471 | 53,422.52 |
(65:35) | – | – | 704 | 379 | 0.12 | 607,414 | 327,069 | 86,297.91 |
(50:50) | – | – | 379 | 379 | 0.06 | 607,414 | 607,414 | 160,267.55 |
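
The arithmetic behind the table can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical. It assumes that counts are rounded to the nearest integer and that the percentage columns express the sampled class size relative to its original size, computed before rounding. Under those assumptions the values appear consistent with the rows above.

```python
# Original class sizes, taken from the (All:all) row of the table.
NEGATIVES = 607_414
POSITIVES = 379

# Target negative:positive ratios listed in the table.
RATIOS = [(99, 1), (90, 10), (75, 25), (65, 35), (50, 50)]


def undersample_negatives(neg_share, pos_share, positives=POSITIVES):
    """Negatives to keep (unrounded) so that all positives form pos_share of the data."""
    return positives * neg_share / pos_share


def oversample_positives(neg_share, pos_share, negatives=NEGATIVES):
    """Positives needed (unrounded) so that all negatives form neg_share of the data."""
    return negatives * pos_share / neg_share


if __name__ == "__main__":
    for neg_share, pos_share in RATIOS:
        under_neg = undersample_negatives(neg_share, pos_share)
        over_pos = oversample_positives(neg_share, pos_share)
        # Percentages are relative to the original class size (assumption).
        under_pct = under_neg / NEGATIVES * 100
        over_pct = over_pos / POSITIVES * 100
        print(
            f"({neg_share}:{pos_share}) "
            f"undersampling: {round(under_neg):,} negatives ({under_pct:.2f}%), "
            f"oversampling: {round(over_pos):,} positives ({over_pct:,.2f}%)"
        )
```

Undersampling keeps all 379 positives and shrinks the negative class, while oversampling keeps all 607,414 negatives and grows the positive class, which is why only one column changes within each group.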