Table 2 Generated sampled sizes with RUS

From: Examining characteristics of predictive models with imbalanced big data

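| Dataset  | Ratio   | Positives | Negatives | Negatives (%)ᵃ | Total     |
|----------|---------|-----------|-----------|----------------|-----------|
| ECBDL’14 | (90:10) | 687,729   | 6,189,561 | 19.772         | 6,877,290 |
|          | (75:25) | 687,729   | 2,063,187 | 6.591          | 2,750,916 |
|          | (65:35) | 687,729   | 1,277,211 | 4.080          | 1,964,940 |
|          | (50:50) | 687,729   | 687,729   | 2.197          | 1,375,458 |
|          | (45:55) | 687,729   | 562,687   | 1.797          | 1,250,416 |
|          | (40:60) | 687,729   | 458,486   | 1.465          | 1,146,215 |
| POST     | (90:10) | 2,391     | 21,519    | 1.270          | 23,910    |
|          | (75:25) | 2,391     | 7,173     | 0.423          | 9,564     |
|          | (65:35) | 2,391     | 4,440     | 0.262          | 6,831     |
|          | (50:50) | 2,391     | 2,391     | 0.141          | 4,782     |
|          | (45:55) | 2,391     | 1,956     | 0.115          | 4,347     |
|          | (40:60) | 2,391     | 1,594     | 0.094          | 3,985     |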

ᵃ Percentages of negatives are calculated relative to the total size of the unsampled negative class
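
The Negatives and Total columns can be reproduced from the positive count and the target ratio. The sketch below assumes that each ratio reads as negatives:positives, which is consistent with the listed counts, and that fractional results are rounded to the nearest integer. The function name `rus_sizes` and the rounding rule are illustrative assumptions, not taken from the article. The Negatives % column is not recomputed, because the unsampled negative total is not given in the table.

```python
# Minimal sketch: derive the sampled class sizes for a target
# negative:positive ratio when all positives are kept and only the
# negative count changes (as in the table, where Positives is constant).
# Assumption: fractional negative counts are rounded to the nearest integer.

def rus_sizes(positives: int, neg_part: int, pos_part: int) -> tuple[int, int]:
    """Return (negatives, total) for a neg_part:pos_part class ratio."""
    negatives = round(positives * neg_part / pos_part)
    return negatives, positives + negatives


# Ratios as listed in the table, read as negatives:positives.
RATIOS = [(90, 10), (75, 25), (65, 35), (50, 50), (45, 55), (40, 60)]

if __name__ == "__main__":
    for name, positives in [("ECBDL'14", 687_729), ("POST", 2_391)]:
        for neg_part, pos_part in RATIOS:
            negatives, total = rus_sizes(positives, neg_part, pos_part)
            print(f"{name:9s} ({neg_part}:{pos_part}) "
                  f"positives={positives:,} negatives={negatives:,} total={total:,}")
```

For example, `rus_sizes(2_391, 65, 35)` gives 4,440 negatives and a total of 6,831, matching the POST (65:35) row.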