| Paper | Data sets | Data type | Class count | Data set size | Min class size | Max class size | \(\rho\) (Eq. 1) |
|---|---|---|---|---|---|---|---|
| [79] | CIFAR-10 | Image | 10 | 60,000 | 2340 | 3900 | 2.3 |
| [20] | WHOI-Plankton | Image | 103 | 3,400,000 | < 3500 | 2,300,000 | 657 |
| [21] | Public cameras | Image | 19 | 10,000 | 14 | 6986 | 499 |
| [18] | CIFAR-100 (1) | Image | 2 | 6000 | 150 | 3000 | 20 |
| | CIFAR-100 (2) | Image | 2 | 1200 | 30 | 600 | 20 |
| | CIFAR-100 (3) | Image | 2 | 1200 | 30 | 600 | 20 |
| | 20 News Group (1) | Text | 2 | 1200 | 30 | 600 | 20 |
| | 20 News Group (2) | Text | 2 | 1200 | 30 | 600 | 20 |
| [88] | COCO | Image | 2 | 115,000 | 10 | 100,000 | 10,000 |
| [103] | Building changes | Image | 6 | 203,358 | 222 | 200,000 | 900 |
| [89] | GHW | Structured | 2 | 2565 | 406 | 2159 | 5.3 |
| | ORP | Structured | 2 | 700 | 124 | 576 | 4.6 |
| [19] | MNIST | Image | 10 | 70,000 | 600 | 6000 | 10 |
| | CIFAR-100 | Image | 100 | 60,000 | 60 | 600 | 10 |
| | CALTECH-101 | Image | 102 | 9144 | 15 | 30 | 2 |
| | MIT-67 | Image | 67 | 6700 | 10 | 100 | 10 |
| | DIL | Image | 10 | 1300 | 24 | 331 | 13 |
| | MLC | Image | 9 | 400,000 | 2600 | 196,900 | 76 |
| [90] | KEEL | Structured | 2 | 3339 | 26 | 3313 | 128 |
| [91] | CIFAR-10 | Image | 10 | 60,000 | 250 | 5000 | 20 |
| | CIFAR-100 | Image | 100 | 60,000 | 25 | 500 | 20 |
| [22] | CelebA | Image | 2 | 160,000 | 3200 | 156,800 | 49 |
| [117] | MNIST | Image | 10 | 60,000 | 50 | 5000 | 100 |
| | MNIST-back-rot | Image | 10 | 62,000 | 12 | 1200 | 100 |
| | CIFAR-10 | Image | 10 | 60,000 | 5000 | 5000 | 1 |
| | SVHN | Image | 10 | 99,000 | 73 | 7300 | 100 |
| | STL-10 | Image | 10 | 13,000 | 500 | 500 | 1 |
| [118] | CelebA | Image | 2 | 160,000 | 3200 | 156,800 | 49 |
| [92] | EmotioNet | Image | 2 | 450,000 | 45 | 449,955 | 10,000 |
| [23] | MNIST | Image | 10 | 60,000 | 1 | 5000 | 5000 |
| | CIFAR-10 | Image | 10 | 60,000 | 100 | 5000 | 50 |
| | ImageNet | Image | 1000 | 1,050,000 | 10 | 1000 | 100 |

- Images from CelebA and EmotioNet are treated as sets of binary classification problems because they are annotated with 40 and 11 binary attributes, respectively. The class imbalance in the COCO data arises from the extreme imbalance between background and foreground concepts.
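
As a quick check of the \(\rho\) column, the following minimal sketch assumes that Eq. 1 defines \(\rho\) as the ratio of the largest class size to the smallest class size, which agrees with the GHW, ORP, and CelebA rows of the table. The function name `imbalance_ratio` and the `rows` dictionary are hypothetical and serve only as illustration. Rows whose class sizes are reported approximately may not reproduce the tabulated value exactly.

```python
def imbalance_ratio(class_sizes):
    """Return the largest class size divided by the smallest class size.

    Assumption: this matches the definition of rho in Eq. 1.
    """
    sizes = list(class_sizes)
    if not sizes or min(sizes) <= 0:
        raise ValueError("class sizes must be non-empty and positive")
    return max(sizes) / min(sizes)


# (min class size, max class size) copied from three rows of the table.
rows = {
    "GHW": (406, 2159),
    "ORP": (124, 576),
    "CelebA": (3200, 156_800),
}

for name, sizes in rows.items():
    print(f"{name}: rho = {imbalance_ratio(sizes):.1f}")
# GHW: rho = 5.3
# ORP: rho = 4.6
# CelebA: rho = 49.0
```

Only the smallest and largest classes enter the ratio, so the same function applies unchanged when it is given the full list of per-class counts for a multi-class data set.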