
Table 2 Summary of the Data Sets

From: Efficiency of random swap clustering

| Data set | Ref. | Type of data | Vectors (N) | Clusters (k) | Vectors per cluster | Dimension (d) |
|---|---|---|---|---|---|---|
| Bridge | [35] | Gray-scale image | 4096 | 256 | 16 | 16 |
| House ᵃ | [35] | RGB image | 34,112 | 256 | 133 | 3 |
| Miss America | [35] | Residual vectors | 6480 | 256 | 25 | 16 |
| Europe | | Diff. coordinates | 169,673 | 256 | 663 | 2 |
| BIRCH1–BIRCH3 | [33] | Artificial | 100,000 | 100 | 1000 | 2 |
| S1–S4 | [6] | Artificial | 5000 | 15 | 333 | 2 |
| Unbalance | [42] | Artificial | 6500 | 8 | 821 | 2 |
| Dim16–Dim1024 | [24] | Artificial | 1024 | 16 | 64 | 16–1024 |
| KDD04-Bio | [34] | DNA sequences | 145,751 | 2000 | 73 | 74 |
  1. For an archive of the data sets, see http://cs.uef.fi/sipu/datasets/
  2. ᵃ Duplicate data vectors are combined and their frequency information is stored instead.
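
The "Vectors per cluster" column appears to be the average cluster size N/k, rounded to an integer. The following is a minimal sketch under that assumption, checking a few rows copied from the table. The function name `vectors_per_cluster` and the rounding convention are illustrative assumptions, not taken from the paper.

```python
# (N, k, listed "Vectors per cluster") for a few rows, copied from Table 2.
rows = {
    "Bridge": (4096, 256, 16),
    "House": (34112, 256, 133),
    "Miss America": (6480, 256, 25),
    "Europe": (169673, 256, 663),
    "KDD04-Bio": (145751, 2000, 73),
}


def vectors_per_cluster(n_vectors, n_clusters):
    """Average cluster size N/k, rounded to the nearest integer (assumed convention)."""
    return round(n_vectors / n_clusters)


for name, (n, k, listed) in rows.items():
    # Show the exact ratio next to the rounded value and the value listed in the table.
    print(f"{name}: N/k = {n / k:.2f}, rounded = {vectors_per_cluster(n, k)}, listed = {listed}")
```

This figure is only an average. It says nothing about how evenly the vectors are spread across the clusters of a given data set.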