Table 2 Summary of the Data Sets

From: Efficiency of random swap clustering

| Data set | Ref. | Type of data | Vectors (N) | Clusters (k) | Vectors per cluster | Dimension (d) |
|---|---|---|---|---|---|---|
| Bridge | [35] | Gray-scale image | 4096 | 256 | 16 | 16 |
| House (a) | [35] | RGB image | 34,112 | 256 | 133 | 3 |
| Miss America | [35] | Residual vectors | 6480 | 256 | 25 | 16 |
| Europe | | Diff. coordinates | 169,673 | 256 | 663 | 2 |
| BIRCH1–BIRCH3 | [33] | Artificial | 100,000 | 100 | 1000 | 2 |
| S1–S4 | [6] | Artificial | 5000 | 15 | 333 | 2 |
| Unbalance | [42] | Artificial | 6500 | 8 | 821 | 2 |
| Dim16–Dim1024 | [24] | Artificial | 1024 | 16 | 64 | 16–1024 |
| KDD04-Bio | [34] | DNA sequences | 145,751 | 2000 | 73 | 74 |
For an archive of the data sets, see http://cs.uef.fi/sipu/datasets/

(a) Duplicate data vectors are combined and their frequency information is stored instead.
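
The note on the House data set describes storing each distinct vector once together with its frequency. A minimal sketch of that idea follows, assuming vectors are given as sequences of numbers; the function names `combine_duplicates` and `weighted_centroid` and the sample pixels are hypothetical and do not come from the article.

```python
from collections import Counter


def combine_duplicates(vectors):
    """Collapse identical vectors into unique vectors plus their frequencies."""
    # Tuples are hashable, so each distinct vector becomes one Counter key.
    counts = Counter(tuple(v) for v in vectors)
    unique = list(counts.keys())
    freqs = [counts[u] for u in unique]
    return unique, freqs


def weighted_centroid(unique, freqs):
    """Mean of the original data, computed from unique vectors and frequencies."""
    total = sum(freqs)
    dim = len(unique[0])
    return tuple(
        sum(u[i] * f for u, f in zip(unique, freqs)) / total
        for i in range(dim)
    )


# Hypothetical RGB pixels: one color occurs three times.
pixels = [(255, 0, 0), (0, 0, 255), (255, 0, 0), (255, 0, 0)]
unique, freqs = combine_duplicates(pixels)
centroid = weighted_centroid(unique, freqs)
```

Weighting each unique vector by its frequency gives the same centroid as averaging the full data with duplicates, so a clustering step can work on the smaller set without changing the result.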