Table 6 Comparison of the estimated number of clusters produced by different methods on the synthetic datasets

From: An ensemble method for estimating the number of clusters in a big data set using multiple random samples

| Dataset | K | es | nselectboot | kluster | X-means | Elbow | Silhouette | Gap statistic | RSPCE |
|---|---|---|---|---|---|---|---|---|---|
| DS1 | 10 | 5% | 13.6 ± 1.6 (0.36) | 9.8 ± 0.4 (0.03) | 18.8 ± 3.3 (0.83) | 9.3 ± 1.2 (0.09) | 9.6 ± 0.6 (0.05) | 12.6 ± 1.2 (0.26) | 9.7 ± 0.8 (0.07) |
| | | 10% | 13.3 ± 1.3 (0.33) | 9.8 ± 0.4 (0.02) | 22.2 ± 5.9 (1.22) | 9.8 ± 1.5 (0.12) | 9.8 ± 0.4 (0.02) | 12.3 ± 0.9 (0.23) | 10.0 ± 0.0 (0.00) |
| | | 20% | 14.0 ± 1.4 (0.40) | 10.0 ± 0.0 (0.00) | 20.2 ± 4.8 (1.02) | 9.7 ± 1.6 (0.13) | 9.8 ± 0.5 (0.03) | 12.2 ± 1.1 (0.22) | 10.0 ± 0.0 (0.00) |
| | | 30% | 13.8 ± 1.1 (0.38) | 10.0 ± 0.0 (0.00) | 23.2 ± 2.5 (1.32) | 8.5 ± 0.8 (0.15) | 9.8 ± 0.4 (0.02) | 11.7 ± 1.2 (0.17) | 10.0 ± 0.0 (0.00) |
| | | 40% | 13.4 ± 1.5 (0.34) | 10.0 ± 0.0 (0.00) | 22.8 ± 3.8 (1.28) | 9.0 ± 1.3 (0.10) | NA | 12.8 ± 0.5 (0.28) | 10.0 ± 0.0 (0.00) |
| | | 50% | 14.4 ± 0.9 (0.44) | 10.0 ± 0.0 (0.00) | 19.4 ± 1.9 (0.94) | 10.2 ± 0.8 (0.05) | NA | 12.5 ± 1.0 (0.25) | 10.0 ± 0.0 (0.00) |
| DS2 | 20 | 5% | 28.7 ± 1.1 (0.47) | 14.6 ± 0.5 (0.28) | 43.0 ± 2.5 (1.15) | 8.3 ± 1.3 (0.59) | 20.2 ± 0.7 (0.02) | 21.0 ± 0.7 (0.06) | 18.7 ± 1.3 (0.11) |
| | | 10% | 29.3 ± 0.8 (0.47) | 14.8 ± 0.4 (0.26) | 43.8 ± 1.9 (1.20) | 8.0 ± 1.2 (0.60) | 19.8 ± 0.4 (0.02) | 20.6 ± 0.5 (0.04) | 19.6 ± 0.7 (0.07) |
| | | 20% | 28.7 ± 0.8 (0.44) | 14.8 ± 0.4 (0.26) | 42.8 ± 3.7 (1.14) | 8.6 ± 1.8 (0.57) | 20.2 ± 0.4 (0.02) | 21.0 ± 0.7 (0.07) | 19.9 ± 0.8 (0.02) |
| | | 30% | 29.5 ± 0.5 (0.48) | 15.0 ± 0.0 (0.25) | 42.6 ± 3.9 (1.13) | 9.0 ± 2.1 (0.55) | NA | 20.8 ± 0.4 (0.06) | 20.0 ± 0.0 (0.01) |
| | | 40% | 29.0 ± 1.5 (0.45) | 15.0 ± 0.0 (0.25) | 46.8 ± 1.6 (1.14) | 8.5 ± 1.0 (0.58) | NA | 21.0 ± 0.7 (0.07) | 20.0 ± 0.0 (0.00) |
| | | 50% | 29.2 ± 0.1 (0.46) | 15.0 ± 0.0 (0.25) | 40.8 ± 6.6 (1.04) | 9.2 ± 1.0 (0.54) | NA | 20.8 ± 0.4 (0.06) | 20.0 ± 0.0 (0.00) |
| DS3 | 30 | 5% | 39.6 ± 0.8 (0.32) | 14.3 ± 0.8 (0.52) | 40.1 ± 19.4 (0.68) | 7.3 ± 1.2 (0.76) | 21.3 ± 1.6 (0.41) | 31.7 ± 0.5 (0.06) | 23.1 ± 1.3 (0.24) |
| | | 10% | 39.3 ± 0.7 (0.31) | 14.5 ± 0.5 (0.51) | 40.2 ± 19.1 (0.66) | 6.4 ± 1.3 (0.79) | 27.7 ± 0.9 (0.08) | 32.8 ± 1.6 (0.09) | 25.2 ± 1.2 (0.16) |
| | | 20% | 38.8 ± 1.6 (0.29) | 14.7 ± 0.5 (0.51) | 40.4 ± 20.4 (0.70) | 6.9 ± 1.3 (0.77) | 25.5 ± 1.7 (0.18) | 31.2 ± 1.4 (0.05) | 27.7 ± 1.7 (0.09) |
| | | 30% | 39.6 ± 0.5 (0.32) | 14.8 ± 0.4 (0.51) | 39.0 ± 20.4 (0.70) | 7.2 ± 0.4 (0.76) | NA | 32.6 ± 1.7 (0.09) | 28.1 ± 1.8 (0.06) |
| | | 40% | 39.4 ± 0.9 (0.31) | 14.8 ± 0.5 (0.51) | 40.4 ± 20.4 (0.70) | 7.2 ± 1.3 (0.76) | NA | 31.8 ± 0.8 (0.06) | 28.4 ± 0.9 (0.05) |
| | | 50% | 38.0 ± 1.2 (0.27) | 14.8 ± 0.4 (0.51) | 48.2 ± 3.5 (0.61) | 7.1 ± 1.4 (0.75) | NA | NA | 29.3 ± 1.8 (0.03) |
| DS4 | 40 | 5% | 49.5 ± 0.7 (0.24) | 13.3 ± 1.9 (0.66) | 37.0 ± 24.2 (0.38) | 8.1 ± 0.9 (0.80) | 29.5 ± 4.1 (0.36) | 45.5 ± 3.5 (0.12) | 25.4 ± 1.1 (0.37) |
| | | 10% | 48.6 ± 1.6 (0.22) | 13.8 ± 1.5 (0.65) | 36.7 ± 27.5 (0.58) | 8.4 ± 1.1 (0.79) | 42.3 ± 2.8 (0.05) | 43.8 ± 2.4 (0.10) | 27.2 ± 1.9 (0.30) |
| | | 20% | 48.2 ± 1.6 (0.21) | 14.0 ± 1.3 (0.65) | 37.7 ± 25.7 (0.61) | 8.5 ± 0.5 (0.79) | 34.6 ± 2.7 (0.15) | 44.9 ± 1.8 (0.11) | 31.6 ± 2.2 (0.22) |
| | | 30% | 47.8 ± 1.6 (0.20) | 14.5 ± 0.8 (0.63) | 41.3 ± 28.9 (0.63) | 9.5 ± 0.8 (0.76) | 36.7 ± 2.5 (0.09) | 45.3 ± 1.6 (0.12) | 34.2 ± 2.8 (0.11) |
| | | 40% | 48.8 ± 1.1 (0.22) | 14.3 ± 1.2 (0.64) | 51.7 ± 21.1 (0.55) | 8.8 ± 1.3 (0.78) | NA | 45.3 ± 1.8 (0.12) | 36.6 ± 2.5 (0.09) |
| | | 50% | 47.8 ± 1.5 (0.20) | 14.7 ± 0.5 (0.63) | 42.9 ± 26.7 (0.64) | 8.5 ± 1.0 (0.79) | NA | NA | 38.1 ± 1.3 (0.05) |
| DS5 | 50 | 5% | 58.4 ± 2.2 (0.17) | 14.3 ± 1.0 (0.71) | 35.9 ± 33.6 (0.63) | 7.3 ± 0.7 (0.86) | 41.3 ± 3.6 (0.21) | 53.6 ± 1.4 (0.07) | 37.2 ± 2.8 (0.26) |
| | | 10% | 58.7 ± 1.5 (0.17) | 14.8 ± 0.4 (0.70) | 36.7 ± 30.2 (0.59) | 7.5 ± 0.8 (0.85) | 41.7 ± 4.5 (0.20) | 52.8 ± 2.5 (0.05) | 41.4 ± 1.9 (0.21) |
| | | 20% | 58.6 ± 2.1 (0.18) | 14.5 ± 0.8 (0.71) | 30.4 ± 36.1 (0.70) | 7.5 ± 1.2 (0.85) | 52.1 ± 1.8 (0.04) | 54.4 ± 1.3 (0.08) | 43.8 ± 0.8 (0.14) |
| | | 30% | 58.2 ± 2.2 (0.16) | 14.3 ± 1.0 (0.71) | 37.2 ± 36.0 (0.69) | 8.8 ± 0.9 (0.83) | 52.5 ± 4.2 (0.04) | 54.1 ± 0.9 (0.05) | 45.6 ± 1.5 (0.10) |
| | | 40% | 59.2 ± 0.8 (0.18) | 14.7 ± 0.5 (0.71) | 32.3 ± 35.3 (0.71) | 8.8 ± 0.8 (0.82) | NA | NA | 45.9 ± 0.8 (0.09) |
| | | 50% | 57.4 ± 1.8 (0.15) | 14.8 ± 0.4 (0.70) | 20.6 ± 30.5 (0.75) | 7.3 ± 1.4 (0.85) | NA | NA | 46.5 ± 1.1 (0.07) |

  1. Each entry is the mean over 20 runs ± the standard deviation. The best and second-best results are shown in bold and underlined, respectively. The values in parentheses are the mean relative errors.
  2. The sample size for RSPCE is \(n=5000\); K is the true number of clusters in the dataset; es is the ensemble size. The percentage was selected randomly.
  3. NA means not available, i.e., the value could not be computed.
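The notes do not spell out how a cell such as "13.6 ± 1.6 (0.36)" is computed from the 20 runs, so the following is only a minimal sketch under stated assumptions: the relative error of a single run is taken to be \(|\hat{k} - K| / K\) and averaged over runs, the spread is the sample standard deviation, and "best" means lowest mean relative error with NA cells skipped. The helper names `summarize_runs`, `format_cell`, and `rank_methods` are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def summarize_runs(estimates, true_k):
    """Return (mean, sample SD, mean relative error) of estimated cluster counts.

    Assumption (not stated in the table notes): the relative error of one run is
    |k_hat - K| / K, and the reported value is its mean over all runs.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two runs to compute a standard deviation")
    avg = mean(estimates)
    sd = stdev(estimates)  # sample SD; the paper may use the population SD instead
    mre = mean(abs(k - true_k) / true_k for k in estimates)
    return avg, sd, mre


def format_cell(estimates, true_k):
    """Render one table cell in the 'mean ± SD (MRE)' layout used by Table 6."""
    avg, sd, mre = summarize_runs(estimates, true_k)
    return f"{avg:.1f} ± {sd:.1f} ({mre:.2f})"


def rank_methods(mre_by_method):
    """Return (best, second best) method names by lowest mean relative error.

    None stands for an NA cell and is skipped. Ranking by MRE is an assumed
    criterion for the bold/underlined entries, not one confirmed by the notes.
    """
    available = {m: e for m, e in mre_by_method.items() if e is not None}
    if len(available) < 2:
        raise ValueError("need at least two methods with a computable error")
    ordered = sorted(available, key=available.get)  # stable sort keeps ties in input order
    return ordered[0], ordered[1]


if __name__ == "__main__":
    # Hypothetical runs (not from the paper): 18 runs return 10, one returns 9, one returns 11.
    print(format_cell([10] * 18 + [9, 11], true_k=10))  # 10.0 ± 0.3 (0.01)

    # Mean relative errors copied from the DS1, es = 5% row of Table 6.
    ds1_5 = {
        "nselectboot": 0.36, "kluster": 0.03, "X-means": 0.83, "Elbow": 0.09,
        "Silhouette": 0.05, "Gap statistic": 0.26, "RSPCE": 0.07,
    }
    print(rank_methods(ds1_5))  # ('kluster', 'Silhouette')
```

Because the error is averaged per run rather than computed from the mean estimate, a method whose runs scatter on both sides of K can show a mean close to K but a non-zero relative error.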