Table 4 Stability validation of the RSPCE algorithm's results on synthetic datasets at two sample sizes, A = 5000 and B = 10,000 (lower values are better)

From: An ensemble method for estimating the number of clusters in a big data set using multiple random samples

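| Dataset | n \ es | APN 5% | APN 10% | APN 20% | APN 30% | APN 40% | APN 50% | ADM 5% | ADM 10% | ADM 20% | ADM 30% | ADM 40% | ADM 50% |
|---------|--------|--------|---------|---------|---------|---------|---------|--------|---------|---------|---------|---------|---------|
| DS1 | A | 0.0070 | 0 | 0 | 0 | 0 | 0 | 0.4648 | 0.4736 | 0.4666 | 0.4660 | 0.4697 | 0.4603 |
| DS1 | B | 0.0140 | 0.0050 | 0 | 0 | 0 | 0 | 0.4768 | 0.4751 | 0.4640 | 0.4628 | 0.4618 | 0.4599 |
| DS2 | A | 0.0042 | 0.0009 | 0 | 0 | 0 | 0 | 0.5765 | 0.5762 | 0.5802 | 0.5802 | 0.5804 | 0.5800 |
| DS2 | B | 0.0045 | 0.0021 | 0.0011 | 0.0006 | 0.0001 | 0 | 0.5837 | 0.5785 | 0.5810 | 0.5805 | 0.5803 | 0.5801 |
| DS3 | A | 0.0110 | 0.0077 | 0.0044 | 0.0031 | 0.0022 | 0.0013 | 0.5206 | 0.5236 | 0.5274 | 0.5262 | 0.5285 | 0.5304 |
| DS3 | B | 0.0300 | 0.0240 | 0.0215 | 0.0185 | 0.0144 | 0.0086 | 0.5211 | 0.5171 | 0.5208 | 0.5209 | 0.5220 | 0.5253 |
| DS4 | A | 0.0417 | 0.0345 | 0.0270 | 0.0210 | 0.0160 | 0.0123 | 0.5065 | 0.5059 | 0.5089 | 0.5084 | 0.5071 | 0.5073 |
| DS4 | B | 0.0540 | 0.0462 | 0.0370 | 0.0288 | 0.0260 | 0.0240 | 0.5045 | 0.5079 | 0.5082 | 0.5039 | 0.5063 | 0.5027 |
| DS5 | A | 0.0396 | 0.0288 | 0.0234 | 0.0190 | 0.0169 | 0.0124 | 0.4999 | 0.5004 | 0.5015 | 0.5026 | 0.5023 | 0.5022 |
| DS5 | B | 0.0498 | 0.0410 | 0.0306 | 0.0266 | 0.0244 | 0.0226 | 0.4980 | 0.4953 | 0.4977 | 0.4982 | 0.4987 | 0.4994 |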
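APN and ADM are most likely the "average proportion of non-overlap" and "average distance between means" stability measures defined in the R package clValid; the table itself does not restate their formulas, so the sketch below assumes RSPCE follows the clValid definition of APN. For each observation it compares the observation's cluster in a reference clustering with its cluster in each perturbed clustering (in clValid, one perturbation per removed column) and counts the fraction of the reference cluster that is no longer shared. A value of 0 means the clustering is perfectly stable, which matches the "lower values are better" reading of the table. The function name `apn` and its argument layout are hypothetical illustrations, not the paper's code.

```python
import numpy as np


def apn(full_labels, perturbed_labels):
    """Average proportion of non-overlap (assumed clValid-style definition).

    full_labels: 1-D sequence of cluster labels from clustering the full data.
    perturbed_labels: list of 1-D sequences, one per perturbation, each
        labelling the same observations in the same order.
    Returns a value in [0, 1]; 0 means every observation keeps its cluster.
    """
    full = np.asarray(full_labels)
    total = 0.0
    for pert in perturbed_labels:
        pert = np.asarray(pert)
        for i in range(full.size):
            in_full = full == full[i]  # members of i's reference cluster
            in_pert = pert == pert[i]  # members of i's perturbed cluster
            overlap = np.sum(in_full & in_pert)
            # Fraction of i's reference cluster it no longer shares a cluster with.
            total += 1.0 - overlap / np.sum(in_full)
    # Average over all observations and all perturbations.
    return total / (full.size * len(perturbed_labels))
```

For example, `apn([0, 0, 1, 1], [[0, 1, 1, 1]])` gives 0.25: observations 0 and 1 each lose half of their reference cluster, while observations 2 and 3 lose none. Because the comparison is based on cluster membership rather than label values, a pure relabeling such as `[[1, 1, 0, 0]]` still gives 0.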