
Table 5 Results of two-sample KS-tests and Z-tests comparing the distance distribution among the actual cluster centers with the distance distribution among the cluster centers estimated by the RSPCE algorithm on the synthetic datasets

From: An ensemble method for estimating the number of clusters in a big data set using multiple random samples

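| Dataset | n | es | KS-test h | KS-test k | KS-test p | Z-test h | Z-test z | Z-test p |
|---|---|---|---|---|---|---|---|---|
| DS1 | A | 5% | 0 | 0.083 | 0.998 | 0 | 0.407 | 0.684 |
| DS1 | A | 10% | 0 | 0.111 | 0.930 | 0 | −0.712 | 0.476 |
| DS1 | A | 20% | 0 | 0.089 | 0.992 | 0 | −0.381 | 0.703 |
| DS1 | A | 30% | 0 | 0.089 | 0.992 | 0 | −0.461 | 0.645 |
| DS1 | A | 40% | 0 | 0.089 | 0.992 | 0 | −0.586 | 0.558 |
| DS1 | A | 50% | 0 | 0.111 | 0.930 | 0 | −0.518 | 0.605 |
| DS1 | B | 5% | 0 | 0.172 | 0.552 | 0 | −0.764 | 0.445 |
| DS1 | B | 10% | 0 | 0.144 | 0.766 | 0 | −0.932 | 0.352 |
| DS1 | B | 20% | 0 | 0.089 | 0.992 | 0 | −0.439 | 0.661 |
| DS1 | B | 30% | 0 | 0.089 | 0.992 | 0 | −0.419 | 0.675 |
| DS1 | B | 40% | 0 | 0.067 | 1.000 | 0 | −0.178 | 0.859 |
| DS1 | B | 50% | 0 | 0.067 | 1.000 | 0 | −0.090 | 0.928 |
| DS2 | A | 5% | 0 | 0.027 | 1.000 | 0 | 0.130 | 0.897 |
| DS2 | A | 10% | 0 | 0.047 | 0.981 | 0 | −0.422 | 0.673 |
| DS2 | A | 20% | 0 | 0.047 | 0.981 | 0 | −0.422 | 0.673 |
| DS2 | A | 30% | 0 | 0.047 | 0.981 | 0 | −0.427 | 0.670 |
| DS2 | A | 40% | 0 | 0.047 | 0.981 | 0 | −0.427 | 0.670 |
| DS2 | A | 50% | 0 | 0.037 | 0.999 | 0 | −0.356 | 0.722 |
| DS2 | B | 5% | 0 | 0.068 | 0.841 | 0 | −0.816 | 0.414 |
| DS2 | B | 10% | 0 | 0.036 | 1.000 | 0 | 0.126 | 0.900 |
| DS2 | B | 20% | 0 | 0.047 | 0.981 | 0 | −0.422 | 0.673 |
| DS2 | B | 30% | 0 | 0.053 | 0.950 | 0 | −0.445 | 0.656 |
| DS2 | B | 40% | 0 | 0.047 | 0.981 | 0 | −0.445 | 0.656 |
| DS2 | B | 50% | 0 | 0.042 | 0.995 | 0 | −0.424 | 0.672 |
| DS3 | A | 5% | 0 | 0.037 | 0.975 | 0 | 0.675 | 0.499 |
| DS3 | A | 10% | 0 | 0.029 | 0.998 | 0 | −0.250 | 0.802 |
| DS3 | A | 20% | 0 | 0.027 | 0.998 | 0 | 0.597 | 0.551 |
| DS3 | A | 30% | 0 | 0.027 | 0.998 | 0 | 0.597 | 0.551 |
| DS3 | A | 40% | 0 | 0.012 | 1.000 | 0 | 0.029 | 0.977 |
| DS3 | A | 50% | 0 | 0.018 | 1.000 | 0 | 0.088 | 0.930 |
| DS3 | B | 5% | 0 | 0.027 | 1.000 | 0 | 0.387 | 0.699 |
| DS3 | B | 10% | 0 | 0.043 | 0.935 | 0 | 0.793 | 0.428 |
| DS3 | B | 20% | 0 | 0.044 | 0.901 | 0 | 0.635 | 0.525 |
| DS3 | B | 30% | 0 | 0.057 | 0.606 | 0 | 1.277 | 0.202 |
| DS3 | B | 40% | 0 | 0.033 | 0.988 | 0 | 0.583 | 0.560 |
| DS3 | B | 50% | 0 | 0.022 | 1.000 | 0 | 0.092 | 0.927 |
| DS4 | A | 5% | 0 | 0.044 | 0.811 | 0 | −0.986 | 0.324 |
| DS4 | A | 10% | 0 | 0.031 | 0.986 | 0 | 0.239 | 0.811 |
| DS4 | A | 20% | 0 | 0.038 | 0.795 | 0 | 0.928 | 0.353 |
| DS4 | A | 30% | 0 | 0.039 | 0.750 | 0 | 0.884 | 0.377 |
| DS4 | A | 40% | 0 | 0.022 | 0.998 | 0 | 0.117 | 0.907 |
| DS4 | A | 50% | 0 | 0.030 | 0.930 | 0 | 0.419 | 0.675 |
| DS4 | B | 5% | 0 | 0.073 | 0.382 | 0 | 1.347 | 0.178 |
| DS4 | B | 10% | 0 | 0.031 | 0.995 | 0 | −0.066 | 0.947 |
| DS4 | B | 20% | 0 | 0.046 | 0.833 | 0 | −0.788 | 0.431 |
| DS4 | B | 30% | 0 | 0.050 | 0.707 | 0 | −1.024 | 0.306 |
| DS4 | B | 40% | 0 | 0.036 | 0.939 | 0 | −0.243 | 0.808 |
| DS4 | B | 50% | 0 | 0.036 | 0.853 | 0 | 0.933 | 0.351 |
| DS5 | A | 5% | 0 | 0.017 | 1.000 | 0 | 0.002 | 0.999 |
| DS5 | A | 10% | 0 | 0.018 | 0.999 | 0 | 0.182 | 0.856 |
| DS5 | A | 20% | 0 | 0.013 | 1.000 | 0 | −0.140 | 0.889 |
| DS5 | A | 30% | 0 | 0.042 | 0.381 | 0 | −0.547 | 0.584 |
| DS5 | A | 40% | 0 | 0.024 | 0.947 | 0 | −1.639 | 0.101 |
| DS5 | A | 50% | 0 | 0.020 | 0.987 | 0 | −0.291 | 0.771 |
| DS5 | B | 5% | 0 | 0.022 | 1.000 | 0 | 0.086 | 0.931 |
| DS5 | B | 10% | 0 | 0.029 | 0.959 | 0 | 0.149 | 0.882 |
| DS5 | B | 20% | 0 | 0.022 | 0.988 | 0 | 0.481 | 0.631 |
| DS5 | B | 30% | 0 | 0.012 | 1.000 | 0 | −0.023 | 0.981 |
| DS5 | B | 40% | 0 | 0.024 | 0.965 | 0 | 0.354 | 0.723 |
| DS5 | B | 50% | 0 | 0.018 | 0.998 | 0 | −0.006 | 0.995 |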

  1. Here, k and z are the test statistics of the KS-test and the Z-test, respectively. \(h=0\) indicates that the test does not reject the null hypothesis at the 5% significance level, and p is the p-value of the corresponding test
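The quantities h, k, z and p in the footnote can be made concrete with a small sketch. The table does not specify several details, so these are assumptions:

- Each "distance distribution" is the set of pairwise Euclidean distances between cluster centers.
- The KS p-value uses the asymptotic Kolmogorov distribution with the usual small-sample correction.
- The Z-test is the standard two-sided, large-sample test for equal means.
- h = 1 when p < 0.05.

The function names `pairwise_distances`, `ks_2samp` and `z_test_2samp` are hypothetical. This is not necessarily how the authors computed Table 5. The two-sided Z-test p-value does agree with the table's (z, p) pairs; for example, z = 0.407 gives p ≈ 0.684.

```python
import math
import statistics
from itertools import combinations


def pairwise_distances(centers):
    """Euclidean distances between every pair of cluster centers (assumed definition)."""
    return [math.dist(a, b) for a, b in combinations(centers, 2)]


def kolmogorov_q(lam):
    """Asymptotic Kolmogorov survival function Q(lam) = 2 * sum (-1)^(j-1) exp(-2 j^2 lam^2)."""
    if lam < 0.2:
        return 1.0  # Q(lam) equals 1 to within about 1e-12 here, and the series converges poorly
    s = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return min(max(s, 0.0), 1.0)


def ks_2samp(x, y, alpha=0.05):
    """Two-sample KS test: returns (h, k, p) with k = max |F_x - F_y| over the pooled values."""
    x, y = sorted(x), sorted(y)
    n1, n2 = len(x), len(y)
    i = j = 0
    k = 0.0
    while i < n1 and j < n2:
        v = min(x[i], y[j])
        # Advance both empirical CDFs past every copy of v so ties are handled together.
        while i < n1 and x[i] == v:
            i += 1
        while j < n2 and y[j] == v:
            j += 1
        k = max(k, abs(i / n1 - j / n2))
    ne = n1 * n2 / (n1 + n2)  # effective sample size
    # Small-sample correction to the argument of Q (Stephens; also used in Numerical Recipes).
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * k
    p = kolmogorov_q(lam)
    return int(p < alpha), k, p


def z_test_2samp(x, y, alpha=0.05):
    """Two-sided, large-sample Z-test for equal means: returns (h, z, p)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return int(p < alpha), z, p


if __name__ == "__main__":
    # Hypothetical toy centers, not data from the paper.
    actual = [(0, 0), (4, 0), (0, 3), (4, 3)]
    estimated = [(0.1, 0), (3.9, 0.1), (0, 2.9), (4.1, 3)]
    d_act, d_est = pairwise_distances(actual), pairwise_distances(estimated)
    print("KS (h, k, p):", ks_2samp(d_act, d_est))
    print("Z  (h, z, p):", z_test_2samp(d_act, d_est))
```

The sign of z depends on which distribution is passed first. The table does not say which order was used. In practice, `scipy.stats.ks_2samp` computes the same statistic k. Its p-values can differ from this sketch for small samples because it may use the exact distribution instead of the asymptotic one.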