
Table 10 Performance evaluation of all measures when NF = 3000: the averaged results (K = 1–120; +2)

From: A set theory based similarity measure for text clustering and classification

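| No | Similarity | Reuters ACC | Reuters PRE | Reuters REC | Reuters FM | Reuters GM | Reuters AMP | Web-KB ACC | Web-KB PRE | Web-KB REC | Web-KB FM | Web-KB GM | Web-KB AMP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Euclidean | 0.648 | 0.743 | 0.317 | 0.376 | 0.533 | 0.293 | 0.570 | 0.777 | 0.447 | 0.446 | 0.607 | 0.340 |
| 2 | Cosine | 0.902 | 0.904 | 0.717 | 0.771 | 0.837 | 0.654 | 0.769 | 0.814 | 0.702 | 0.723 | 0.801 | 0.621 |
| 3 | Jaccard | 0.868 | 0.807 | 0.557 | 0.610 | 0.735 | 0.495 | 0.782 | 0.858 | 0.683 | 0.701 | 0.790 | 0.621 |
| 4 | Bhattacharya | 0.888 | 0.861 | 0.682 | 0.693 | 0.817 | 0.590 | 0.533 | 0.688 | 0.525 | 0.433 | 0.666 | 0.406 |
| 5 | Kullback–Leibler | 0.508 | 0.151 | 0.129 | 0.091 | 0.336 | 0.336 | 0.389 | 0.167 | 0.250 | 0.141 | 0.433 | 0.250 |
| 6 | Manhattan | 0.534 | 0.408 | 0.166 | 0.152 | 0.377 | 0.161 | 0.445 | 0.748 | 0.309 | 0.246 | 0.488 | 0.297 |
| 7 | PDSM | 0.912 | 0.891 | 0.709 | 0.748 | 0.834 | 0.632 | 0.799 | 0.851 | 0.712 | 0.730 | 0.811 | 0.644 |
| 8 | STB-SM | 0.916 | 0.916 | 0.749 | 0.795 | 0.858 | 0.689 | 0.788 | 0.845 | 0.702 | 0.717 | 0.803 | 0.629 |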

  1. Italic values indicate the highest values that top measures achieved for corresponding evaluation metrics
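
Italic emphasis does not survive in a plain-text rendering of the table, so the per-metric maxima can be recomputed directly from the values. The following is a minimal Python sketch; the dictionary layout and the function name `top_measures` are illustrative assumptions and not part of the article, and the numbers are transcribed from Table 10 as shown above.

```python
# Values transcribed from Table 10 (NF = 3000, averaged over K).
# The data layout and helper name are hypothetical, chosen for illustration.
METRICS = ["ACC", "PRE", "REC", "FM", "GM", "AMP"]

TABLE_10 = {
    "Euclidean": {"Reuters": [0.648, 0.743, 0.317, 0.376, 0.533, 0.293],
                  "Web-KB":  [0.570, 0.777, 0.447, 0.446, 0.607, 0.340]},
    "Cosine": {"Reuters": [0.902, 0.904, 0.717, 0.771, 0.837, 0.654],
               "Web-KB":  [0.769, 0.814, 0.702, 0.723, 0.801, 0.621]},
    "Jaccard": {"Reuters": [0.868, 0.807, 0.557, 0.610, 0.735, 0.495],
                "Web-KB":  [0.782, 0.858, 0.683, 0.701, 0.790, 0.621]},
    "Bhattacharya": {"Reuters": [0.888, 0.861, 0.682, 0.693, 0.817, 0.590],
                     "Web-KB":  [0.533, 0.688, 0.525, 0.433, 0.666, 0.406]},
    "Kullback-Leibler": {"Reuters": [0.508, 0.151, 0.129, 0.091, 0.336, 0.336],
                         "Web-KB":  [0.389, 0.167, 0.250, 0.141, 0.433, 0.250]},
    "Manhattan": {"Reuters": [0.534, 0.408, 0.166, 0.152, 0.377, 0.161],
                  "Web-KB":  [0.445, 0.748, 0.309, 0.246, 0.488, 0.297]},
    "PDSM": {"Reuters": [0.912, 0.891, 0.709, 0.748, 0.834, 0.632],
             "Web-KB":  [0.799, 0.851, 0.712, 0.730, 0.811, 0.644]},
    "STB-SM": {"Reuters": [0.916, 0.916, 0.749, 0.795, 0.858, 0.689],
               "Web-KB":  [0.788, 0.845, 0.702, 0.717, 0.803, 0.629]},
}


def top_measures(table, dataset):
    """Return {metric: (best value, [measures achieving it])} for one dataset."""
    result = {}
    for i, metric in enumerate(METRICS):
        # Highest value in this metric's column for the chosen dataset.
        best = max(row[dataset][i] for row in table.values())
        # Keep every measure that ties for the highest value.
        winners = [name for name, row in table.items() if row[dataset][i] == best]
        result[metric] = (best, winners)
    return result


if __name__ == "__main__":
    for dataset in ("Reuters", "Web-KB"):
        print(dataset)
        for metric, (best, winners) in top_measures(TABLE_10, dataset).items():
            print(f"  {metric}: {best:.3f} ({', '.join(winners)})")
```

Ties are reported as a list so that no measure is silently dropped. Whether the recomputed maxima coincide exactly with the values italicized in the published table is not verified here.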