
Table 11 Performance evaluation of all measures when NF = 6000: averaged results (K = 1–120; +2)

From: A set theory based similarity measure for text clustering and classification

                          Reuters                                   Web-KB
No  Similarity/criterion  ACC    PRE    REC    FM     GM     AMP    ACC    PRE    REC    FM     GM     AMP
1   Euclidean             0.627  0.724  0.302  0.356  0.519  0.279  0.550  0.771  0.428  0.422  0.590  0.385
2   Cosine                0.899  0.902  0.717  0.769  0.837  0.651  0.766  0.810  0.698  0.717  0.798  0.616
3   Jaccard               0.867  0.813  0.555  0.609  0.734  0.493  0.784  0.858  0.684  0.700  0.791  0.621
4   Bhattacharyya         0.888  0.867  0.683  0.693  0.818  0.590  0.533  0.689  0.525  0.433  0.666  0.406
5   Kullback–Leibler      0.503  0.163  0.128  0.089  0.335  0.128  0.218  0.078  0.248  0.090  0.431  0.225
6   Manhattan             0.530  0.401  0.164  0.148  0.375  0.160  0.435  0.688  0.300  0.230  0.479  0.290
7   PDSM                  0.912  0.899  0.709  0.754  0.834  0.639  0.802  0.854  0.717  0.734  0.814  0.647
8   STB-SM                0.916  0.916  0.750  0.787  0.854  0.680  0.792  0.844  0.707  0.721  0.807  0.634

  1. Italic values indicate the highest values achieved by the top measures for the corresponding evaluation metrics.
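Italics may not be visible in every rendering of this table. The following Python sketch transcribes the values from Table 11 and picks the plain column maximum for each dataset and metric. It is only an illustration. `TABLE_11`, `METRICS`, and `best_per_metric` are hypothetical names, and a plain maximum over all eight measures may not match the original italics exactly, because the footnote ties them to the "top measures".

```python
# Values transcribed from Table 11 (NF = 6000, averaged over K = 1–120; +2).
METRICS = ["ACC", "PRE", "REC", "FM", "GM", "AMP"]

# measure -> (Reuters scores, Web-KB scores), each listed in METRICS order
TABLE_11 = {
    "Euclidean":        ([0.627, 0.724, 0.302, 0.356, 0.519, 0.279],
                         [0.550, 0.771, 0.428, 0.422, 0.590, 0.385]),
    "Cosine":           ([0.899, 0.902, 0.717, 0.769, 0.837, 0.651],
                         [0.766, 0.810, 0.698, 0.717, 0.798, 0.616]),
    "Jaccard":          ([0.867, 0.813, 0.555, 0.609, 0.734, 0.493],
                         [0.784, 0.858, 0.684, 0.700, 0.791, 0.621]),
    "Bhattacharyya":    ([0.888, 0.867, 0.683, 0.693, 0.818, 0.590],
                         [0.533, 0.689, 0.525, 0.433, 0.666, 0.406]),
    "Kullback–Leibler": ([0.503, 0.163, 0.128, 0.089, 0.335, 0.128],
                         [0.218, 0.078, 0.248, 0.090, 0.431, 0.225]),
    "Manhattan":        ([0.530, 0.401, 0.164, 0.148, 0.375, 0.160],
                         [0.435, 0.688, 0.300, 0.230, 0.479, 0.290]),
    "PDSM":             ([0.912, 0.899, 0.709, 0.754, 0.834, 0.639],
                         [0.802, 0.854, 0.717, 0.734, 0.814, 0.647]),
    "STB-SM":           ([0.916, 0.916, 0.750, 0.787, 0.854, 0.680],
                         [0.792, 0.844, 0.707, 0.721, 0.807, 0.634]),
}


def best_per_metric(table, dataset_index):
    """Return {metric: (measure, value)} holding the plain column maximum.

    dataset_index is 0 for Reuters and 1 for Web-KB. Ties, if any,
    resolve to the measure that appears first in the table.
    """
    best = {}
    for col, metric in enumerate(METRICS):
        measure = max(table, key=lambda m: table[m][dataset_index][col])
        best[metric] = (measure, table[measure][dataset_index][col])
    return best


if __name__ == "__main__":
    for idx, name in enumerate(["Reuters", "Web-KB"]):
        for metric, (measure, value) in best_per_metric(TABLE_11, idx).items():
            print(f"{name:8s} {metric:4s} {measure} ({value:.3f})")
```

Under this plain maximum, STB-SM is highest in every Reuters column. On Web-KB, PDSM is highest in every column except PRE, where Jaccard (0.858) is highest.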