Table 8 Performance evaluation of all measures when NF = 200: the averaged results (K = 1–120; +2)

From: A set theory based similarity measure for text clustering and classification

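| No | Similarity/criterion | Reuters ACC | Reuters PRE | Reuters REC | Reuters FM | Reuters GM | Reuters AMP | Web-KB ACC | Web-KB PRE | Web-KB REC | Web-KB FM | Web-KB GM | Web-KB AMP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Euclidean | 0.814 | 0.741 | 0.510 | 0.573 | 0.697 | 0.451 | 0.636 | 0.774 | 0.525 | 0.543 | 0.669 | 0.464 |
| 2 | Cosine | 0.894 | 0.739 | 0.635 | 0.661 | 0.788 | 0.558 | 0.757 | 0.792 | 0.689 | 0.711 | 0.791 | 0.603 |
| 3 | Jaccard | 0.846 | 0.668 | 0.507 | 0.548 | 0.701 | 0.432 | 0.736 | 0.843 | 0.625 | 0.635 | 0.748 | 0.563 |
| 4 | Bhattacharya | 0.863 | 0.631 | 0.590 | 0.589 | 0.759 | 0.450 | 0.515 | 0.688 | 0.510 | 0.404 | 0.654 | 0.397 |
| 5 | Kullback–Leibler | 0.530 | 0.612 | 0.161 | 0.149 | 0.375 | 0.155 | 0.391 | 0.350 | 0.254 | 0.152 | 0.437 | 0.253 |
| 6 | Manhattan | 0.759 | 0.712 | 0.427 | 0.485 | 0.630 | 0.379 | 0.556 | 0.815 | 0.428 | 0.425 | 0.591 | 0.394 |
| 7 | PDSM | 0.905 | 0.724 | 0.655 | 0.663 | 0.803 | 0.558 | 0.789 | 0.845 | 0.707 | 0.732 | 0.806 | 0.638 |
| 8 | STB-SM | 0.914 | 0.767 | 0.669 | 0.685 | 0.811 | 0.590 | 0.791 | 0.830 | 0.721 | 0.741 | 0.815 | 0.644 |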

Italic values indicate the highest values that the top measures achieved for the corresponding evaluation metrics.
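Because the italic emphasis is not preserved in this plain-text copy, the following minimal sketch recomputes the highest value in each column from the numbers in Table 8. The names `TABLE_8`, `METRICS`, and `best_per_metric` are hypothetical and introduced only for illustration; the column maxima it reports are not guaranteed to coincide exactly with the entries italicised in the original table.

```python
# Values transcribed from Table 8 (NF = 200, averaged results).
# Each list follows the column order given in METRICS.
METRICS = ["ACC", "PRE", "REC", "FM", "GM", "AMP"]

TABLE_8 = {
    "Euclidean": {"Reuters": [0.814, 0.741, 0.510, 0.573, 0.697, 0.451],
                  "Web-KB": [0.636, 0.774, 0.525, 0.543, 0.669, 0.464]},
    "Cosine": {"Reuters": [0.894, 0.739, 0.635, 0.661, 0.788, 0.558],
               "Web-KB": [0.757, 0.792, 0.689, 0.711, 0.791, 0.603]},
    "Jaccard": {"Reuters": [0.846, 0.668, 0.507, 0.548, 0.701, 0.432],
                "Web-KB": [0.736, 0.843, 0.625, 0.635, 0.748, 0.563]},
    "Bhattacharya": {"Reuters": [0.863, 0.631, 0.590, 0.589, 0.759, 0.450],
                     "Web-KB": [0.515, 0.688, 0.510, 0.404, 0.654, 0.397]},
    "Kullback-Leibler": {"Reuters": [0.530, 0.612, 0.161, 0.149, 0.375, 0.155],
                         "Web-KB": [0.391, 0.350, 0.254, 0.152, 0.437, 0.253]},
    "Manhattan": {"Reuters": [0.759, 0.712, 0.427, 0.485, 0.630, 0.379],
                  "Web-KB": [0.556, 0.815, 0.428, 0.425, 0.591, 0.394]},
    "PDSM": {"Reuters": [0.905, 0.724, 0.655, 0.663, 0.803, 0.558],
             "Web-KB": [0.789, 0.845, 0.707, 0.732, 0.806, 0.638]},
    "STB-SM": {"Reuters": [0.914, 0.767, 0.669, 0.685, 0.811, 0.590],
               "Web-KB": [0.791, 0.830, 0.721, 0.741, 0.815, 0.644]},
}


def best_per_metric(table, dataset):
    """Return {metric: (measure, value)} holding the column maximum for one dataset."""
    best = {}
    for i, metric in enumerate(METRICS):
        # Pick the measure whose i-th score on this dataset is largest.
        measure = max(table, key=lambda name: table[name][dataset][i])
        best[metric] = (measure, table[measure][dataset][i])
    return best


if __name__ == "__main__":
    for dataset in ("Reuters", "Web-KB"):
        print(dataset)
        for metric, (measure, value) in best_per_metric(TABLE_8, dataset).items():
            print(f"  {metric}: {measure} ({value:.3f})")
```

On these values, STB-SM holds the column maximum for every metric on Reuters and for every metric except PRE on Web-KB, where PDSM is highest (0.845).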