Table 7 Performance evaluation of all measures when NF = 100: the averaged results (K = 1–120; +2)

From: A set theory based similarity measure for text clustering and classification

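| No | Similarity | Reuters ACC | Reuters PRE | Reuters REC | Reuters FM | Reuters GM | Reuters AMP | Web-KB ACC | Web-KB PRE | Web-KB REC | Web-KB FM | Web-KB GM | Web-KB AMP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Euclidean | 0.834 | 0.688 | 0.543 | 0.588 | 0.723 | 0.456 | 0.643 | 0.759 | 0.541 | 0.564 | 0.681 | 0.474 |
| 2 | Cosine | 0.875 | 0.688 | 0.602 | 0.624 | 0.767 | 0.502 | 0.737 | 0.769 | 0.666 | 0.689 | 0.775 | 0.578 |
| 3 | Jaccard | 0.819 | 0.613 | 0.450 | 0.486 | 0.657 | 0.373 | 0.702 | 0.837 | 0.584 | 0.592 | 0.717 | 0.526 |
| 4 | Bhattacharya | 0.832 | 0.639 | 0.521 | 0.530 | 0.710 | 0.440 | 0.500 | 0.643 | 0.499 | 0.390 | 0.644 | 0.389 |
| 5 | Kullback–Leibler | 0.555 | 0.646 | 0.211 | 0.231 | 0.432 | 0.197 | 0.396 | 0.393 | 0.260 | 0.167 | 0.443 | 0.256 |
| 6 | Manhattan | 0.830 | 0.685 | 0.553 | 0.594 | 0.729 | 0.463 | 0.594 | 0.796 | 0.475 | 0.490 | 0.629 | 0.432 |
| 7 | PDSM | 0.892 | 0.665 | 0.631 | 0.632 | 0.787 | 0.515 | 0.768 | 0.827 | 0.676 | 0.696 | 0.784 | 0.606 |
| 8 | STB-SM | 0.901 | 0.700 | 0.645 | 0.658 | 0.796 | 0.547 | 0.777 | 0.819 | 0.699 | 0.715 | 0.800 | 0.620 |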
  1. Italic values indicate the highest values that the top measures achieved for the corresponding evaluation metrics
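
The italic highlighting described in the note does not survive in a plain-text copy of the table, so the per-metric maxima can be recomputed from the reported values. The following is a minimal sketch: the names `TABLE7`, `METRICS`, and `best_per_metric` are hypothetical and introduced only for illustration, and it assumes that "highest value" means the column maximum within each dataset. Whether these maxima match the italicised cells of the original exactly is an assumption.

```python
# Values transcribed from Table 7 (NF = 100). Each entry holds two lists:
# the Reuters scores, then the Web-KB scores, in METRICS order.
METRICS = ["ACC", "PRE", "REC", "FM", "GM", "AMP"]

TABLE7 = {
    "Euclidean":        ([0.834, 0.688, 0.543, 0.588, 0.723, 0.456],
                         [0.643, 0.759, 0.541, 0.564, 0.681, 0.474]),
    "Cosine":           ([0.875, 0.688, 0.602, 0.624, 0.767, 0.502],
                         [0.737, 0.769, 0.666, 0.689, 0.775, 0.578]),
    "Jaccard":          ([0.819, 0.613, 0.450, 0.486, 0.657, 0.373],
                         [0.702, 0.837, 0.584, 0.592, 0.717, 0.526]),
    "Bhattacharya":     ([0.832, 0.639, 0.521, 0.530, 0.710, 0.440],
                         [0.500, 0.643, 0.499, 0.390, 0.644, 0.389]),
    "Kullback-Leibler": ([0.555, 0.646, 0.211, 0.231, 0.432, 0.197],
                         [0.396, 0.393, 0.260, 0.167, 0.443, 0.256]),
    "Manhattan":        ([0.830, 0.685, 0.553, 0.594, 0.729, 0.463],
                         [0.594, 0.796, 0.475, 0.490, 0.629, 0.432]),
    "PDSM":             ([0.892, 0.665, 0.631, 0.632, 0.787, 0.515],
                         [0.768, 0.827, 0.676, 0.696, 0.784, 0.606]),
    "STB-SM":           ([0.901, 0.700, 0.645, 0.658, 0.796, 0.547],
                         [0.777, 0.819, 0.699, 0.715, 0.800, 0.620]),
}


def best_per_metric(dataset_index):
    """Return {metric: (max value, [measures reaching it])} for one dataset.

    dataset_index is 0 for Reuters and 1 for Web-KB (a convention of this
    sketch, not of the source).
    """
    best = {}
    for j, metric in enumerate(METRICS):
        top = max(scores[dataset_index][j] for scores in TABLE7.values())
        winners = [name for name, scores in TABLE7.items()
                   if scores[dataset_index][j] == top]
        best[metric] = (top, winners)
    return best


reuters_best = best_per_metric(0)
webkb_best = best_per_metric(1)

for label, result in (("Reuters", reuters_best), ("Web-KB", webkb_best)):
    for metric, (value, winners) in result.items():
        print(f"{label:8s} {metric:4s} {value:.3f} {', '.join(winners)}")
```

On the transcribed values, STB-SM holds the maximum in every column except Web-KB PRE, where Jaccard (0.837) is highest.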