
Table 9 Performance evaluation of all measures when NF = 350: the averaged results (K = 1–120; + 2)

From: A set theory based similarity measure for text clustering and classification

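| No | Similarity/criterion | Reuters ACC | Reuters PRE | Reuters REC | Reuters FM | Reuters GM | Reuters AMP | Web-KB ACC | Web-KB PRE | Web-KB REC | Web-KB FM | Web-KB GM | Web-KB AMP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Euclidean | 0.778 | 0.771 | 0.472 | 0.544 | 0.666 | 0.427 | 0.632 | 0.775 | 0.516 | 0.529 | 0.663 | 0.455 |
| 2 | Cosine | 0.902 | 0.773 | 0.660 | 0.694 | 0.804 | 0.590 | 0.766 | 0.800 | 0.698 | 0.719 | 0.798 | 0.614 |
| 3 | Jaccard | 0.861 | 0.698 | 0.533 | 0.573 | 0.719 | 0.459 | 0.764 | 0.855 | 0.658 | 0.670 | 0.772 | 0.595 |
| 4 | Bhattacharya | 0.876 | 0.719 | 0.618 | 0.613 | 0.777 | 0.525 | 0.517 | 0.684 | 0.510 | 0.403 | 0.654 | 0.395 |
| 5 | Kullback–Leibler | 0.520 | 0.551 | 0.147 | 0.125 | 0.358 | 0.143 | 0.390 | 0.368 | 0.252 | 0.147 | 0.435 | 0.252 |
| 6 | Manhattan | 0.669 | 0.731 | 0.320 | 0.365 | 0.535 | 0.291 | 0.543 | 0.826 | 0.411 | 0.400 | 0.577 | 0.380 |
| 7 | PDSM | 0.912 | 0.770 | 0.673 | 0.689 | 0.813 | 0.582 | 0.801 | 0.853093 | 0.721095 | 0.745 | 0.816101 | 0.652 |
| 8 | STB-SM | 0.922 | 0.793 | 0.694 | 0.714 | 0.827 | 0.619 | 0.801 | 0.840 | 0.731 | 0.750 | 0.823 | 0.655 |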

  1. Italic values indicate the highest values that top measures achieved for corresponding evaluation metrics