
Table 6 Performance evaluation of all measures when NF = 50: the averaged results (K = 1–120; + 2)

From: A set theory based similarity measure for text clustering and classification

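| No | Similarity/criterion | Reuters-8 ACC | Reuters-8 PRE | Reuters-8 REC | Reuters-8 FM | Reuters-8 GM | Reuters-8 AMP | Web-KB ACC | Web-KB PRE | Web-KB REC | Web-KB FM | Web-KB GM | Web-KB AMP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Euclidean | 0.827 | 0.641 | 0.520 | 0.554 | 0.708 | 0.430 | 0.662 | 0.740 | 0.567 | 0.591 | 0.701 | 0.490 |
| 2 | Cosine | 0.847 | 0.656 | 0.565 | 0.592 | 0.741 | 0.467 | 0.719 | 0.735 | 0.650 | 0.671 | 0.763 | 0.554 |
| 3 | Jaccard | 0.790 | 0.580 | 0.417 | 0.443 | 0.631 | 0.343 | 0.666 | 0.808 | 0.546 | 0.557 | 0.687 | 0.487 |
| 4 | Bhattacharyya | 0.803 | 0.606 | 0.472 | 0.468 | 0.674 | 0.390 | 0.492 | 0.665 | 0.491 | 0.382 | 0.639 | 0.384 |
| 5 | Kullback–Leibler | 0.628 | 0.617 | 0.288 | 0.327 | 0.510 | 0.245 | 0.426 | 0.645 | 0.296 | 0.233 | 0.476 | 0.279 |
| 6 | Manhattan | 0.833 | 0.657 | 0.551 | 0.582 | 0.730 | 0.455 | 0.642 | 0.789 | 0.538 | 0.566 | 0.678 | 0.479 |
| 7 | PDSM | 0.857 | 0.614 | 0.571 | 0.561 | 0.746 | 0.443 | 0.757 | 0.825 | 0.670 | 0.697 | 0.779 | 0.598 |
| 8 | STB-SM | 0.863 | 0.640 | 0.588 | 0.596 | 0.757 | 0.473 | 0.766 | 0.792 | 0.693 | 0.711 | 0.795 | 0.606 |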

  1. Italic values indicate the highest values that top measures achieved for corresponding evaluation metrics
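
The italic marking mentioned in the footnote does not survive in plain text. As a rough aid, the sketch below recomputes the highest value per metric directly from the numbers in Table 6. The names `METRICS`, `TABLE6`, and `best_per_metric` are illustrative and not from the article, and the result is a plain column maximum, which may not match the authors' italic marking exactly.

```python
# Values transcribed from Table 6 (NF = 50). Each measure maps to a pair of
# rows: (Reuters-8 scores, Web-KB scores), ordered as in METRICS.
METRICS = ["ACC", "PRE", "REC", "FM", "GM", "AMP"]

TABLE6 = {
    "Euclidean": ([0.827, 0.641, 0.520, 0.554, 0.708, 0.430],
                  [0.662, 0.740, 0.567, 0.591, 0.701, 0.490]),
    "Cosine": ([0.847, 0.656, 0.565, 0.592, 0.741, 0.467],
               [0.719, 0.735, 0.650, 0.671, 0.763, 0.554]),
    "Jaccard": ([0.790, 0.580, 0.417, 0.443, 0.631, 0.343],
                [0.666, 0.808, 0.546, 0.557, 0.687, 0.487]),
    "Bhattacharyya": ([0.803, 0.606, 0.472, 0.468, 0.674, 0.390],
                      [0.492, 0.665, 0.491, 0.382, 0.639, 0.384]),
    "Kullback-Leibler": ([0.628, 0.617, 0.288, 0.327, 0.510, 0.245],
                         [0.426, 0.645, 0.296, 0.233, 0.476, 0.279]),
    "Manhattan": ([0.833, 0.657, 0.551, 0.582, 0.730, 0.455],
                  [0.642, 0.789, 0.538, 0.566, 0.678, 0.479]),
    "PDSM": ([0.857, 0.614, 0.571, 0.561, 0.746, 0.443],
             [0.757, 0.825, 0.670, 0.697, 0.779, 0.598]),
    "STB-SM": ([0.863, 0.640, 0.588, 0.596, 0.757, 0.473],
               [0.766, 0.792, 0.693, 0.711, 0.795, 0.606]),
}


def best_per_metric(table, dataset_index):
    """Return {metric: (measure, value)} holding the column maximum.

    dataset_index is 0 for Reuters-8 and 1 for Web-KB (a convention of
    this sketch, not of the article).
    """
    best = {}
    for j, metric in enumerate(METRICS):
        name = max(table, key=lambda m: table[m][dataset_index][j])
        best[metric] = (name, table[name][dataset_index][j])
    return best


reuters_best = best_per_metric(TABLE6, 0)
webkb_best = best_per_metric(TABLE6, 1)

for label, result in (("Reuters-8", reuters_best), ("Web-KB", webkb_best)):
    for metric in METRICS:
        measure, value = result[metric]
        print(f"{label:10s} {metric:4s} {measure:18s} {value:.3f}")
```

On these numbers, STB-SM holds the column maximum for every metric except PRE, where Manhattan leads on Reuters-8 and PDSM leads on Web-KB.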