A set theory based similarity measure for text clustering and classification

Journal of Big Data

Table 7 Performance evaluation of all measures when NF = 100: the averaged results (K = 1–120; +2)

No	Dataset	Reuters						Web-KB
No	Similarity/criterion	ACC	PRE	REC	FM	GM	AMP	ACC	PRE	REC	FM	GM	AMP
1	Euclidean	0.834	0.688	0.543	0.588	0.723	0.456	0.643	0.759	0.541	0.564	0.681	0.474
2	Cosine	0.875	0.688	0.602	0.624	0.767	0.502	0.737	0.769	0.666	0.689	0.775	0.578
3	Jaccard	0.819	0.613	0.450	0.486	0.657	0.373	0.702	0.837	0.584	0.592	0.717	0.526
4	Bhattacharyya	0.832	0.639	0.521	0.530	0.710	0.440	0.500	0.643	0.499	0.390	0.644	0.389
5	Kullback–Leibler	0.555	0.646	0.211	0.231	0.432	0.197	0.396	0.393	0.260	0.167	0.443	0.256
6	Manhattan	0.830	0.685	0.553	0.594	0.729	0.463	0.594	0.796	0.475	0.490	0.629	0.432
7	PDSM	0.892	0.665	0.631	0.632	0.787	0.515	0.768	0.827	0.676	0.696	0.784	0.606
8	STB-SM	0.901	0.700	0.645	0.658	0.796	0.547	0.777	0.819	0.699	0.715	0.800	0.620

Italic values indicate the highest values that top measures achieved for corresponding evaluation metrics
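
The highest value per metric can be recovered directly from the numbers in Table 7. The following is a minimal sketch in Python, assuming the values are transcribed exactly as listed in the table; the names METRICS, TABLE_7, and best_per_metric are illustrative and do not come from the paper.

```python
# Column order used in Table 7 for each dataset.
METRICS = ["ACC", "PRE", "REC", "FM", "GM", "AMP"]

# Values transcribed from Table 7 (NF = 100, averaged over K).
TABLE_7 = {
    "Euclidean": {"Reuters": [0.834, 0.688, 0.543, 0.588, 0.723, 0.456],
                  "Web-KB": [0.643, 0.759, 0.541, 0.564, 0.681, 0.474]},
    "Cosine": {"Reuters": [0.875, 0.688, 0.602, 0.624, 0.767, 0.502],
               "Web-KB": [0.737, 0.769, 0.666, 0.689, 0.775, 0.578]},
    "Jaccard": {"Reuters": [0.819, 0.613, 0.450, 0.486, 0.657, 0.373],
                "Web-KB": [0.702, 0.837, 0.584, 0.592, 0.717, 0.526]},
    "Bhattacharyya": {"Reuters": [0.832, 0.639, 0.521, 0.530, 0.710, 0.440],
                      "Web-KB": [0.500, 0.643, 0.499, 0.390, 0.644, 0.389]},
    "Kullback-Leibler": {"Reuters": [0.555, 0.646, 0.211, 0.231, 0.432, 0.197],
                         "Web-KB": [0.396, 0.393, 0.260, 0.167, 0.443, 0.256]},
    "Manhattan": {"Reuters": [0.830, 0.685, 0.553, 0.594, 0.729, 0.463],
                  "Web-KB": [0.594, 0.796, 0.475, 0.490, 0.629, 0.432]},
    "PDSM": {"Reuters": [0.892, 0.665, 0.631, 0.632, 0.787, 0.515],
             "Web-KB": [0.768, 0.827, 0.676, 0.696, 0.784, 0.606]},
    "STB-SM": {"Reuters": [0.901, 0.700, 0.645, 0.658, 0.796, 0.547],
               "Web-KB": [0.777, 0.819, 0.699, 0.715, 0.800, 0.620]},
}


def best_per_metric(table, dataset):
    """Return {metric: (measure, value)} holding the highest value per metric."""
    best = {}
    for i, metric in enumerate(METRICS):
        # Pick the measure whose score in column i is largest for this dataset.
        measure = max(table, key=lambda name: table[name][dataset][i])
        best[metric] = (measure, table[measure][dataset][i])
    return best


if __name__ == "__main__":
    for dataset in ("Reuters", "Web-KB"):
        print(dataset, best_per_metric(TABLE_7, dataset))
```

On these numbers, STB-SM has the highest value for every metric on Reuters and for every metric on Web-KB except PRE, where Jaccard leads with 0.837.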