Table 13 Performance evaluation of all measures, taken as the average of the averaged results (K = 1–120; +2)

From: A set theory based similarity measure for text clustering and classification

**Reuters**

| No | Similarity/criterion | ACC | PRE | REC | FM | GM | AMP |
|----|----------------------|-------|-------|-------|-------|-------|-------|
| 1 | Euclidean | 0.732 | 0.668 | 0.406 | 0.453 | 0.610 | 0.353 |
| 2 | Cosine | 0.864 | 0.737 | 0.615 | 0.645 | 0.769 | 0.536 |
| 3 | Jaccard | 0.790 | 0.661 | 0.478 | 0.515 | 0.675 | 0.411 |
| 4 | Bhattacharya | 0.837 | 0.670 | 0.557 | 0.558 | 0.726 | 0.475 |
| 5 | Kullback–Leibler | 0.555 | 0.411 | 0.190 | 0.174 | 0.405 | 0.195 |
| 6 | Manhattan | 0.666 | 0.537 | 0.328 | 0.343 | 0.533 | 0.284 |
| 7 | PDSM | 0.869 | 0.725 | 0.619 | 0.632 | 0.772 | 0.527 |
| 8 | STB-SM | *0.880* | *0.747* | *0.646* | *0.666* | *0.790* | *0.564* |

**Web-KB**

| No | Similarity/criterion | ACC | PRE | REC | FM | GM | AMP |
|----|----------------------|-------|-------|-------|-------|-------|-------|
| 1 | Euclidean | 0.606 | 0.746 | 0.496 | 0.505 | 0.645 | 0.435 |
| 2 | Cosine | 0.738 | 0.767 | 0.669 | 0.688 | 0.776 | 0.581 |
| 3 | Jaccard | 0.721 | 0.817 | 0.613 | 0.623 | 0.737 | 0.550 |
| 4 | Bhattacharya | 0.510 | 0.644 | 0.503 | 0.408 | 0.648 | 0.395 |
| 5 | Kullback–Leibler | 0.370 | 0.338 | 0.292 | 0.193 | 0.470 | 0.278 |
| 6 | Manhattan | 0.531 | 0.744 | 0.409 | 0.388 | 0.572 | 0.374 |
| 7 | PDSM | *0.768* | *0.821* | 0.681 | 0.700 | 0.787 | *0.609* |
| 8 | STB-SM | 0.764 | 0.780 | *0.687* | *0.702* | *0.791* | 0.607 |

  1. Italic values indicate the highest value achieved by any measure for the corresponding evaluation metric
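For reference, the Cosine and Jaccard rows in the table rest on standard definitions over document term vectors. A minimal sketch (function names and sample documents are illustrative, not from the paper; the paper's experiments use full corpora, not toy strings):

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def jaccard_similarity(a: Counter, b: Counter) -> float:
    """Jaccard index over the sets of terms occurring in each document."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Toy documents as bags of words
d1 = Counter("the cat sat on the mat".split())
d2 = Counter("the cat lay on the rug".split())

print(round(cosine_similarity(d1, d2), 3))   # 0.75
print(round(jaccard_similarity(d1, d2), 3))  # 0.429
```

Cosine weighs repeated terms (hence the shared "the" counts twice), while Jaccard only compares the term sets, which is one reason the two rows diverge on the same corpus.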