A set theory based similarity measure for text clustering and classification

Journal of Big Data

Table 8 Performance evaluation of all measures when NF = 200: the averaged results (K = 1–120; +2)

No	Dataset	Reuters						Web-KB
No	Similarity/criterion	ACC	PRE	REC	FM	GM	AMP	ACC	PRE	REC	FM	GM	AMP
1	Euclidean	0.814	0.741	0.510	0.573	0.697	0.451	0.636	0.774	0.525	0.543	0.669	0.464
2	Cosine	0.894	0.739	0.635	0.661	0.788	0.558	0.757	0.792	0.689	0.711	0.791	0.603
3	Jaccard	0.846	0.668	0.507	0.548	0.701	0.432	0.736	0.843	0.625	0.635	0.748	0.563
4	Bhattacharyya	0.863	0.631	0.590	0.589	0.759	0.450	0.515	0.688	0.510	0.404	0.654	0.397
5	Kullback–Leibler	0.530	0.612	0.161	0.149	0.375	0.155	0.391	0.350	0.254	0.152	0.437	0.253
6	Manhattan	0.759	0.712	0.427	0.485	0.630	0.379	0.556	0.815	0.428	0.425	0.591	0.394
7	PDSM	0.905	0.724	0.655	0.663	0.803	0.558	0.789	0.845	0.707	0.732	0.806	0.638
8	STB-SM	0.914	0.767	0.669	0.685	0.811	0.590	0.791	0.830	0.721	0.741	0.815	0.644

Italic values indicate the highest value achieved by the top measures for each evaluation metric