A set theory based similarity measure for text clustering and classification

Journal of Big Data

Table 9 Performance evaluation of all measures when NF = 350: the averaged results (K = 1–120; + 2)

No	Dataset	Reuters						Web-KB
No	Similarity/criterion	ACC	PRE	REC	FM	GM	AMP	ACC	PRE	REC	FM	GM	AMP
1	Euclidean	0.778	0.771	0.472	0.544	0.666	0.427	0.632	0.775	0.516	0.529	0.663	0.455
2	Cosine	0.902	0.773	0.660	0.694	0.804	0.590	0.766	0.800	0.698	0.719	0.798	0.614
3	Jaccard	0.861	0.698	0.533	0.573	0.719	0.459	0.764	0.855	0.658	0.670	0.772	0.595
4	Bhattacharyya	0.876	0.719	0.618	0.613	0.777	0.525	0.517	0.684	0.510	0.403	0.654	0.395
5	Kullback–Leibler	0.520	0.551	0.147	0.125	0.358	0.143	0.390	0.368	0.252	0.147	0.435	0.252
6	Manhattan	0.669	0.731	0.320	0.365	0.535	0.291	0.543	0.826	0.411	0.400	0.577	0.380
7	PDSM	0.912	0.770	0.673	0.689	0.813	0.582	0.801	0.853	0.721	0.745	0.816	0.652
8	STB-SM	0.922	0.793	0.694	0.714	0.827	0.619	0.801	0.840	0.731	0.750	0.823	0.655

Italic values indicate the highest values that top measures achieved for corresponding evaluation metrics
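
Because the italic emphasis does not survive in plain text, the top-scoring measures can be recovered directly from the numbers. The following is a minimal sketch, assuming the Table 9 values are transcribed by hand into dictionaries; the names `METRICS`, `REUTERS`, `WEB_KB`, and `top_measures` are illustrative and not part of the paper.

```python
# Values transcribed from Table 9 (rounded to three decimals).
# Column order follows the table header.
METRICS = ["ACC", "PRE", "REC", "FM", "GM", "AMP"]

REUTERS = {
    "Euclidean":        [0.778, 0.771, 0.472, 0.544, 0.666, 0.427],
    "Cosine":           [0.902, 0.773, 0.660, 0.694, 0.804, 0.590],
    "Jaccard":          [0.861, 0.698, 0.533, 0.573, 0.719, 0.459],
    "Bhattacharyya":    [0.876, 0.719, 0.618, 0.613, 0.777, 0.525],
    "Kullback-Leibler": [0.520, 0.551, 0.147, 0.125, 0.358, 0.143],
    "Manhattan":        [0.669, 0.731, 0.320, 0.365, 0.535, 0.291],
    "PDSM":             [0.912, 0.770, 0.673, 0.689, 0.813, 0.582],
    "STB-SM":           [0.922, 0.793, 0.694, 0.714, 0.827, 0.619],
}

WEB_KB = {
    "Euclidean":        [0.632, 0.775, 0.516, 0.529, 0.663, 0.455],
    "Cosine":           [0.766, 0.800, 0.698, 0.719, 0.798, 0.614],
    "Jaccard":          [0.764, 0.855, 0.658, 0.670, 0.772, 0.595],
    "Bhattacharyya":    [0.517, 0.684, 0.510, 0.403, 0.654, 0.395],
    "Kullback-Leibler": [0.390, 0.368, 0.252, 0.147, 0.435, 0.252],
    "Manhattan":        [0.543, 0.826, 0.411, 0.400, 0.577, 0.380],
    "PDSM":             [0.801, 0.853, 0.721, 0.745, 0.816, 0.652],
    "STB-SM":           [0.801, 0.840, 0.731, 0.750, 0.823, 0.655],
}


def top_measures(table):
    """Map each metric to (best value, sorted names of measures achieving it)."""
    best = {}
    for i, metric in enumerate(METRICS):
        top = max(row[i] for row in table.values())
        winners = sorted(name for name, row in table.items() if row[i] == top)
        best[metric] = (top, winners)
    return best


reuters_best = top_measures(REUTERS)
web_kb_best = top_measures(WEB_KB)

for dataset, best in (("Reuters", reuters_best), ("Web-KB", web_kb_best)):
    for metric, (value, winners) in best.items():
        print(f"{dataset:8s} {metric:4s} {value:.3f} {', '.join(winners)}")
```

On these values, STB-SM is the top measure for every Reuters metric; on Web-KB it ties with PDSM on ACC, and Jaccard has the highest PRE.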