A set theory based similarity measure for text clustering and classification

Journal of Big Data

Table 5 Performance evaluation of all measures when NF = 10: the averaged results (K = 1–120; + 2)

No	Dataset	Reuters-8						Web-KB
No	Similarity/criterion	ACC	PRE	REC	FM	GM	AMP	ACC	PRE	REC	FM	GM	AMP
1	Euclidean	0.713	0.317	0.293	0.286	0.527	0.217	0.605	0.607	0.515	0.524	0.661	0.429
2	Cosine	0.694	0.328	0.311	0.281	0.542	0.218	0.621	0.610	0.548	0.562	0.687	0.451
3	Jaccard	0.689	0.299	0.258	0.251	0.492	0.202	0.544	0.617	0.438	0.433	0.560	0.371
4	Bhattacharyya	0.654	0.173	0.204	0.180	0.435	0.174	0.458	0.545	0.435	0.381	0.595	0.373
5	Kullback–Leibler	0.689	0.383	0.329	0.292	0.557	0.228	0.613	0.625	0.525	0.526	0.670	0.436
6	Manhattan	0.648	0.327	0.284	0.273	0.516	0.205	0.605	0.623	0.515	0.524	0.661	0.432
7	PDSM	0.651	0.339	0.301	0.267	0.533	0.216	0.626	0.655	0.533	0.539	0.676	0.448
8	STB-SM	0.699	0.334	0.333	0.303	0.562	0.234	0.609	0.590	0.539	0.544	0.679	0.436

Italic values indicate the highest value achieved by any measure for the corresponding evaluation metric.