Journal of Big Data

Table 14 External metric: purity (commonly known as “accuracy”), K-means performance

From: A set theory based similarity measure for text clustering and classification

Similarity measure/metric	K = 4, Reuters (18308 features)	K = 4, Web-KB (33025 features)	K = 8, Reuters (18308 features)	K = 8, Web-KB (33025 features)
Euclidean	0.6745546742946301	0.4420100023815194	0.6651930828240801	0.5363181709930936
Cosine	0.6300871148095176	0.5651345558466302	0.5161877519178261	0.5710883543700881
Jaccard	0.5418021063580809	0.4060490592998333	0.5631257313743336	0.42462491069302216
Bhattacharyya	0.6602522428812898	0.550845439390331	0.6573917565986218	0.3908073350797809
Kullback–Leibler	0.5103367572487323	0.39104548702071923	0.5103367572487323	0.39175994284353416
Manhattan	0.528799895982317	0.3912836389616575	0.5342608243401378	0.40128602048106693
PDSM	0.6628526849564426	0.4165277447011193	0.6329476010921856	0.40533460347701833
STB-SM	0.626706540111819	0.6110978804477256	0.6059030035105968	0.571802810192903

Italic values indicate the highest values that top measures achieved for corresponding evaluation metrics
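
For reference, purity as it is commonly defined assigns each cluster its majority class and reports the fraction of documents that belong to their cluster's majority class. The Python sketch below illustrates that common definition only; it is not taken from the article, and the function name `purity` and the toy labels are hypothetical.

```python
from collections import Counter


def purity(true_labels, cluster_labels):
    """Fraction of items that belong to the majority class of their cluster.

    Illustrative sketch of the usual purity definition, not the article's code.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("label sequences must not be empty")

    # Count how often each true class occurs inside each cluster.
    class_counts_per_cluster = {}
    for true_label, cluster in zip(true_labels, cluster_labels):
        class_counts_per_cluster.setdefault(cluster, Counter())[true_label] += 1

    # Sum the size of the majority class over all clusters.
    majority_total = sum(
        max(counts.values()) for counts in class_counts_per_cluster.values()
    )
    return majority_total / len(true_labels)


# Toy example (hypothetical data): two clusters over six documents.
# Cluster 0 holds a, a, b and cluster 1 holds b, c, c, so 4 of 6 documents
# match their cluster's majority class.
example_score = purity(["a", "a", "b", "b", "c", "c"], [0, 0, 0, 1, 1, 1])
```

Under this definition a score of 1.0 means every cluster contains a single class, so the values in the table can be read as the share of documents that landed in a cluster dominated by their own class.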
