From: A set theory based similarity measure for text clustering and classification
Similarity measure/metric | K = 4, Reuters–18308 features | K = 4, Web-KB-33025 features | K = 8, Reuters–18308 features | K = 8, Web-KB-33025 features |
---|---|---|---|---|
Euclidean | 0.6745546742946301 | 0.4420100023815194 | 0.6651930828240801 | 0.5363181709930936 |
Cosine | 0.6300871148095176 | 0.5651345558466302 | 0.5161877519178261 | 0.5710883543700881 |
Jaccard | 0.5418021063580809 | 0.4060490592998333 | 0.5631257313743336 | 0.42462491069302216 |
Bhattacharyya | 0.6602522428812898 | 0.550845439390331 | 0.6573917565986218 | 0.3908073350797809 |
Kullback–Leibler | 0.5103367572487323 | 0.39104548702071923 | 0.5103367572487323 | 0.39175994284353416 |
Manhattan | 0.528799895982317 | 0.3912836389616575 | 0.5342608243401378 | 0.40128602048106693 |
PDSM | 0.6628526849564426 | 0.4165277447011193 | 0.6329476010921856 | 0.40533460347701833 |
STB-SM | 0.626706540111819 | 0.6110978804477256 | 0.6059030035105968 | 0.571802810192903 |