From: A set theory based similarity measure for text clustering and classification
Similarity measure/metric | K = 4, Reuters–18308 features | K = 4, Web-KB-33025 features | K = 8, Reuters–18308 features | K = 8, Web-KB-33025 features |
---|---|---|---|---|
Euclidean | 4.186722232186011 | 4.485270517901569 | 4.138785792328177 | 4.414646698351782 |
Cosine | 4.974207600701769 | 5.184748331594617 | 5.120474137629764 | 6.511148800037687 |
Jaccard | 13.633300152033158 | 15.865256973163703 | 14.42610495247111 | 15.858516305005779 |
Bhattacharyya | 4.597427791367133 | 4.893369736545789 | 3.280451019807807 | 5.177469928065736 |
Kullback–Leibler | 1.872529736499688 | 2.8045629002126167 | 2.266827595447801 | 2.1725394033652123 |
Manhattan | 2.627533355357464 | 2.9998709693117767 | 2.1237224749990613 | 1.6587667141935847 |
PDSM | 3.7833377396816443 | 7.519681593999572 | 5.179788734900043 | 5.386991923494522 |
STB-SM | 5.764560759902967 | 5.85720075727297 | 5.391258478217752 | 5.654144650340377 |