Table 2 Results of experiments for different methods, representations and data sets with categorical one-hot vectors

From: Evaluation of different machine learning approaches and input text representations for multilingual classification of tweets for disease surveillance in the social web

Columns English through Japanese are the datasets; the last three columns are the aggregated scores.

| Methods | Measure | English (Validation dataset) | French | German | Spanish | Arabic | Japanese | Un-weighted Average Performance | Performance Variance | Overall Performance |
|---|---|---|---|---|---|---|---|---|---|---|
| Unigram BOW (Baseline) | Precision | 0.84 | 0.58 | 0.56 | 0.81 | 0.42 | 0.48 | 0.57 | 0.02 | 0.70 |
| | Recall | 0.81 | 0.74 | 0.8 | 0.81 | 0.72 | 0.29 | 0.67 | 0.05 | 0.74 |
| | F1 Score | 0.87 | 0.65 | 0.66 | 0.81 | 0.53 | 0.36 | 0.6 | 0.03 | 0.72 |
| Unigram BOW (Stemmed) | Precision | 0.80 | 0.51 | 0.54 | 0.75 | 0.42 | 0.43 | 0.53 | 1.30E−02 | 0.68 |
| | Recall | 0.76 | 0.77 | 0.82 | 0.86 | 0.79 | 0.34 | 0.72 | 3.60E−02 | 0.78 |
| | F1 Score | 0.85 | 0.61 | 0.65 | 0.8 | 0.55 | 0.38 | 0.6 | 1.80E−02 | 0.73 |
| Unigram BOW CNF | Precision | 0.73 | 0.53 | 0.5 | 0.76 | 0.44 | 0.39 | 0.52 | 0.02 | 0.67 |
| | Recall | 0.74 | 0.79 | 0.68 | 0.87 | 0.73 | 0.59 | 0.73 | 0.01 | 0.83 |
| | F1 Score | 0.72 | 0.63 | 0.58 | 0.81 | 0.56 | 0.47 | 0.61 | 0.02 | 0.74 |
| Uni/Bigrams + CNF | Precision | 0.86 | 0.55 | 0.5 | 0.79 | 0.5 | 0.41 | 0.55 | 1.60E−02 | 0.69 |
| | Recall | 0.82 | 0.77 | 0.67 | 0.79 | 0.69 | 0.51 | 0.68 | 9.50E−03 | 0.80 |
| | F1 Score | 0.89 | 0.64 | 0.57 | 0.79 | 0.58 | 0.46 | 0.61 | 1.10E−02 | 0.74 |
| Unigram Snomed | Precision | 0.81 | 0.59 | 0.55 | 0.78 | 0.47 | 0.44 | 0.57 | 1.40E−02 | 0.71 |
| | Recall | 0.79 | 0.78 | 0.77 | 0.82 | 0.6 | 0.4 | 0.67 | 2.40E−02 | 0.77 |
| | F1 Score | 0.84 | 0.67 | 0.64 | 0.8 | 0.53 | 0.42 | 0.61 | 1.60E−02 | 0.74 |
| Bigram Snomed | Precision | 0.82 | 0.58 | 0.56 | 0.79 | 0.53 | 0.47 | 0.59 | 1.20E−02 | 0.72 |
| | Recall | 0.89 | 0.73 | 0.74 | 0.83 | 0.6 | 0.41 | 0.66 | 2.00E−02 | 0.77 |
| | F1 Score | 0.86 | 0.65 | 0.64 | 0.81 | 0.56 | 0.44 | 0.62 | 1.40E−02 | 0.75 |
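
The table does not state how the aggregated scores are derived. As a rough sketch, the "Un-weighted Average Performance" and "Performance Variance" columns appear to be the plain mean and the population variance of a row's scores over the five non-English datasets. This is an inference from the reported numbers, not something the source confirms, and the names `recall_by_language` and `aggregate` are hypothetical. The "Overall Performance" column is not reproduced here because its definition cannot be recovered from the table alone.

```python
from statistics import mean, pvariance

# Recall of "Unigram BOW (Stemmed)" per dataset, copied from the table.
# English is left out on purpose: the table labels it the validation
# dataset, and the reported averages only match when it is excluded
# (an assumption inferred from the numbers, not stated by the source).
recall_by_language = {
    "French": 0.77,
    "German": 0.82,
    "Spanish": 0.86,
    "Arabic": 0.79,
    "Japanese": 0.34,
}


def aggregate(scores):
    """Return (unweighted mean, population variance) of the given scores."""
    values = list(scores.values())
    # Every dataset counts equally, regardless of how many tweets it holds.
    return mean(values), pvariance(values)


unweighted_average, performance_variance = aggregate(recall_by_language)
print(f"{unweighted_average:.2f} {performance_variance:.1e}")
```

For this row the sketch gives a mean that rounds to 0.72 and a variance of about 3.6E−02, in line with the tabulated values. Other rows may differ slightly in the last digit, because the per-dataset scores in the table are already rounded to two decimals.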