Table 2 Results of experiments for different methods, representations and data sets with categorical one-hot vectors

From: Evaluation of different machine learning approaches and input text representations for multilingual classification of tweets for disease surveillance in the social web

The columns English through Japanese are grouped under Datasets; the last three columns are grouped under Aggregated Scores.

| Methods | Measure | English (Validation dataset) | French | German | Spanish | Arabic | Japanese | Un-weighted Average Performance | Performance Variance | Overall Performance |
|---|---|---|---|---|---|---|---|---|---|---|
| Unigram BOW (Baseline) | Precision | 0.84 | 0.58 | 0.56 | 0.81 | 0.42 | 0.48 | 0.57 | 0.02 | 0.70 |
| | Recall | 0.81 | 0.74 | 0.8 | 0.81 | 0.72 | 0.29 | 0.67 | 0.05 | 0.74 |
| | F1 Score | 0.87 | 0.65 | 0.66 | 0.81 | 0.53 | 0.36 | 0.6 | 0.03 | 0.72 |
| Unigram BOW (Stemmed) | Precision | 0.80 | 0.51 | 0.54 | 0.75 | 0.42 | 0.43 | 0.53 | 1.30E−02 | 0.68 |
| | Recall | 0.76 | 0.77 | 0.82 | 0.86 | 0.79 | 0.34 | 0.72 | 3.60E−02 | 0.78 |
| | F1 Score | 0.85 | 0.61 | 0.65 | 0.8 | 0.55 | 0.38 | 0.6 | 1.80E−02 | 0.73 |
| Unigram BOW CNF | Precision | 0.73 | 0.53 | 0.5 | 0.76 | 0.44 | 0.39 | 0.52 | 0.02 | 0.67 |
| | Recall | 0.74 | 0.79 | 0.68 | 0.87 | 0.73 | 0.59 | 0.73 | 0.01 | 0.83 |
| | F1 Score | 0.72 | 0.63 | 0.58 | 0.81 | 0.56 | 0.47 | 0.61 | 0.02 | 0.74 |
| Uni/Bigrams + CNF | Precision | 0.86 | 0.55 | 0.5 | 0.79 | 0.5 | 0.41 | 0.55 | 1.60E−02 | 0.69 |
| | Recall | 0.82 | 0.77 | 0.67 | 0.79 | 0.69 | 0.51 | 0.68 | 9.50E−03 | 0.80 |
| | F1 Score | 0.89 | 0.64 | 0.57 | 0.79 | 0.58 | 0.46 | 0.61 | 1.10E−02 | 0.74 |
| Unigram Snomed | Precision | 0.81 | 0.59 | 0.55 | 0.78 | 0.47 | 0.44 | 0.57 | 1.40E−02 | 0.71 |
| | Recall | 0.79 | 0.78 | 0.77 | 0.82 | 0.6 | 0.4 | 0.67 | 2.40E−02 | 0.77 |
| | F1 Score | 0.84 | 0.67 | 0.64 | 0.8 | 0.53 | 0.42 | 0.61 | 1.60E−02 | 0.74 |
| Bigram Snomed | Precision | 0.82 | 0.58 | 0.56 | 0.79 | 0.53 | 0.47 | 0.59 | 1.20E−02 | 0.72 |
| | Recall | 0.89 | 0.73 | 0.74 | 0.83 | 0.6 | 0.41 | 0.66 | 2.00E−02 | 0.77 |
| | F1 Score | 0.86 | 0.65 | 0.64 | 0.81 | 0.56 | 0.44 | 0.62 | 1.40E−02 | 0.75 |
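
The table does not say how the aggregated scores are derived. As a rough check, the Un-weighted Average Performance and Performance Variance values in most rows look consistent with the plain mean and the population variance of the five non-English scores (French, German, Spanish, Arabic, Japanese), with the English validation set left out. That reading is inferred from the numbers and is an assumption, not something the source states, and not every row matches it exactly. The sketch below applies it to the recall values of Unigram BOW (Stemmed); the function name `aggregate` is hypothetical.

```python
from statistics import fmean, pvariance

# Per-language recall for "Unigram BOW (Stemmed)", copied from the table.
# English is left out on the ASSUMPTION that the aggregated scores cover
# only the five non-English data sets.
recall = {
    "French": 0.77,
    "German": 0.82,
    "Spanish": 0.86,
    "Arabic": 0.79,
    "Japanese": 0.34,
}


def aggregate(scores):
    """Return (unweighted mean, population variance) of per-language scores."""
    values = list(scores.values())
    return fmean(values), pvariance(values)


mean_recall, var_recall = aggregate(recall)
print(f"mean={mean_recall:.2f} variance={var_recall:.2e}")
```

For this row the result is close to the reported 0.72 and 3.60E−02; small differences are expected because the inputs are already rounded to two decimals. The Overall Performance column does not follow from this calculation and is left as reported.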