| Method | Measure | English (validation dataset) | French | German | Spanish | Arabic | Japanese | Unweighted average | Performance variance | Overall performance |
|---|---|---|---|---|---|---|---|---|---|---|
| CNN + word2vec | Precision | 0.76 | 0.55 | 0.53 | 0.78 | 0.40 | 0.48 | 0.55 | 0.016 | 0.69 |
| | Recall | 0.73 | 0.84 | 0.77 | 0.90 | 0.60 | 0.51 | 0.72 | 0.021 | 0.81 |
| | F1 score | 0.78 | 0.67 | 0.63 | 0.84 | 0.51 | 0.50 | 0.63 | 0.015 | 0.75 |
| CNF LSTM stack | Precision | 0.73 | 0.55 | 0.51 | 0.76 | 0.49 | 0.33 | 0.53 | 0.02 | 0.67 |
| | Recall | 0.71 | 0.76 | 0.62 | 0.87 | 0.67 | 0.46 | 0.68 | 0.02 | 0.77 |
| | F1 score | 0.89 | 0.64 | 0.56 | 0.81 | 0.56 | 0.38 | 0.59 | 0.02 | 0.71 |
| CNN CNF | Precision | 0.72 | 0.48 | 0.46 | 0.73 | 0.36 | 0.29 | 0.46 | 0.03 | 0.61 |
| | Recall | 0.57 | 0.86 | 0.87 | 0.94 | 0.76 | 0.70 | 0.83 | 0.01 | 0.89 |
| | F1 score | 0.61 | 0.61 | 0.60 | 0.82 | 0.49 | 0.41 | 0.59 | 0.02 | 0.71 |
| Bi-LSTM CNF | Precision | 0.81 | 0.57 | 0.54 | 0.76 | 0.50 | 0.33 | 0.54 | 0.02 | 0.68 |
| | Recall | 0.74 | 0.72 | 0.67 | 0.77 | 0.63 | 0.51 | 0.66 | 0.01 | 0.78 |
| | F1 score | 0.89 | 0.64 | 0.60 | 0.76 | 0.56 | 0.40 | 0.59 | 0.02 | 0.72 |
| CNN-LSTM CNF | Precision | 0.75 | 0.55 | 0.47 | 0.79 | 0.37 | 0.37 | 0.51 | 0.03 | 0.64 |
| | Recall | 0.75 | 0.83 | 0.65 | 0.80 | 0.61 | 0.61 | 0.70 | 0.01 | 0.81 |
| | F1 score | 0.75 | 0.67 | 0.55 | 0.79 | 0.46 | 0.46 | 0.59 | 0.02 | 0.72 |
| m-BERT-uncased-CNN | Precision | 0.65 | 0.34 | 0.34 | 0.66 | 0.37 | 0.29 | 0.44 | 0.02 | 0.59 |
| | Recall | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.00 | 1.00 |
| | F1 score | 0.79 | 0.51 | 0.51 | 0.79 | 0.54 | 0.45 | 0.60 | 0.02 | 0.73 |
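The aggregate columns appear to be computed over the five non-English languages: the unweighted average is their mean and the performance variance is their population variance (checking the CNN + word2vec precision row reproduces 0.55 and 0.016). How "overall performance" is derived is not specified in the table. A minimal sketch of the two recoverable aggregates, using the CNN + word2vec precision row as the example:

```python
# Recompute the aggregate columns for one table row.
# Assumption: "unweighted average" = mean and "performance variance" =
# population variance over the five non-English languages only.
from statistics import fmean, pvariance

# CNN + word2vec precision: French, German, Spanish, Arabic, Japanese
scores = [0.55, 0.53, 0.78, 0.40, 0.48]

avg = fmean(scores)      # unweighted average across languages
var = pvariance(scores)  # population variance across languages

print(round(avg, 2), round(var, 3))  # 0.55 0.016
```

The same calculation reproduces the recall row's 0.72 and 0.021, which supports the assumption that English (the validation set) is excluded from these two columns.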