Table 4 The weighted-averaged evaluation results of fine-tuned transformer-based models and the most recent models and systems from 2014 i2b2 shared task

From: Adapting transformer-based language models for heart disease detection and risk factors extraction

| Model | Precision | Recall | F1-score | Micro F1-score (Accuracy) |
|---|---|---|---|---|
| BERT | 0.9251 | 0.9373 | 0.9284 | 0.9373 |
| RoBERTa | 0.9390 | **0.9427** | **0.9394** | **0.9427** |
| BioBERT | 0.9337 | 0.9399 | 0.9357 | 0.9399 |
| BioClinicalBERT | 0.9338 | 0.9403 | 0.9357 | 0.9403 |
| XLNet | 0.9361 | 0.9397 | 0.9371 | 0.9397 |
| Roberts et al. [37] | **0.9625** | 0.8951 | 0.9276 | 0.9276 |
| Chen et al. [41] | 0.9436 | 0.9106 | 0.9268 | 0.9268 |
| Cormack et al. [82] | 0.9375 | 0.8975 | 0.9171 | 0.9171 |
| Yang and Garibaldi [81] | 0.9488 | 0.8847 | 0.9156 | 0.9156 |
| Khalifa and Meystre [83] | 0.8951 | 0.8552 | 0.8747 | 0.8747 |
| Chokkwijitkul et al. [10] | 0.9180 | 0.8983 | 0.9081 | 0.9081 |

Bold indicates the best value
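The rightmost column labels the micro F1-score as accuracy, which holds whenever each example receives exactly one gold label and one predicted label. The sketch below (pure Python, with made-up toy labels rather than the i2b2 risk-factor data) computes the weighted-averaged F1 reported in the table and shows why micro F1 and accuracy coincide:

```python
from collections import Counter

def evaluation_scores(y_true, y_pred):
    """Weighted-averaged F1, micro F1, and accuracy for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p predicted, but t was true
            fn[t] += 1   # class t missed
    n = len(y_true)
    support = Counter(y_true)  # class frequencies serve as weights
    weighted_f1 = 0.0
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec  = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1   = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weighted_f1 += (support[c] / n) * f1
    # Micro averaging pools TP/FP/FN over all classes; with one label per
    # example, every error is both an FP and an FN, so micro precision,
    # micro recall, micro F1, and accuracy are all the same number.
    micro_f1 = sum(tp.values()) / n
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return weighted_f1, micro_f1, accuracy

# Toy labels for illustration only (not taken from the shared-task corpus)
w_f1, m_f1, acc = evaluation_scores(
    ["CAD", "CAD", "SMOKER", "SMOKER", "OBESE"],
    ["CAD", "SMOKER", "SMOKER", "SMOKER", "OBESE"],
)
print(f"weighted F1={w_f1:.4f}  micro F1={m_f1:.4f}  accuracy={acc:.4f}")
```

On the toy labels, micro F1 and accuracy are identical (0.8), while the weighted-averaged F1 differs slightly (about 0.787), mirroring the small gap between the F1-score and micro F1-score columns above.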