Table 4 The weighted-averaged evaluation results of fine-tuned transformer-based models and the most recent models and systems from 2014 i2b2 shared task

From: Adapting transformer-based language models for heart disease detection and risk factors extraction

| Model | Precision | Recall | F1-score | Micro F1-score (Accuracy) |
|---|---|---|---|---|
| BERT | 0.9251 | 0.9373 | 0.9284 | 0.9373 |
| RoBERTa | 0.9390 | **0.9427** | **0.9394** | **0.9427** |
| BioBERT | 0.9337 | 0.9399 | 0.9357 | 0.9399 |
| BioClinicalBERT | 0.9338 | 0.9403 | 0.9357 | 0.9403 |
| XLNet | 0.9361 | 0.9397 | 0.9371 | 0.9397 |
| Roberts et al. [37] | **0.9625** | 0.8951 | 0.9276 | 0.9276 |
| Chen et al. [41] | 0.9436 | 0.9106 | 0.9268 | 0.9268 |
| Cormack et al. [82] | 0.9375 | 0.8975 | 0.9171 | 0.9171 |
| Yang and Garibaldi [81] | 0.9488 | 0.8847 | 0.9156 | 0.9156 |
| Khalifa and Meystre [83] | 0.8951 | 0.8552 | 0.8747 | 0.8747 |
| Chokkwijitkul et al. [10] | 0.9180 | 0.8983 | 0.9081 | 0.9081 |

Bold indicates the best value
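The rightmost column labels the micro F1-score as accuracy, which holds whenever each example receives exactly one gold label and one predicted label. The sketch below (pure Python, with made-up toy labels rather than the i2b2 risk-factor data) computes the weighted-averaged F1 reported in the table and shows why micro F1 and accuracy coincide:

```python
from collections import Counter

def evaluation_scores(y_true, y_pred):
    """Weighted-averaged F1, micro F1, and accuracy for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p predicted, but t was true
            fn[t] += 1   # class t missed
    n = len(y_true)
    support = Counter(y_true)  # class frequencies serve as weights
    weighted_f1 = 0.0
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec  = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1   = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weighted_f1 += (support[c] / n) * f1
    # Micro averaging pools TP/FP/FN over all classes; with one label per
    # example, every error is both an FP and an FN, so micro precision,
    # micro recall, micro F1, and accuracy are all the same number.
    micro_f1 = sum(tp.values()) / n
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return weighted_f1, micro_f1, accuracy

# Toy labels for illustration only (not taken from the shared-task corpus)
w_f1, m_f1, acc = evaluation_scores(
    ["CAD", "CAD", "SMOKER", "SMOKER", "OBESE"],
    ["CAD", "SMOKER", "SMOKER", "SMOKER", "OBESE"],
)
print(f"weighted F1={w_f1:.4f}  micro F1={m_f1:.4f}  accuracy={acc:.4f}")
```

On the toy labels, micro F1 and accuracy are identical (0.8), while the weighted-averaged F1 differs slightly (about 0.787), mirroring the small gap between the F1-score and micro F1-score columns above.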