Table 15 Hyperparameters optimized via training

From: Adapting transformer-based language models for heart disease detection and risk factors extraction

| Hyperparameter | BERT | RoBERTa | BioBERT | BioClinicalBERT | XLNet |
|---|---|---|---|---|---|
| Hidden size | 768 | 768 | 768 | 768 | 144 |
| Number of layers | 12 | 12 | 12 | 13 | 6 |
| Number of attention heads | 12 | 12 | 12 | 12 | 6 |
| Feed-forward layer hidden size | 128 | 128 | 128 | 128 | 128 |
| Learning rate | \(1\times 10^{-6}\) | \(5\times 10^{-7}\) | \(5\times 10^{-5}\) | \(5\times 10^{-6}\) | \(5\times 10^{-6}\) |
| Batch size | 16 | 16 | 16 | 16 | 16 |
| Dropout | 0.5 | 0.1 | 0.1 | 0.4 | 0.4 |
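To illustrate how one column of Table 15 could be applied in practice, the sketch below maps the BERT settings onto a Hugging Face `transformers` configuration. This is a minimal sketch, not the paper's code: the classification head (`BertForSequenceClassification`) and the AdamW optimizer are assumptions, and `intermediate_size=128` simply mirrors the table's feed-forward hidden size rather than BERT's usual default.

```python
# Minimal sketch (assumed setup, not the paper's implementation):
# instantiating a BERT model with the Table 15 hyperparameters.
import torch
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig(
    hidden_size=768,          # Hidden size
    num_hidden_layers=12,     # Number of layers
    num_attention_heads=12,   # Number of attention heads
    intermediate_size=128,    # Feed-forward layer hidden size (per Table 15)
    hidden_dropout_prob=0.5,  # Dropout
)
model = BertForSequenceClassification(config)

# Optimizer choice is an assumption; the learning rate and batch size
# come from the BERT column of Table 15.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)
batch_size = 16
```

The other columns would follow the same pattern with their respective values (e.g., a learning rate of \(5\times 10^{-5}\) and dropout of 0.1 for BioBERT).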