Table 4 Hyperparameters used to fine-tune pre-trained models

From: Pre-trained transformer-based language models for Sundanese

| Model | Learning rate | Weight decay | Batch size |
|---|---|---|---|
| Sundanese GPT-2 | \(1 \times 10^{-5}\) | 0.01 | 16 |
| Sundanese BERT | \(4 \times 10^{-5}\) | 0.01 | 8 |
| Sundanese RoBERTa | \(2 \times 10^{-5}\) | 0.01 | 16 |
| IndoBERT [31] | \(2 \times 10^{-5}\) | 0.0 | 16 |
| mBERT [6] | \(2 \times 10^{-5}\) | 0.0 | 16 |
| XLM-RoBERTa [20] | \(2 \times 10^{-5}\) | 0.0 | 16 |
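The sketch below shows one way these per-model hyperparameters could be wired into a fine-tuning setup with the Hugging Face Transformers `TrainingArguments` API. It is a minimal illustration, not the authors' released code; the output directory names are placeholders, and the batch size is assumed to be per device.

```python
# Minimal sketch: applying the Table 4 hyperparameters via Hugging Face
# Transformers. Not the authors' code; names below are placeholders.
from transformers import TrainingArguments

# Fine-tuning hyperparameters from Table 4, keyed by model.
HYPERPARAMS = {
    "Sundanese GPT-2":   {"learning_rate": 1e-5, "weight_decay": 0.01, "batch_size": 16},
    "Sundanese BERT":    {"learning_rate": 4e-5, "weight_decay": 0.01, "batch_size": 8},
    "Sundanese RoBERTa": {"learning_rate": 2e-5, "weight_decay": 0.01, "batch_size": 16},
    "IndoBERT":          {"learning_rate": 2e-5, "weight_decay": 0.0,  "batch_size": 16},
    "mBERT":             {"learning_rate": 2e-5, "weight_decay": 0.0,  "batch_size": 16},
    "XLM-RoBERTa":       {"learning_rate": 2e-5, "weight_decay": 0.0,  "batch_size": 16},
}

def training_args_for(model_name: str, output_dir: str) -> TrainingArguments:
    """Build TrainingArguments from the Table 4 row for `model_name`."""
    hp = HYPERPARAMS[model_name]
    return TrainingArguments(
        output_dir=output_dir,
        learning_rate=hp["learning_rate"],
        weight_decay=hp["weight_decay"],
        per_device_train_batch_size=hp["batch_size"],  # assumed per-device batch size
        per_device_eval_batch_size=hp["batch_size"],
    )

# Example: arguments for fine-tuning the Sundanese RoBERTa checkpoint.
args = training_args_for("Sundanese RoBERTa", "sundanese-roberta-finetuned")
```

These arguments would then be passed to a `Trainer` together with the chosen pre-trained checkpoint and the downstream dataset.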