
Table 2 Learning rate and weight decay of each of the pre-trained Sundanese models

From: Pre-trained transformer-based language models for Sundanese

| Model | Learning rate | Weight decay |
|---|---|---|
| Sundanese GPT-2 | \(1 \times 10^{-4}\) | 0.1 |
| Sundanese BERT | \(2 \times 10^{-4}\) | 0.0 |
| Sundanese RoBERTa | \(2 \times 10^{-4}\) | 0.0 |
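To make the table concrete, here is a minimal sketch of how these per-model hyperparameters could be organized and handed to an AdamW-style optimizer. The dictionary keys and the `optimizer_kwargs` helper are illustrative assumptions, not the authors' actual training code; only the learning-rate and weight-decay values come from the table.

```python
# Hyperparameters from Table 2. The model-name keys and the helper below
# are hypothetical; the numeric values are taken from the table.
HPARAMS = {
    "sundanese-gpt2":    {"learning_rate": 1e-4, "weight_decay": 0.1},
    "sundanese-bert":    {"learning_rate": 2e-4, "weight_decay": 0.0},
    "sundanese-roberta": {"learning_rate": 2e-4, "weight_decay": 0.0},
}

def optimizer_kwargs(model_name: str) -> dict:
    """Map a model's table entry to keyword arguments for an
    AdamW-style optimizer (e.g. torch.optim.AdamW)."""
    hp = HPARAMS[model_name]
    return {"lr": hp["learning_rate"], "weight_decay": hp["weight_decay"]}
```

Note that the GPT-2 model uses a smaller learning rate but a non-zero weight decay, while both masked language models share the same settings.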