From: Pre-trained transformer-based language models for Sundanese
| Model | Learning rate | Weight decay |
|---|---|---|
| Sundanese GPT-2 | \(1 \times 10^{-4}\) | 0.1 |
| Sundanese BERT | \(2 \times 10^{-4}\) | 0.0 |
| Sundanese RoBERTa | | |
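To make concrete how learning rate and weight decay interact, here is a minimal scalar sketch of a decoupled weight-decay (AdamW-style) update step. This assumes an AdamW-like optimizer, which is common for transformer pretraining but not confirmed by this excerpt; the function name and default betas are illustrative, not from the source.

```python
def adamw_step(param, grad, m, v, t, lr, weight_decay,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar AdamW-style update (illustrative sketch, not the paper's code)."""
    # Update biased first and second moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for step t (1-indexed).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied to the parameter directly,
    # scaled by lr, rather than folded into the gradient.
    param = param - lr * (m_hat / (v_hat ** 0.5 + eps) + weight_decay * param)
    return param, m, v

# Example with the GPT-2 row's values (lr = 1e-4, weight decay = 0.1):
p, m, v = adamw_step(1.0, 0.1, 0.0, 0.0, t=1, lr=1e-4, weight_decay=0.1)
```

With weight decay 0.0 (the BERT row), the decay term vanishes and the update reduces to plain Adam.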