Model | Learning rate | Weight decay | Batch size |
---|---|---|---|
Sundanese GPT-2 | \(1 \times 10^{-5}\) | 0.01 | 16 |
Sundanese BERT | \(4 \times 10^{-5}\) | 0.01 | 8 |
Sundanese RoBERTa | \(2 \times 10^{-5}\) | 0.01 | 16 |
IndoBERT [31] | \(2 \times 10^{-5}\) | 0.0 | 16 |
mBERT [6] | \(2 \times 10^{-5}\) | 0.0 | 16 |
XLM-RoBERTa [20] | \(2 \times 10^{-5}\) | 0.0 | 16 |
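The hyperparameters above can be collected into a single configuration mapping for a fine-tuning script (a minimal sketch; the lowercase model keys and the `get_config` helper are illustrative names, not part of the original setup):

```python
# Fine-tuning hyperparameters from the table above, keyed by model.
# Keys are illustrative identifiers; values mirror the table's columns.
HYPERPARAMS = {
    "sundanese-gpt2":    {"learning_rate": 1e-5, "weight_decay": 0.01, "batch_size": 16},
    "sundanese-bert":    {"learning_rate": 4e-5, "weight_decay": 0.01, "batch_size": 8},
    "sundanese-roberta": {"learning_rate": 2e-5, "weight_decay": 0.01, "batch_size": 16},
    "indobert":          {"learning_rate": 2e-5, "weight_decay": 0.0,  "batch_size": 16},
    "mbert":             {"learning_rate": 2e-5, "weight_decay": 0.0,  "batch_size": 16},
    "xlm-roberta":       {"learning_rate": 2e-5, "weight_decay": 0.0,  "batch_size": 16},
}

def get_config(model_name: str) -> dict:
    """Return the fine-tuning configuration for the given model key."""
    return HYPERPARAMS[model_name]
```

A training loop can then read, e.g., `get_config("sundanese-bert")["batch_size"]` instead of hard-coding per-model values.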