From: Pre-trained transformer-based language models for Sundanese
| Model | Learning rate | Weight decay |
|---|---|---|
| Sundanese GPT-2 | \(1 \times 10^{-4}\) | 0.1 |
| Sundanese BERT | \(2 \times 10^{-4}\) | 0.0 |
| Sundanese RoBERTa | | |
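To make concrete how learning rate and weight decay interact, here is a minimal scalar sketch of a decoupled weight-decay (AdamW-style) update step. This assumes an AdamW-like optimizer, which is common for transformer pretraining but not confirmed by this excerpt; the function name and default betas are illustrative, not from the source.

```python
def adamw_step(param, grad, m, v, t, lr, weight_decay,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar AdamW-style update (illustrative sketch, not the paper's code)."""
    # Update biased first and second moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for step t (1-indexed).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied to the parameter directly,
    # scaled by lr, rather than folded into the gradient.
    param = param - lr * (m_hat / (v_hat ** 0.5 + eps) + weight_decay * param)
    return param, m, v

# Example with the GPT-2 row's values (lr = 1e-4, weight decay = 0.1):
p, m, v = adamw_step(1.0, 0.1, 0.0, 0.0, t=1, lr=1e-4, weight_decay=0.1)
```

With weight decay 0.0 (the BERT row), the decay term vanishes and the update reduces to plain Adam.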