Table 15 The detailed hyperparameters

From: Survey of transformers and towards ensemble learning using transformers for natural language processing

Optimizer    Activation    Dropout ratio
Adam         Softmax       0.5
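
As a rough illustration of what the three settings in Table 15 mean in practice, the sketch below records them in a small configuration object and implements the two that are simple to show with the standard library alone. The class name `Hyperparameters`, its field names, and the helper functions `softmax` and `dropout` are hypothetical and do not come from the paper. The table does not say which framework was used or where in the network the activation and dropout are applied, so none of that is assumed here. The Adam optimizer is stored only as a name, not implemented.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Values from Table 15; the field names are illustrative assumptions."""
    optimizer: str = "Adam"
    activation: str = "Softmax"
    dropout_ratio: float = 0.5


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(values, ratio, rng):
    """Inverted dropout: zero each value with probability `ratio`
    and rescale the survivors by 1 / (1 - ratio)."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    keep = 1.0 - ratio
    return [v / keep if rng.random() >= ratio else 0.0 for v in values]


if __name__ == "__main__":
    hp = Hyperparameters()
    rng = random.Random(0)  # fixed seed, chosen only for repeatability
    hidden = dropout([0.2, -1.3, 0.7, 2.1], hp.dropout_ratio, rng)
    print(hp)
    print(softmax(hidden))
```

With a dropout ratio of 0.5, each surviving value is doubled, which keeps the expected magnitude of the layer output unchanged between training and inference.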