From: Survey of transformers and towards ensemble learning using transformers for natural language processing
Optimizer    Activation    Dropout ratio
Adam         Softmax       0.5
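As a rough illustration of what the settings listed above mean in practice, the following sketch holds them in a plain dictionary and implements softmax and dropout using only the Python standard library. It is not the authors' code: the names `HYPERPARAMS`, `softmax`, and `dropout` are hypothetical, the table does not say which layers these settings apply to, and the Adam optimizer is only named here, not implemented. The dropout shown is the common "inverted" variant, which is an assumption about the implementation.

```python
import math
import random

# Values come from the table; the key names are illustrative assumptions.
HYPERPARAMS = {
    "optimizer": "Adam",
    "activation": "softmax",
    "dropout_ratio": 0.5,
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(values, ratio, rng):
    """Inverted dropout (training time only).

    Each value is zeroed with probability `ratio`; survivors are
    divided by (1 - ratio) so the expected value is unchanged.
    """
    keep = 1.0 - ratio
    return [v / keep if rng.random() >= ratio else 0.0 for v in values]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(softmax([1.0, 2.0, 3.0]))
    print(dropout([1.0, 1.0, 1.0, 1.0], HYPERPARAMS["dropout_ratio"], rng))
```

With a dropout ratio of 0.5, roughly half of the activations are zeroed on each training step and the remaining ones are doubled; at inference time dropout is normally switched off.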