From: Survey of transformers and towards ensemble learning using transformers for natural language processing
Layer    | Number of neurons / dropout rate
Dense    | 20
Dropout  | 0.5
         | 10
         | 5
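The table lists a small fully connected stack. The following is a minimal sketch of its forward pass in plain Python, written under several stated assumptions. The 10- and 5-unit rows carry no layer label in the extracted table, so they are assumed here to be Dense layers. The ReLU activation on hidden layers, the linear final layer, the input size of 8, and the weight initialisation are all hypothetical choices that are not taken from the source. Dropout is implemented as standard inverted dropout with rate 0.5.

```python
import random

# Layer spec read off the table. ASSUMPTION: the unlabelled 10- and 5-unit
# rows are treated as Dense layers.
LAYERS = [("dense", 20), ("dropout", 0.5), ("dense", 10), ("dense", 5)]


def init_dense(n_in, n_out, rng):
    """Return (weights, biases) for a fully connected layer (hypothetical init)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense_forward(x, params, relu):
    """Compute W x + b, followed by ReLU if `relu` is True (ASSUMED activation)."""
    weights, biases = params
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def dropout_forward(x, rate, rng, training):
    """Inverted dropout: during training, zero each unit with probability `rate`
    and scale survivors by 1 / (1 - rate); at inference, return x unchanged."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


def build(input_dim, spec, seed=0):
    """Instantiate the layer stack; the last Dense layer is kept linear (ASSUMPTION)."""
    rng = random.Random(seed)
    last_dense = max(i for i, (kind, _) in enumerate(spec) if kind == "dense")
    model, n_in = [], input_dim
    for i, (kind, value) in enumerate(spec):
        if kind == "dense":
            model.append(("dense", init_dense(n_in, value, rng), i != last_dense))
            n_in = value
        else:
            model.append(("dropout", value, None))
    return model, rng


def forward(model, rng, x, training=False):
    """Run x through every layer in order."""
    for kind, payload, relu in model:
        if kind == "dense":
            x = dense_forward(x, payload, relu)
        else:
            x = dropout_forward(x, payload, rng, training)
    return x


if __name__ == "__main__":
    model, rng = build(input_dim=8, spec=LAYERS)  # input size 8 is hypothetical
    out = forward(model, rng, [1.0] * 8)
    print(len(out))  # 5
```

Dropout only acts when `training=True`. At inference it passes activations through unchanged, which is why the inverted variant rescales during training instead.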