From: Survey of transformers and towards ensemble learning using transformers for natural language processing
Task                      Optimizer  Learning rate  Batch size  Epochs
Sentiment analysis        Adam       1e-5           32          5
Question answering        Adam       1e-5           32          5
Named entity recognition  Adam       1e-5           64          3
Text summarization        Adam       1e-5           64          3
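As a rough illustration, the settings in this table could be captured in a small configuration mapping. The sketch below assumes the table's merged cells read as follows: Adam and a learning rate of 1e-5 for every task, batch size 32 with 5 epochs for sentiment analysis and question answering, and batch size 64 with 3 epochs for named entity recognition and text summarization. The names FineTuneConfig, CONFIGS, and total_steps, and the dataset size of 10,000 examples, are hypothetical and do not come from the source.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """One row of the hyperparameter table (hypothetical container)."""
    optimizer: str
    learning_rate: float
    batch_size: int
    epochs: int


# Assumption: merged table cells are expanded so that every task has a full row.
CONFIGS = {
    "sentiment_analysis": FineTuneConfig("Adam", 1e-5, 32, 5),
    "question_answering": FineTuneConfig("Adam", 1e-5, 32, 5),
    "named_entity_recognition": FineTuneConfig("Adam", 1e-5, 64, 3),
    "text_summarization": FineTuneConfig("Adam", 1e-5, 64, 3),
}


def total_steps(num_examples: int, cfg: FineTuneConfig) -> int:
    """Number of optimizer updates: batches per epoch (rounded up) times epochs."""
    steps_per_epoch = math.ceil(num_examples / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


if __name__ == "__main__":
    # Hypothetical dataset size, used only to show how the settings interact.
    for task, cfg in CONFIGS.items():
        print(task, total_steps(10_000, cfg))
```

Under these assumptions, the larger batch size and fewer epochs in the last two rows translate into considerably fewer optimizer updates for a dataset of the same size.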