From: Survey of transformers and towards ensemble learning using transformers for natural language processing
Model            BERT   XLNet   GPT2   RoBERTa   ALBERT
GPU memory (GB)  4.47   4.97    3.99   4.30      3.27