Table 13 GPU usage of different models

From: Survey of transformers and towards ensemble learning using transformers for natural language processing

Model      GPU usage (GB)
BERT       4.47
XLNet      4.97
GPT2       3.99
RoBERTa    4.30
ALBERT     3.27